Loki for Logs, Self-Hosted Aggregation
September 16, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: Loki indexes log labels, not log content. Storage can be around 10× cheaper than Elasticsearch. LogQL feels like PromQL. Self-host it as one Docker container for small setups. It doesn’t replace log analytics platforms; it replaces grep-on-files perfectly.

After Alertmanager comes the logs pillar. Loki is Grafana’s log aggregation system. It has a different shape from Elasticsearch: sometimes a better fit, sometimes not.

How Loki differs from ELK

Elasticsearch indexes everything. Every word in every log line is searchable. Powerful; expensive in storage + RAM.

Loki indexes only labels. {app="api", level="error"} is the searchable key. The log line content is stored compressed but not indexed. Queries grep through compressed chunks.

The trade-off:

  • ELK answers “find all logs containing user_id=42” instantly
  • Loki answers “show me error-level logs from the api app, last hour, containing user_id=42” — fast by label, then grep the matching chunks

For most operational use, the second pattern is what you actually do. Loki wins on cost.
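To make the "index labels, scan content" idea concrete, here is a toy Python model. It is only a sketch of the concept, not how Loki is implemented; the class name ToyLabelStore and the sample log lines are made up for illustration.

```python
from collections import defaultdict
import gzip


class ToyLabelStore:
    """Toy model: label sets are the lookup key, line content is compressed and unindexed."""

    def __init__(self):
        # frozenset of (label, value) pairs -> list of gzip-compressed chunks
        self.streams = defaultdict(list)

    def append(self, labels, lines):
        key = frozenset(labels.items())
        self.streams[key].append(gzip.compress("\n".join(lines).encode()))

    def query(self, selector, contains=""):
        wanted = set(selector.items())
        hits = []
        for key, chunks in self.streams.items():
            if not wanted <= key:      # label lookup: skip streams that do not match
                continue
            for chunk in chunks:       # brute-force scan, but only of matching chunks
                for line in gzip.decompress(chunk).decode().split("\n"):
                    if contains in line:
                        hits.append(line)
        return hits


store = ToyLabelStore()
store.append({"app": "api", "level": "error"}, ["user_id=42 payment failed", "user_id=7 timeout"])
store.append({"app": "api", "level": "info"}, ["user_id=42 logged in"])
store.append({"app": "web", "level": "error"}, ["user_id=42 render failed"])

result = store.query({"app": "api", "level": "error"}, contains="user_id=42")
print(result)
```

Only one of the three streams is decompressed and scanned. That is the whole trick: the narrower the label selector, the less data the grep step has to touch.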

Storage model

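Logs are stored as compressed chunks (gzip / snappy). Each stream (unique label set) gets its own chunks. A chunk is flushed when it fills up, sits idle too long, or reaches its maximum age (the config below sets both limits to 1 hour).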

Index is small — just label combinations → chunk locations. The actual data sits in object storage (S3, GCS, MinIO) or local filesystem.

For our factory project: ~5 GB/day of logs, stored at ~600 MB/day after compression. Loki running on a $20/month VM handles it.
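A quick back-of-the-envelope check of those numbers in Python. The 30-day figure assumes the retention_period: 720h used later in this post, and the estimate ignores index size and filesystem overhead.

```python
RAW_GB_PER_DAY = 5.0       # raw log volume quoted above
STORED_GB_PER_DAY = 0.6    # ~600 MB/day after compression, quoted above
RETENTION_DAYS = 30        # assumption: retention_period of 720h

compression_ratio = RAW_GB_PER_DAY / STORED_GB_PER_DAY
disk_at_retention_gb = STORED_GB_PER_DAY * RETENTION_DAYS

print(f"compression ratio: ~{compression_ratio:.1f}x")
print(f"steady-state chunk storage: ~{disk_at_retention_gb:.0f} GB")
```

That works out to roughly 8× compression and about 18 GB of chunks on disk once retention kicks in, which fits comfortably on a small VM.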

Compose setup

services:
  loki:
    image: grafana/loki:2.6.1
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    ports: ["3100:3100"]

  promtail:
    image: grafana/promtail:2.6.1
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yaml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml

Basic loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 1h
  max_chunk_age: 1h

schema_config:
  configs:
    - from: 2022-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 720h    # 30 days

This config is single-instance. For HA, separate ingesters / queriers / index gateways. Most teams don’t need that.

Promtail config

Promtail tails files (or container logs) and ships to Loki:

server:
  http_listen_port: 9080

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'

  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log

Promtail discovers Docker containers, attaches container and stream labels, ships logs.
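Under the hood, shipping means POSTing JSON to the push endpoint from the clients section. Here is a minimal Python sketch of that request, handy for smoke-testing a fresh Loki without Promtail. It assumes Loki is reachable on localhost:3100 (the Compose port mapping); the function names and the manual-test label are made up, and real Promtail adds batching, retries and position tracking on top.

```python
import json
import time
import urllib.request

PUSH_URL = "http://localhost:3100/loki/api/v1/push"  # assumes the Compose port mapping


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for Loki's push endpoint: one stream, many lines."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(payload, url=PUSH_URL):
    """POST the payload to Loki and return the HTTP status code (needs a running Loki)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Fixed timestamp so the printed payload is reproducible.
payload = build_push_payload({"job": "manual-test"}, ["hello loki"], now_ns=1663286400000000000)
print(json.dumps(payload))
```

Calling push(payload) against a running instance should make the line show up under {job="manual-test"}. Note that the labels in the payload define the stream, so the cardinality advice later in this post applies here too.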

LogQL

LogQL has two parts: log queries (return log lines) and metric queries (compute over logs).

Log queries:

{container="api"}                           # all logs from container "api"
{container="api"} |= "error"                # logs containing "error"
{container="api"} != "healthcheck"          # logs NOT containing "healthcheck"
{container="api"} |~ "user_id=\\d+"         # regex match
{container="api"} |= "error" | json         # parse JSON, expose fields
{container="api"} | json | level="error"    # filter on parsed field

|=, !=, |~, !~ are line filters. | json, | logfmt parse structured logs.
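The same queries can be run outside Grafana through Loki's HTTP API. A small Python sketch, assuming Loki on localhost:3100; the helper names query_range_url and run_query are made up for this example.

```python
import json
import urllib.parse
import urllib.request

LOKI = "http://localhost:3100"  # assumes the Compose port mapping


def query_range_url(logql, start_ns, end_ns, limit=100, base=LOKI):
    """Build a URL for Loki's query_range endpoint."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,   # nanosecond Unix timestamps
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"


def run_query(url):
    """Fetch and decode the JSON response (needs a running Loki)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


url = query_range_url('{container="api"} |= "error"', 1663286400000000000, 1663290000000000000)
print(url)
```

The URL encoding is the part that usually bites: braces, quotes and pipes in LogQL all need escaping, so let urlencode handle it rather than building the query string by hand.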

Metric queries:

rate({container="api"} |= "error" [5m])                          # error lines/sec

# lines that parsed cleanly as JSON, counted per path
sum by (path) (count_over_time({container="api"} | json | __error__="" [5m]))

Behaves like PromQL but operates on log streams.
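What count_over_time and rate compute, sketched in Python over a list of timestamps. This is a simplified model (one stream, one evaluation point, hypothetical data), not Loki's actual evaluator.

```python
def count_over_time(timestamps, now, window_s):
    """Number of entries with a timestamp inside the window ending at `now`."""
    return sum(1 for t in timestamps if now - window_s < t <= now)


def rate(timestamps, now, window_s):
    """Entries per second over the window."""
    return count_over_time(timestamps, now, window_s) / window_s


# Hypothetical timestamps (in seconds) of lines that matched |= "error".
error_times = [10, 50, 120, 200, 290, 299]

count = count_over_time(error_times, now=300, window_s=300)
per_second = rate(error_times, now=300, window_s=300)
print(count, per_second)
```

Six matching lines in a 5-minute window is 0.02 lines per second. The [5m] in the LogQL query plays the role of window_s here.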

Cardinality matters here too

Loki indexes labels. High-cardinality labels → many small streams → many chunks → expensive.

Bad labels:

  • user_id — millions of users
  • request_id — unique per request
  • pod_name from a deployment that scales (transient names)

Good labels:

  • container / service (bounded by deployment count)
  • level (info, warn, error, debug)
  • env (prod, staging)
  • cluster (bounded by infra)

As a rule of thumb, keep total active streams under ~10K for a small single-instance Loki, and under ~100K for a distributed deployment.

For per-request data, put it in the log line and grep with |= or extract with | json. Don’t make it a label.
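The worst-case stream count is the product of the distinct values of each label, which is why one unbounded label ruins everything. A short Python illustration; the cardinalities are invented numbers, and real deployments land below the upper bound because not every combination occurs.

```python
from math import prod


def max_streams(label_cardinalities):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label cardinalities for a small deployment.
good = {"container": 20, "level": 4, "env": 2, "cluster": 3}
bad = {**good, "user_id": 1_000_000}

good_streams = max_streams(good)
bad_streams = max_streams(bad)
print(good_streams, bad_streams)
```

The bounded label set tops out at 480 streams. Adding user_id pushes the ceiling to 480 million.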

Querying in Grafana

Configure Loki as data source. Then in Explore mode or dashboards:

{container="api", level="error"} | json | line_format "{{.timestamp}} {{.message}}"

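Returns the matching log lines, reformatted to show only the timestamp and message fields. Note that level only works in the selector if your Promtail pipeline sets it as a label; with the Promtail config above it isn’t, so filter after parsing instead: | json | level="error".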

For dashboards, log panels show recent matches. Combine with metrics:

  • Panel 1: error rate (Prometheus)
  • Panel 2: recent error logs (Loki) for the same time range

Click an error spike → see the actual errors below.

Retention

limits_config:
  retention_period: 720h    # 30 days

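retention_period alone does nothing. The compactor is what actually deletes data, so it needs retention_enabled: true: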

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

Logs older than 30 days are deleted automatically. Tune to your compliance + cost needs.

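For very long retention at low cost: keep chunks in object storage such as S3 and use lifecycle rules to move old data to a cheaper storage class. Be careful with archive classes: Loki has to read chunks on demand, so data that needs a restore step before it can be read will not be queryable.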

Common Pitfalls

High-cardinality labels. Stream count explodes; query performance dies. Strict label discipline.

Indexing every word. Wrong tool. Use ELK if you need full-text search.

No retention. Disk fills.

Single Loki for huge volume. Past ~50 GB/day, switch to distributed mode (separate ingester/querier/store).

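Forgetting | json. Structured logs without parsing = grep through raw JSON strings. Fast but imprecise: |= "error" also matches a message that merely mentions the word. Filter lines first, then parse and filter on the field.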
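A toy Python comparison of the two approaches: a plain substring match versus parsing each line and comparing the level field. The sample lines are made up, and the comments only describe rough LogQL equivalents.

```python
import json

lines = [
    '{"level": "error", "message": "payment failed"}',
    '{"level": "info", "message": "retrying after error from upstream"}',
    '{"level": "info", "message": "request served"}',
]

# Roughly what {container="api"} |= "error" does: substring match on the raw line.
substring_hits = [line for line in lines if "error" in line]

# Roughly what {container="api"} | json | level="error" does: parse, then compare a field.
field_hits = [line for line in lines if json.loads(line).get("level") == "error"]

print(len(substring_hits), len(field_hits))
```

The substring match returns two lines, one of them an info message that just mentions an error. The parsed filter returns only the real error.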

Promtail mounted at / and reading everything. Mount and scrape specific paths only.

Wrapping Up

Loki = label-indexed logs + LogQL + Grafana. Cheap, fast at the operational level. Monday: Promtail pipelines for log parsing.