Prometheus Cardinality and Cost Control

September 28, 2022 · 4 min read · by Muhammad Amal

TL;DR: Prometheus cost is driven by the number of active series (cardinality), not by raw data volume. High-cardinality labels (request_id, user_id) blow it up. Use the TSDB status page or API to find offenders, then drop them in the scrape config or, better, in instrumentation. Shorter retention, recording rules, and your choice of storage tier control the rest.

After error budgets, the next operational concern is cost. Prometheus is cheap until it isn't. Most "Prometheus is expensive" stories trace back to cardinality.

The cardinality math

Each unique combination of labels = one time series. Each series:

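Stores samples (timestamp + value): 16 bytes raw, compressed to roughly 1 to 2 bytes each on disk
Adds index entries and, while the series is active, in-memory state in the head block; memory is often the first resource high cardinality exhausts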
Costs CPU during scrape, query, compaction

For a metric with 4 labels of cardinality [10, 5, 100, 200]:

Series = 10 × 5 × 100 × 200 = 1,000,000 series (worst case; only label combinations that actually occur become series)

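At a 15s scrape interval over 30 days, each series collects 172,800 samples, so 1M series hold about 1.7 × 10^11 samples. That is ~2.8 TB at the raw 16 bytes/sample, and still ~170 to 350 GB on disk at the 1 to 2 bytes/sample the Prometheus docs cite after compression.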

One bad metric can dwarf your whole monitoring footprint.
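The same arithmetic as a short Go sketch. The label cardinalities, 15s interval, and 30-day window are the example's numbers; 16 bytes is the raw sample size, and 1 to 2 bytes per sample is the compressed on-disk average from the Prometheus storage docs, so treat the output as a range rather than a prediction.

```go
package main

import "fmt"

// worstCaseSeries multiplies per-label cardinalities: an upper bound on series count.
func worstCaseSeries(cardinalities []int64) int64 {
	n := int64(1)
	for _, c := range cardinalities {
		n *= c
	}
	return n
}

// totalSamples is series × (retention / scrape interval).
// int64 keeps the ~1.7e11 result from overflowing on 32-bit platforms.
func totalSamples(series, scrapeIntervalSec, retentionDays int64) int64 {
	samplesPerSeries := retentionDays * 24 * 60 * 60 / scrapeIntervalSec
	return series * samplesPerSeries
}

func main() {
	series := worstCaseSeries([]int64{10, 5, 100, 200})
	samples := totalSamples(series, 15, 30)
	fmt.Printf("series: %d, samples over 30d: %d\n", series, samples)

	// 1 and 2 B/sample: compressed on-disk range; 16 B/sample: uncompressed.
	for _, bytesPerSample := range []float64{1, 2, 16} {
		gb := float64(samples) * bytesPerSample / 1e9
		fmt.Printf("%4.0f B/sample -> %7.1f GB\n", bytesPerSample, gb)
	}
}
```

It prints 1,000,000 series and 172,800,000,000 samples, which works out to about 173 GB at 1 B/sample, 346 GB at 2 B/sample, and about 2.76 TB uncompressed.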

Finding the offenders

The web UI exposes a stats page:

http://prometheus:9090/tsdb-status

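It shows head-block statistics (total series, chunks), the metrics with the most series, the label names with the most distinct values, and the label/value pairs that appear in the most series.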

For more detail:

# Top metrics by series count
topk(10, count by (__name__)({__name__=~".+"}))

# Series count per label value for a specific metric
count by (path) (http_requests_total)

Or via the API:

curl -s http://prometheus:9090/api/v1/status/tsdb | jq .data

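Together these show which metrics, and which labels within them, are expensive. The topk query itself touches every series, so run it sparingly on a large server.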
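If you want to track this outside the UI, the API response is easy to consume. A minimal Go sketch that fetches /api/v1/status/tsdb and ranks metrics by series count; the struct reads only headStats.numSeries and seriesCountByMetricName as documented in the Prometheus HTTP API, and the prometheus:9090 address is assumed, matching the curl example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
)

// nameValue mirrors the {"name": ..., "value": ...} entries in the TSDB status response.
type nameValue struct {
	Name  string `json:"name"`
	Value uint64 `json:"value"`
}

// tsdbStatus holds only the fields of /api/v1/status/tsdb this sketch reads.
type tsdbStatus struct {
	Data struct {
		HeadStats struct {
			NumSeries uint64 `json:"numSeries"`
		} `json:"headStats"`
		SeriesCountByMetricName []nameValue `json:"seriesCountByMetricName"`
	} `json:"data"`
}

// parseTSDBStatus returns the head series count and per-metric series counts, largest first.
func parseTSDBStatus(body []byte) (uint64, []nameValue, error) {
	var s tsdbStatus
	if err := json.Unmarshal(body, &s); err != nil {
		return 0, nil, err
	}
	top := s.Data.SeriesCountByMetricName
	sort.Slice(top, func(i, j int) bool { return top[i].Value > top[j].Value })
	return s.Data.HeadStats.NumSeries, top, nil
}

func main() {
	resp, err := http.Get("http://prometheus:9090/api/v1/status/tsdb") // assumed address
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	total, top, err := parseTSDBStatus(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("head series: %d\n", total)
	for _, m := range top {
		share := 0.0
		if total > 0 {
			share = 100 * float64(m.Value) / float64(total)
		}
		fmt.Printf("%-50s %10d %5.1f%%\n", m.Name, m.Value, share)
	}
}
```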

Common cardinality bombs

In practice, the worst offenders are:

User ID / customer ID in labels. Bounded only by your user count, which can be millions. Don't.

Request ID in labels. Unique per request, so effectively unbounded.

Full URL path. /users/42 and /users/43 each become a separate series.

Free-text labels. Error messages and echoed user input are unbounded.

Pod names from autoscaling deployments. Every rollout or scale-up creates new pod names and therefore new series. The old series stop receiving samples but stay in memory until the head block is next compacted, and on disk until retention expires.

Status code as a label on an endpoint that returns many. If your API returns 100 distinct error codes, that is a 100× multiplier on every other label combination.

Fix each by dropping the label or normalizing it to a small, fixed set of values.
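Normalizing usually means mapping raw values onto a small fixed set before they become label values. A minimal Go sketch; normalizePath and statusClass are hypothetical helpers, and collapsing status codes to their class (2xx, 4xx, ...) is one possible policy that trades detail for cardinality. When your router exposes a route pattern, prefer that over regex-based path cleanup.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// numericSegment matches a path segment made only of digits.
var numericSegment = regexp.MustCompile(`^\d+$`)

// normalizePath replaces numeric path segments with ":id", so
// /users/42/orders and /users/43/orders share one label value.
func normalizePath(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if numericSegment.MatchString(s) {
			segs[i] = ":id"
		}
	}
	return strings.Join(segs, "/")
}

// statusClass collapses an HTTP status code to its class, capping the
// label at a handful of values; anything outside 100-599 becomes "other".
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "other"
	}
	return strconv.Itoa(code/100) + "xx"
}

func main() {
	fmt.Println(normalizePath("/users/42/orders"))
	fmt.Println(statusClass(404))
}
```

This prints /users/:id/orders and 4xx.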

Dropping at scrape

In prometheus.yml:

scrape_configs:
  - job_name: 'api'
    static_configs:
      - targets: ['api:8080']
    metric_relabel_configs:
      # Drop specific metrics entirely
      - source_labels: [__name__]
        regex: 'http_request_size_bytes_bucket'
        action: drop

      # Drop a high-cardinality label from kept metrics.
      # labeldrop matches regex against label names; source_labels is not used.
      - regex: 'user_id'
        action: labeldrop

      # Normalize path labels: rewrite /users/<numeric id> to /users/:id
      - source_labels: [path]
        regex: '/users/\d+'
        target_label: path
        replacement: '/users/:id'
        action: replace

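metric_relabel_configs run after the scrape and before samples are written to storage, so the dropped metrics and labels never reach the TSDB. One caveat: relabeling does not aggregate. If two series differ only by the label you drop, they collide after relabeling and Prometheus rejects the duplicates instead of summing them, so a counter keyed by user_id still needs fixing at the source.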
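Prometheus anchors relabel regexes at both ends, so the path rule above only rewrites values that are exactly /users/<digits>. The Go sketch below models a single replace rule with Go's regexp package (which Prometheus itself uses); it is an illustration of the semantics, not Prometheus's relabeling code. A capture group such as /users/\d+(/.*)? with replacement /users/:id$1 would also cover nested routes.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored compiles a relabel-style regex, matching the whole value only.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

// replaceLabel models a "replace" rule on one label value: if the whole value
// matches, return the expanded replacement ($1 etc. refer to capture groups);
// otherwise leave the value unchanged.
func replaceLabel(value string, re *regexp.Regexp, replacement string) string {
	match := re.FindStringSubmatchIndex(value)
	if match == nil {
		return value
	}
	return string(re.ExpandString(nil, replacement, value, match))
}

func main() {
	rule := anchored(`/users/\d+`)
	for _, path := range []string{"/users/42", "/users/42/orders", "/health"} {
		fmt.Printf("%-18s -> %s\n", path, replaceLabel(path, rule, "/users/:id"))
	}

	// A capture group keeps the suffix, so nested routes are normalized too.
	nested := anchored(`/users/\d+(/.*)?`)
	fmt.Println(replaceLabel("/users/42/orders", nested, "/users/:id$1"))
}
```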

Fixing at instrumentation

Better: don’t emit the high-card metric in the first place.

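In Go, fix it where the label values are produced:

// Bad: the raw path gives every ID its own series (/users/42, /users/43, ...)
httpRequests.WithLabelValues(r.Method, r.URL.Path, fmt.Sprint(status)).Inc()

// Good: chi's route pattern (e.g. /users/{id}) has one value per route.
// Call it after next.ServeHTTP returns, since chi fills the pattern in during routing.
httpRequests.WithLabelValues(r.Method, chi.RouteContext(r.Context()).RoutePattern(), fmt.Sprint(status)).Inc()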

This eliminates the problem at the source. Catch it in code review.

Retention math

Default retention is 15 days. Each day adds storage. Shorter retention = lower cost.

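For SLO-critical metrics you might want 60+ days for trend analysis; for verbose service metrics, 7 days is usually enough, with recording rules summarizing anything older. Prometheus retention applies to the whole server, not per metric, so in practice this split means separate Prometheus instances, or a short local retention plus a long-term store for the series you want to keep.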

prometheus --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=100GB

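When both flags are set, whichever limit is reached first applies: once the size limit is exceeded, Prometheus deletes the oldest blocks even if they are still inside the retention time. Disk usage becomes more predictable.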
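To pick values for these flags, the Prometheus storage docs give a rule of thumb: needed disk ≈ retention seconds × ingested samples per second × bytes per sample (1 to 2 bytes). The ingestion rate can be read from rate(prometheus_tsdb_head_samples_appended_total[1h]). A Go sketch, assuming 100k active series at a 15s interval and 2 bytes per sample; these are illustrative numbers, not measurements.

```go
package main

import "fmt"

// diskBytes applies the storage-docs rule of thumb:
// retention seconds × ingested samples/s × bytes/sample.
func diskBytes(retentionDays, samplesPerSec, bytesPerSample float64) float64 {
	return retentionDays * 86400 * samplesPerSec * bytesPerSample
}

// retentionDaysFor inverts it: how many days of data fit in a size budget.
func retentionDaysFor(budgetBytes, samplesPerSec, bytesPerSample float64) float64 {
	return budgetBytes / (86400 * samplesPerSec * bytesPerSample)
}

func main() {
	// Assumed workload: 100k series scraped every 15s, 2 bytes per sample.
	const samplesPerSec = 100000.0 / 15
	const bytesPerSample = 2.0

	for _, days := range []float64{7, 15, 30} {
		gb := diskBytes(days, samplesPerSec, bytesPerSample) / 1e9
		fmt.Printf("%2.0fd retention -> %.1f GB\n", days, gb)
	}
	fmt.Printf("100GB size limit -> ~%.0f days\n", retentionDaysFor(100e9, samplesPerSec, bytesPerSample))
}
```

Under these assumptions that is about 1.15 GB per day, so a 100GB size limit would cap retention at roughly 87 days.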

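For long-term storage, use Thanos, Mimir, or VictoriaMetrics, all of which expose a Prometheus-compatible query API. Thanos's sidecar uploads Prometheus's TSDB blocks to object storage such as S3; Mimir receives samples via remote_write and stores them in object storage; VictoriaMetrics receives remote_write into its own storage. Dashboards then query the long-term layer.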

Recording rules for aggregations

Recording rules evaluate an expression on a schedule and store the result as new series. The math:

A query like sum by (path) (rate(http_requests_total[5m])) over 1M series is expensive on every dashboard load. A recording rule evaluates it once per evaluation interval (say, every 30s) and stores the result as a small set of series, one per path value (api:request_rate_5m{path=...}).

Trade-offs:

Storage: the recorded series are stored and retained like any other series
CPU: paid once per evaluation interval, whether or not anyone queries the result
Cardinality: typically lower (aggregated)

For commonly-queried aggregations, recording rules pay off. For one-off queries, raw data is fine.
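Why the recorded output is so much smaller: sum by (path) keeps one output series per distinct path value, however many input series share it. A toy Go model of that grouping step (not how Prometheus evaluates queries), with hypothetical input of 50 pods × 3 routes:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labels is one series' label set (metric name omitted).
type labels map[string]string

// sumBy models the grouping in PromQL's `sum by (...)`: input series sharing
// the same values for the `by` labels collapse into one output series.
// The "name=value" string key is a simplification for this sketch.
func sumBy(series []labels, values []float64, by ...string) map[string]float64 {
	out := make(map[string]float64)
	for i, ls := range series {
		parts := make([]string, 0, len(by))
		for _, name := range by {
			parts = append(parts, name+"="+ls[name])
		}
		out[strings.Join(parts, ",")] += values[i]
	}
	return out
}

func main() {
	// Hypothetical input: 50 pods × 3 routes, each series at 1 req/s.
	var series []labels
	var values []float64
	for pod := 0; pod < 50; pod++ {
		for _, path := range []string{"/users/:id", "/orders", "/health"} {
			series = append(series, labels{"pod": fmt.Sprintf("api-%d", pod), "path": path})
			values = append(values, 1)
		}
	}

	recorded := sumBy(series, values, "path")
	keys := make([]string, 0, len(recorded))
	for k := range recorded {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	fmt.Printf("%d input series -> %d recorded series\n", len(series), len(recorded))
	for _, k := range keys {
		fmt.Printf("  {%s} %.0f\n", k, recorded[k])
	}
}
```

It reports 150 input series collapsing to 3 recorded series, one per path.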

Series limits (defense in depth)

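Prometheus also supports per-scrape limits. sample_limit has been available for a long time; label_limit, label_name_length_limit, and label_value_length_limit require Prometheus v2.27.0 or newer: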

scrape_configs:
  - job_name: 'api'
    sample_limit: 10000
    label_limit: 30
    label_name_length_limit: 128
    label_value_length_limit: 256

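If a single scrape has more than 10000 samples after metric relabeling (for a typical target, one sample per series), the whole scrape fails and the target's up metric drops to 0. The label limits fail the scrape the same way. This catches sudden cardinality explosions from a code change before they reach storage.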

A real example

A service emitted a metric per user_id "for debugging," hidden in a rarely used code path.

http_user_login_attempts_total{user_id="42",result="success"}
http_user_login_attempts_total{user_id="43",result="success"}
...

10M users means at least 10M series (one per user per result value). Prometheus runs out of memory and crashes.

The diagnosis: topk(10, count by (__name__)(...)) shows this metric at the top.

Fix:

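// Before: one series per (user_id, result) pair
loginAttempts.WithLabelValues(userID, result).Inc()

// After: drop user_id from the metric (loginAttempts is now declared with only
// the "result" label) and keep per-user detail in logs instead.
// log is a structured logger here (e.g. a logr.Logger), not the standard library's log package.
loginAttempts.WithLabelValues(result).Inc()
log.Info("login attempt", "user_id", userID, "result", result)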

Series count drops from 10M+ to 5 (one per result value). Prometheus is happy.

Common Pitfalls

No cardinality monitoring. Discover the problem after Prometheus OOMs.

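Trying to fix high cardinality with relabeling alone. Relabeling keeps the series out of the TSDB, but the target still generates and exposes them, and Prometheus still scrapes and parses them on every interval. Better: drop at the source.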

Recording rule for everything. Recording rules cost storage too. Use for common dashboard queries; not for every aggregation.

Long retention for verbose metrics. Nobody queries month-old debug metrics. Shorten retention (remember it is set per server, not per metric) or drop them.

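A single Prometheus with ever-growing series. Memory grows roughly linearly with active series, so pressure can show up on modest machines in the hundreds of thousands; somewhere in the millions, plan for scaling (sharding, Mimir).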

No alert on series count. Alert if total series (prometheus_tsdb_head_series) grows more than 20% in 24h. This catches new cardinality bombs early.
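The PromQL expression prometheus_tsdb_head_series / prometheus_tsdb_head_series offset 24h > 1.2 returns a result only where the head series count grew more than 20% in 24 hours; it would normally live in an alerting rule. If you would rather run it as a standalone check (a cron job or CI step, say), here is a Go sketch that sends it to the /api/v1/query endpoint and parses the instant-vector response. The prometheus:9090 address is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
)

// growthQuery returns a sample only where head series grew >20% over 24h.
const growthQuery = `prometheus_tsdb_head_series / prometheus_tsdb_head_series offset 24h > 1.2`

// queryResponse covers the instant-vector shape of /api/v1/query responses.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]any            `json:"value"` // [unix timestamp, "sample value as string"]
		} `json:"result"`
	} `json:"data"`
}

// growthByInstance maps each returned instance label to its growth ratio.
func growthByInstance(body []byte) (map[string]float64, error) {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return nil, err
	}
	if qr.Status != "success" || qr.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q", qr.Status, qr.Data.ResultType)
	}
	out := make(map[string]float64, len(qr.Data.Result))
	for _, r := range qr.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string: %v", r.Value[1])
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		out[r.Metric["instance"]] = v
	}
	return out, nil
}

func main() {
	endpoint := "http://prometheus:9090/api/v1/query?query=" + url.QueryEscape(growthQuery)
	resp, err := http.Get(endpoint)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	growth, err := growthByInstance(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for instance, ratio := range growth {
		fmt.Printf("%s: head series up %.0f%% in 24h\n", instance, (ratio-1)*100)
	}
	if len(growth) > 0 {
		os.Exit(1) // non-zero exit lets the calling job flag the growth
	}
}
```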

Wrapping Up

Cardinality is the lever. Find offenders, fix at source, use recording rules for aggregations, shorter retention for verbose metrics. Friday: September retro.
