Prometheus 101, Metrics, Scraping, and PromQL


September 5, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: Prometheus scrapes HTTP /metrics endpoints at intervals. Four metric types: Counter, Gauge, Histogram, Summary. Labels make metrics multi-dimensional. PromQL queries: rate(), histogram_quantile(), sum by (label). Memorize a handful of PromQL patterns and you cover most use cases.

After the observability stack overview, Prometheus comes first. The metrics layer is the most operationally load-bearing piece; get it right and the rest follows.

How Prometheus works

Prometheus is pull-based. It connects to your services every N seconds and asks “what are your current metrics?” Services expose a /metrics endpoint with their current state. Prometheus stores; you query.

[Your service] /metrics → 200 OK
http_requests_total{method="GET",status="200"} 1247
http_request_duration_seconds_bucket{le="0.1"} 1100
go_goroutines 47

Prometheus pulls this every 15s. Computes rates, percentiles, etc. via PromQL.
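To make the scrape body above concrete, here is a minimal Python sketch that parses those sample lines into (name, labels, value) tuples. It covers only a simplified subset of the text exposition format; `parse_metrics` and `SAMPLE_BODY` are names made up for this illustration, not part of Prometheus.

```python
import re

# Matches lines like: name{label="value",...} 12.5   (labels are optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse a simplified subset of the text exposition format.

    Returns a list of (name, labels_dict, value) tuples. Comment lines
    (# HELP, # TYPE) and blank lines are skipped. Timestamps are not
    supported and escape sequences in label values are not unescaped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this sketch does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


SAMPLE_BODY = '''\
http_requests_total{method="GET",status="200"} 1247
http_request_duration_seconds_bucket{le="0.1"} 1100
go_goroutines 47
'''

for name, labels, value in parse_metrics(SAMPLE_BODY):
    print(name, labels, value)
```

Each parsed tuple corresponds to one time series sample; Prometheus attaches the scrape timestamp and target labels on its side.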

Four metric types

Counter — only goes up (or resets to 0). Track cumulative events: requests handled, errors, bytes sent. Always derive a rate (rate()) before plotting.

Gauge — goes up and down. Track current state: memory in use, queue depth, active connections, temperature.

Histogram — observations bucketed by value. Track durations, sizes. Produces _bucket{le=...}, _sum, _count series. Calculate percentiles with histogram_quantile().

Summary — like histogram but quantiles calculated client-side. Cheaper queries; can’t aggregate across instances. Use histogram in most cases.

Each metric has a name and any number of labels (key=value pairs).
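The differences between the types are easier to see in code. The sketch below models Counter, Gauge, and Histogram in plain Python, purely as an illustration of their semantics; the classes are hypothetical and are not the API of any Prometheus client library.

```python
import math


class Counter:
    """Monotonic value: it can only be incremented."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


class Gauge:
    """Point-in-time value that can move in either direction."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Cumulative buckets plus a running sum and count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: every bucket whose bound >= value counts it.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def series(self, name):
        """Return the series a scrape would expose, as (series, value) pairs."""
        out = []
        for bound, n in zip(self.upper_bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else repr(bound)
            out.append((f'{name}_bucket{{le="{le}"}}', n))
        out.append((f"{name}_sum", self.sum))
        out.append((f"{name}_count", self.count))
        return out


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.7, 2.0):
    latency.observe(seconds)
for series, value in latency.series("http_request_duration_seconds"):
    print(series, value)
```

Note that one histogram with three finite buckets already produces six series (four buckets including +Inf, plus _sum and _count), which matters once labels multiply that number.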

Labels — the multi-dimensional bit

A metric with labels:

http_requests_total{method="GET", path="/users", status="200"} 1247
http_requests_total{method="GET", path="/users", status="500"} 3
http_requests_total{method="POST", path="/users", status="201"} 89

Three time series; one metric name. Queries filter by label, group by label, aggregate.

Two rules:

Labels should have low cardinality. Don’t put user IDs in labels (millions of users = millions of series = Prometheus blows up).
Labels should be enumerable. Status codes (~10 values), HTTP methods (~5), paths (~50): fine. Free-text or unbounded values: bad.

Cardinality explosion is covered in the cost-control post.
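The arithmetic behind the two rules is simple: the worst-case series count for a metric is the product of the number of values each label can take. A small Python sketch, using the approximate counts from the list above and a hypothetical one million users:

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on time series for one metric name.

    Each distinct combination of label values is its own series, so the
    worst case is the product of the per-label value counts.
    """
    return prod(label_cardinalities.values())


bounded = {"method": 5, "path": 50, "status": 10}
with_user_id = {**bounded, "user_id": 1_000_000}  # hypothetical user count

print(max_series(bounded))       # 2500
print(max_series(with_user_id))  # 2500000000
```

Real workloads rarely hit every combination, but a single unbounded label is enough to move the ceiling from thousands to billions.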

PromQL: the queries you’ll actually use

Request rate (per second over 5 min):

rate(http_requests_total[5m])
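For intuition, this is roughly what rate() computes for one series. The Python sketch below is a simplification: it handles counter resets but leaves out the extrapolation to the window edges that Prometheus also applies, and `simple_rate` is a name invented for this example.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a window of (timestamp, value) samples.

    Simplified: handles counter resets but skips the window-edge
    extrapolation that Prometheus's rate() also performs.
    """
    if len(samples) < 2:
        return None  # need at least two samples to compute a rate
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Value dropped: treat it as a reset to 0, so the whole
            # current value counts as new increase.
            increase += curr
    return increase / elapsed


# Scrapes every 15s; the process restarts between t=30 and t=45.
window = [(0, 1000), (15, 1030), (30, 1060), (45, 20), (60, 50)]
print(simple_rate(window))
```

The reset handling is why a restart does not show up as a huge negative spike when you graph a rate.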

Error rate as a fraction of total:

sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

=~ is regex match.

P95 latency:

histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

The sum by (le) aggregates across all instances/methods/paths to one histogram, then computes p95.
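The percentile is an estimate, not an exact value: histogram_quantile() finds the bucket that contains the target rank and interpolates linearly inside it. The Python sketch below follows that documented behavior in simplified form; the bucket counts are made up, and PromQL applies the same math to per-second bucket rates rather than raw counts.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: find the bucket containing the target rank, then
    interpolate linearly between its lower and upper bound.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        return math.nan  # a +Inf bucket is required
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count - prev_count
    if in_bucket == 0:
        return upper  # empty-bucket edge case; an arbitrary choice in this sketch
    return lower + (upper - lower) * (rank - prev_count) / in_bucket


# Hypothetical cumulative bucket counts for one aggregated histogram.
buckets = [(0.1, 1100), (0.5, 1200), (1.0, 1240), (math.inf, 1247)]
print(histogram_quantile(0.95, buckets))
```

Because the answer is interpolated inside a bucket, its accuracy depends entirely on where the bucket boundaries sit relative to your real latencies.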

Request rate per endpoint:

sum by (path) (rate(http_requests_total[5m]))

Current memory usage per instance:

process_resident_memory_bytes

Top 5 endpoints by traffic:

topk(5, sum by (path) (rate(http_requests_total[5m])))

CPU usage rate:

rate(process_cpu_seconds_total[5m])

These seven patterns cover ~80% of real PromQL.

Scrape config

prometheus.yml config:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'api'
    static_configs:
      - targets:
          - api-1:8080
          - api-2:8080
        labels:
          service: 'api'
          env: 'production'

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

Three jobs: your app (assuming it exposes /metrics), postgres-exporter (for DB metrics), node-exporter (for host machine metrics). Each scraped every 15s.

In Kubernetes you’d use kubernetes_sd_configs for service discovery; for Compose, static targets are fine.
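The scrape interval directly sets how much data Prometheus ingests. A back-of-the-envelope Python sketch, assuming the four targets in the config above each expose a hypothetical 500 series:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(series_count, scrape_interval_seconds):
    """Samples ingested per day: one sample per series per scrape."""
    return series_count * SECONDS_PER_DAY // scrape_interval_seconds


# Hypothetical: 4 targets exposing 500 series each.
series = 4 * 500
at_15s = samples_per_day(series, 15)
at_1s = samples_per_day(series, 1)
print(at_15s, at_1s, at_1s // at_15s)
```

Under these assumptions a 15s interval ingests about 11.5 million samples per day, and dropping to 1s multiplies that by 15.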

Recording rules

Pre-compute expensive queries:

# rules/recording.yml
groups:
  - name: api_recording
    interval: 30s
    rules:
      - record: api:request_rate_5m
        expr: sum by (status, path) (rate(http_requests_total[5m]))

      - record: api:error_rate_5m
        expr: |
          sum by (path) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (path) (rate(http_requests_total[5m]))

The expensive multi-aggregation runs every 30s. Dashboards query the recorded series (cheap), not the raw data (expensive).

For dashboards loaded by hundreds of users, recording rules save real Prometheus CPU.

Alerting rules

groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
          > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "API error rate > 5% for 2 minutes"
          description: "{{ $value | humanizePercentage }} error rate"

Fires once the expression has returned a result continuously for 2 minutes; until then the alert is pending. Alertmanager then routes it, which is covered in the September 14 post.
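The effect of for: is easiest to see as a small state machine. The Python sketch below is an illustration of the pending-then-firing behavior, not Prometheus code; `alert_states` and its parameters are hypothetical, and it assumes the rule is evaluated every 15 seconds.

```python
def alert_states(condition_results, eval_interval, for_duration):
    """Yield 'inactive', 'pending', or 'firing' for each rule evaluation.

    condition_results: booleans, one per evaluation, True when the alert
    expression returned a result. Times are in seconds.
    """
    active_since = None
    now = 0
    for is_true in condition_results:
        if not is_true:
            active_since = None  # any gap resets the clock
            yield "inactive"
        else:
            if active_since is None:
                active_since = now
            if now - active_since >= for_duration:
                yield "firing"
            else:
                yield "pending"
        now += eval_interval


# A 45s blip, one healthy evaluation, then a sustained problem.
results = [True] * 3 + [False] + [True] * 10
for state in alert_states(results, eval_interval=15, for_duration=120):
    print(state)
```

The short blip never leaves pending, while the sustained problem starts firing only after it has held for the full two minutes.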

Querying

Three ways:

Web UI: http://prometheus:9090/graph. Type query; see results. Good for ad-hoc.
Grafana: dashboards execute PromQL against the configured data source.
HTTP API: curl 'http://prometheus:9090/api/v1/query?query=...'. For scripts.
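For scripts, the HTTP API route might look like the following Python sketch. It builds the instant-query URL and parses a response body in the JSON shape the API returns for vectors; the helper names are invented here, and the sample body is hand-written rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body):
    """Extract (labels, value) pairs from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries its labels in "metric" and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


url = instant_query_url(
    "http://prometheus:9090",
    "sum by (path) (rate(http_requests_total[5m]))",
)
print(url)

# Hand-written response in the documented shape, not real server output.
sample_body = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"path":"/users"},"value":[1662336000,"12.5"]}]}}'
)
print(parse_vector(sample_body))
```

In a real script you would fetch `url` with `urllib.request.urlopen` (or any HTTP client) and pass the response body to `parse_vector`. Sample values arrive as strings, hence the `float()` conversion.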

Common Pitfalls

Plotting a Counter directly. Just goes up. Always wrap in rate().

High-cardinality labels. User IDs, request IDs, anything unbounded. Don’t.

Scrape interval too tight. 1s interval = 15× more data than 15s. Disk usage matters. 15s, as in the config above, is right for most apps (Prometheus's built-in default when scrape_interval is unset is 1m).

Histogram buckets misconfigured. Default histogram buckets are exponential and fine for HTTP latencies. For other distributions, override.

avg() instead of histogram_quantile(). Average latency hides tail. P95 and P99 are what matter.

No for: on alerts. Flapping. Always require condition to persist.
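To illustrate the avg() pitfall above, here is a small Python sketch with made-up latencies: 95 fast requests and 5 slow ones. It uses only the standard library and computes exact percentiles from raw values, which is not how Prometheus estimates them, but the point about the tail is the same.

```python
import statistics

# Hypothetical latencies in seconds: 95 fast requests and 5 slow ones.
latencies = [0.05] * 95 + [2.0] * 5

mean = statistics.mean(latencies)
# quantiles(n=100) returns the 99 cut points p1..p99.
percentiles = statistics.quantiles(latencies, n=100)
p50, p99 = percentiles[49], percentiles[98]
print(f"mean={mean:.3f}s p50={p50:.3f}s p99={p99:.3f}s")
```

The mean lands around 0.15s, which describes neither the typical request (0.05s) nor the slow ones (2s); only the high percentile shows that some users wait two seconds.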

Wrapping Up

Pull-based metrics, four types, label-driven queries, PromQL with rate() and histogram_quantile(). Wednesday: instrumenting Go for Prometheus.

