Instrumenting Node.js Services for Prometheus

September 9, 2022 · 4 min read · by Muhammad Amal

TL;DR: prom-client is the de facto Prometheus client library for Node.js. Collect the default metrics, instrument HTTP via middleware, and expose /metrics. Node-specific gotchas: event loop lag, V8 heap stats, and GC pause times. Make sure they are collected and watched.

After the Go instrumentation post, here is the Node version. The patterns translate; the ergonomics differ.

Install

npm install prom-client express-prom-bundle

prom-client is the underlying lib. express-prom-bundle is a thin middleware wrapper for Express. Equivalent wrappers exist for Fastify.

Basic setup

import express from 'express';
import promClient from 'prom-client';

const app = express();

// Default metrics: CPU, memory, event loop, GC
promClient.collectDefaultMetrics({ prefix: 'node_' });

// Custom metric
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
});

const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'path'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Middleware
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route?.path || 'unknown';
    httpRequests.inc({ method: req.method, path: route, status: res.statusCode });
    httpDuration.observe({ method: req.method, path: route }, duration);
  });
  next();
});

app.get('/users/:id', (req, res) => res.json({ id: req.params.id }));

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.send(await promClient.register.metrics());
});

app.listen(8080);

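Route normalization is again critical. req.route.path gives /users/:id, not /users/42. req.route is only set once a route has matched, so unmatched requests (404s, for example) fall back to 'unknown'. Same cardinality problem as Go; same fix.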
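
To make the cardinality point concrete, here is a minimal sketch that counts how many distinct label sets (and therefore time series) a counter would end up tracking. The countSeries helper and the sample requests are hypothetical and not part of prom-client.

```javascript
// Hypothetical helper: counts the distinct label combinations a counter
// would have to track for a list of observed requests.
function countSeries(requests, pathOf) {
  const seen = new Set();
  for (const r of requests) {
    seen.add(`${r.method}|${pathOf(r)}|${r.status}`);
  }
  return seen.size;
}

// Made-up traffic: three different users hitting the same route.
const sampleRequests = [
  { method: 'GET', url: '/users/41', route: '/users/:id', status: 200 },
  { method: 'GET', url: '/users/42', route: '/users/:id', status: 200 },
  { method: 'GET', url: '/users/43', route: '/users/:id', status: 200 },
];

// Labeling by raw URL: one series per user ID.
const rawSeries = countSeries(sampleRequests, (r) => r.url);
// Labeling by route pattern: one series for the whole route.
const routeSeries = countSeries(sampleRequests, (r) => r.route);

console.log({ rawSeries, routeSeries });
```

With raw URLs the series count grows with the number of users; with route patterns it stays fixed per route.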

Using express-prom-bundle (less code)

import promBundle from 'express-prom-bundle';

const metricsMiddleware = promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: [
    ['^/users/[^/]+', '/users/:id'],
    ['^/orders/[^/]+', '/orders/:id'],
  ],
});

app.use(metricsMiddleware);

The bundle handles the middleware, path normalization, and the /metrics endpoint: about five lines for full HTTP instrumentation. Mount it before your routes so every request passes through it.
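
The normalizePath rules above are [pattern, replacement] pairs. As a rough illustration of what such a rule list does to an incoming path, the sketch below applies the pairs in order. applyPathRules is a hypothetical helper written for this post, not express-prom-bundle's actual implementation.

```javascript
// Illustrative only: applies [pattern, replacement] pairs in order,
// the way a normalizePath-style rule list is meant to behave.
function applyPathRules(path, rules) {
  let result = path;
  for (const [pattern, replacement] of rules) {
    // String patterns are compiled to regular expressions.
    result = result.replace(new RegExp(pattern), replacement);
  }
  return result;
}

const pathRules = [
  ['^/users/[^/]+', '/users/:id'],
  ['^/orders/[^/]+', '/orders/:id'],
];

console.log(applyPathRules('/users/42', pathRules));
console.log(applyPathRules('/orders/9001/items', pathRules));
console.log(applyPathRules('/health', pathRules));
```

Paths that match no rule pass through unchanged, so every dynamic route needs its own rule.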

Node-specific metrics worth knowing

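collectDefaultMetrics exposes the following. Names are shown without a prefix; with prefix: 'node_' as in the setup above, each one gains a node_ prefix (for example node_nodejs_eventloop_lag_seconds):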

process_cpu_user_seconds_total — CPU time
process_resident_memory_bytes — RSS
nodejs_heap_size_used_bytes — V8 heap
nodejs_eventloop_lag_seconds — event loop lag (key health signal)
nodejs_gc_duration_seconds — GC pauses
nodejs_active_handles — open file handles, sockets
nodejs_active_requests — pending I/O operations

Event loop lag is the most diagnostic of these. As a rule of thumb, a healthy Node service stays under 10 ms; sustained lag above 100 ms means the loop is saturated.
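
To show what event loop lag means in practice, here is a minimal sketch that times how long a setImmediate callback waits before it runs and then applies the thresholds above. measureLoopLag and classifyLag are hypothetical helpers, and the 'elevated' label for the middle band is an assumption made for illustration.

```javascript
// Resolves with the time, in seconds, that a setImmediate callback
// waited before the event loop got around to running it.
function measureLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setImmediate(() => {
      const elapsedNs = process.hrtime.bigint() - start;
      resolve(Number(elapsedNs) / 1e9);
    });
  });
}

// Applies the rule-of-thumb thresholds: under 10 ms is healthy,
// over 100 ms is saturated. 'elevated' is an assumed middle label.
function classifyLag(seconds) {
  if (seconds < 0.01) return 'healthy';
  if (seconds > 0.1) return 'saturated';
  return 'elevated';
}
```

Calling measureLoopLag().then((lag) => console.log(lag, classifyLag(lag))) gives a one-off reading. The default metrics already track lag for you, so this is only meant to show what the number represents.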

Business metrics

const ordersCreated = new promClient.Counter({
  name: 'orders_created_total',
  help: 'Orders created',
  labelNames: ['product'],
});

const queueDepth = new promClient.Gauge({
  name: 'queue_depth',
  help: 'Pending items in queue',
  labelNames: ['queue'],
});

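// Periodically update the gauge. getEmailQueueDepth() stands in for
// your own function that returns the current queue length.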
setInterval(() => {
  queueDepth.set({ queue: 'email' }, getEmailQueueDepth());
}, 5000);

// In business code
ordersCreated.inc({ product: 'pro_monthly' });

Same shape as Go. Slightly less verbose.

Multi-process clustering

If you run Node via cluster or PM2 with multiple workers, each worker keeps its own metrics in its own process. A scrape of the shared port lands on whichever worker accepts the connection, so Prometheus sees one worker's view at a time.

Two patterns:

Per-worker scrape: each worker listens on a different port; Prometheus scrapes all. Simple; many endpoints to manage.

Aggregated: use prom-client's AggregatorRegistry. The primary process aggregates worker metrics; one endpoint exposes the sum.

import cluster from 'cluster';
import os from 'os';
import express from 'express';
import { AggregatorRegistry } from 'prom-client';

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  const aggregator = new AggregatorRegistry();
  const metricsServer = express();
  metricsServer.get('/metrics', async (req, res) => {
    res.set('Content-Type', aggregator.contentType);
    res.send(await aggregator.clusterMetrics());
  });
  metricsServer.listen(9100);
} else {
  // worker code
}
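
To illustrate what "one endpoint exposes the sum" means for counters, the sketch below adds up per-worker samples that share a metric name and label set. sumCounters and the snapshot shape are hypothetical; this shows the idea and is not AggregatorRegistry's internal code.

```javascript
// Sums counter samples reported by several workers. Samples are grouped
// by metric name plus their label set (label order does not matter).
function sumCounters(workerSnapshots) {
  const totals = new Map();
  for (const snapshot of workerSnapshots) {
    for (const { name, labels, value } of snapshot) {
      const labelKey = Object.keys(labels)
        .sort()
        .map((k) => `${k}="${labels[k]}"`)
        .join(',');
      const key = `${name}{${labelKey}}`;
      totals.set(key, (totals.get(key) || 0) + value);
    }
  }
  return totals;
}

// Made-up snapshots from two workers.
const worker1 = [
  { name: 'http_requests_total', labels: { method: 'GET', status: 200 }, value: 10 },
];
const worker2 = [
  { name: 'http_requests_total', labels: { status: 200, method: 'GET' }, value: 5 },
  { name: 'http_requests_total', labels: { method: 'POST', status: 201 }, value: 2 },
];

const totals = sumCounters([worker1, worker2]);
console.log(Object.fromEntries(totals));
```

Summing is the natural aggregation for counters. Gauges such as heap size may call for a different method (average, max), which is why aggregation is a per-metric decision.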

For most modern deployments (one Node process per container with Kubernetes scaling), worker clustering is rare. K8s manages replicas; Prometheus scrapes each pod individually.

Fastify variant

import Fastify from 'fastify';
import promClient from 'prom-client';

const fastify = Fastify();

promClient.collectDefaultMetrics({ prefix: 'node_' });

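// Same counter definition as in the Express example
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
});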

fastify.addHook('onResponse', (req, reply, done) => {
  httpRequests.inc({
    method: req.method,
    path: req.routerPath || 'unknown',
    status: reply.statusCode,
  });
  done();
});

fastify.get('/metrics', async (req, reply) => {
  reply.type(promClient.register.contentType);
  return promClient.register.metrics();
});

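Same patterns. Use req.routerPath for route normalization. Newer Fastify releases deprecate routerPath in favor of req.routeOptions.url, so check which one your version supports.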

Common Pitfalls

Path labels with raw URLs. Labeling with /users/42 creates one series per ID, which is a cardinality bomb. Use route patterns.

Forgetting await on register.metrics(). It returns a Promise; sending that Promise unawaited yields a serialized Promise (for example "[object Promise]" or "{}") instead of metrics text. Always await.

Multi-worker without aggregation. Each worker reports its own counters, and each scrape sees only one of them. Either scrape all workers or use AggregatorRegistry.

Metric names with hyphens. Prometheus expects [a-zA-Z_:][a-zA-Z0-9_:]*. Use underscores.

Default metrics with no prefix. Names can collide with those from other services when you scrape both. Use a prefix.

Not watching event loop lag. Node services degrade quietly when the event loop saturates. Set an alert on nodejs_eventloop_lag_seconds > 0.1 (node_nodejs_eventloop_lag_seconds with the node_ prefix used above).
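
The metric-name rule from the pitfalls above is easy to check before a metric is registered. This is a minimal sketch; isValidMetricName and sanitizeMetricName are hypothetical helpers, and the sanitizing policy (swap disallowed characters for underscores) is an assumption, not something prom-client does for you.

```javascript
// The naming pattern Prometheus expects for metric names.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function isValidMetricName(name) {
  return METRIC_NAME_RE.test(name);
}

// Hypothetical sanitizer: replaces every disallowed character with "_"
// and prepends "_" when the name would otherwise start with a digit.
function sanitizeMetricName(name) {
  const replaced = name.replace(/[^a-zA-Z0-9_:]/g, '_');
  return /^[0-9]/.test(replaced) ? `_${replaced}` : replaced;
}

console.log(isValidMetricName('http-requests-total'));
console.log(sanitizeMetricName('http-requests-total'));
```

Validating names in a unit test or at startup surfaces the problem early, before the metric ever reaches a scrape.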

Wrapping Up

Node + prom-client = ~20 lines for comprehensive instrumentation. Watch the event-loop lag metric. Monday: Grafana dashboards that actually help.
