Instrumenting Node.js Services for Prometheus
September 9, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: prom-client is the de facto Prometheus client library for Node. Collect default metrics on the default registry, instrument HTTP via middleware, expose /metrics. Node-specific signals to watch: event loop lag, V8 heap stats, GC pause times. Enable them explicitly with collectDefaultMetrics.

After Go instrumentation, the Node version. Patterns translate; ergonomics differ.

Install

npm install prom-client express-prom-bundle

prom-client is the underlying lib. express-prom-bundle is a thin middleware wrapper for Express. Equivalent wrappers exist for Fastify.

Basic setup

import express from 'express';
import promClient from 'prom-client';

const app = express();

// Default metrics: CPU, memory, event loop, GC.
// The prefix is prepended to every default metric name.
promClient.collectDefaultMetrics({ prefix: 'node_' });

// Custom metric
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
});

const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'path'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

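// Middleware: record count and latency once the response has been sent.
// Register it before the routes so every request passes through it.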
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route?.path || 'unknown';
    httpRequests.inc({ method: req.method, path: route, status: res.statusCode });
    httpDuration.observe({ method: req.method, path: route }, duration);
  });
  next();
});

app.get('/users/:id', (req, res) => res.json({ id: req.params.id }));

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.send(await promClient.register.metrics());
});

app.listen(8080);

Route normalization is again critical. req.route.path gives /users/:id, not /users/42. Requests that match no route (404s, for example) have no req.route and fall back to 'unknown'. Same cardinality problem as Go; same fix.

Using express-prom-bundle (less code)

import promBundle from 'express-prom-bundle';

const metricsMiddleware = promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: [
    ['^/users/[^/]+', '/users/:id'],
    ['^/orders/[^/]+', '/orders/:id'],
  ],
});

app.use(metricsMiddleware);

The bundle handles the middleware, path normalization, and the /metrics endpoint: roughly ten lines for full HTTP instrumentation. Register it before your routes so every request passes through it.
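
To make the effect of the normalizePath rules concrete, here is a minimal sketch that applies the same regex pairs by hand. It is an illustration of the idea, not express-prom-bundle's actual implementation; the normalizePath function and the sample paths are hypothetical.

```javascript
// Rules copied from the promBundle config above: [regex source, replacement].
const rules = [
  ['^/users/[^/]+', '/users/:id'],
  ['^/orders/[^/]+', '/orders/:id'],
];

// Apply every rule in order; paths that match no rule pass through unchanged.
function normalizePath(path, ruleList = rules) {
  return ruleList.reduce(
    (current, [pattern, replacement]) =>
      current.replace(new RegExp(pattern), replacement),
    path,
  );
}

// Hypothetical request paths, used only to show the effect on label cardinality.
const rawPaths = ['/users/1', '/users/2', '/users/3', '/orders/987', '/health'];
const normalized = rawPaths.map((p) => normalizePath(p));

console.log(new Set(rawPaths).size);   // 5 distinct label values without normalization
console.log(new Set(normalized).size); // 3 distinct label values with it
```

Each distinct label value becomes its own time series, so the gap between those two numbers grows with every new user or order ID.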

Node-specific metrics worth knowing

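collectDefaultMetrics exposes the following. Names are shown without a prefix; with the prefix: 'node_' option from the setup above, each one gains that prefix (for example node_nodejs_eventloop_lag_seconds):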

  • process_cpu_user_seconds_total — CPU time
  • process_resident_memory_bytes — RSS
  • nodejs_heap_size_used_bytes — V8 heap
  • nodejs_eventloop_lag_seconds — event loop lag (key health signal)
  • nodejs_gc_duration_seconds — GC pauses
  • nodejs_active_handles — open file handles, sockets
  • nodejs_active_requests — pending I/O operations

Event loop lag is the most diagnostic of these. As a rule of thumb, a healthy Node service stays under 10 ms; sustained lag above 100 ms means the event loop is saturated.
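
For intuition about what this number measures, the sketch below times how long a setImmediate callback waits before it runs. It is one common way to sample event loop lag using only Node built-ins, not a copy of prom-client's internals. The function names are hypothetical, and the "elevated" band between the two thresholds is an assumption of this sketch.

```javascript
// Measure how long a setImmediate callback waits before it runs.
// A busy event loop delays the callback, so the wait approximates the lag.
function measureEventLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setImmediate(() => {
      const elapsedNs = process.hrtime.bigint() - start;
      resolve(Number(elapsedNs) / 1e9); // seconds, the base unit Prometheus uses
    });
  });
}

// Thresholds from the text: < 10 ms healthy, > 100 ms saturated.
// The "elevated" band in between is an assumption of this sketch.
function classifyLag(seconds) {
  if (seconds < 0.01) return 'healthy';
  if (seconds > 0.1) return 'saturated';
  return 'elevated';
}

measureEventLoopLag().then((lag) => {
  console.log(`lag=${lag.toFixed(6)}s status=${classifyLag(lag)}`);
});
```

A single sample is noisy; in practice you would look at the metric over a window before drawing conclusions.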

Business metrics

const ordersCreated = new promClient.Counter({
  name: 'orders_created_total',
  help: 'Orders created',
  labelNames: ['product'],
});

const queueDepth = new promClient.Gauge({
  name: 'queue_depth',
  help: 'Pending items in queue',
  labelNames: ['queue'],
});

// Periodically update the gauge (getEmailQueueDepth is your own function)
setInterval(() => {
  queueDepth.set({ queue: 'email' }, getEmailQueueDepth());
}, 5000);

// In business code
ordersCreated.inc({ product: 'pro_monthly' });

Same shape as Go. Slightly less verbose.

Multi-process clustering

If you run Node via cluster or PM2 with multiple workers, each worker has its own metrics. Prometheus scrapes one endpoint and sees one worker’s view.

Two patterns:

Per-worker scrape: each worker listens on a different port; Prometheus scrapes all. Simple; many endpoints to manage.

Aggregated: use prom-client’s AggregatorRegistry. The primary process collects metrics from every worker and exposes the aggregate (a sum by default) on one endpoint.

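import cluster from 'cluster';
import os from 'os';
import express from 'express';
import { AggregatorRegistry } from 'prom-client';

// Construct in both primary and workers, as prom-client's own cluster example
// does, so each side can set up its message listeners.
const aggregator = new AggregatorRegistry();

if (cluster.isPrimary) {
  // One worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  // Dedicated metrics server in the primary process
  const metricsServer = express();
  metricsServer.get('/metrics', async (req, res) => {
    res.set('Content-Type', aggregator.contentType);
    res.send(await aggregator.clusterMetrics());
  });
  metricsServer.listen(9100);
} else {
  // worker code: start the app and register metrics as usual
}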
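
What "aggregates worker metrics" means for a counter can be sketched without any library: samples that share a label set are summed across workers. The data and the sumAcrossWorkers helper below are hypothetical and only illustrate the idea; they are not how AggregatorRegistry is implemented.

```javascript
// Hypothetical per-worker samples of one counter, one array per worker.
const workerSamples = [
  [{ labels: { method: 'GET', status: '200' }, value: 120 }],
  [
    { labels: { method: 'GET', status: '200' }, value: 95 },
    { labels: { method: 'GET', status: '500' }, value: 2 },
  ],
];

// Sum values across workers for samples that share identical labels.
function sumAcrossWorkers(samplesPerWorker) {
  const totals = new Map();
  for (const samples of samplesPerWorker) {
    for (const { labels, value } of samples) {
      // Sort label names so key order does not affect grouping.
      const key = JSON.stringify(
        Object.entries(labels).sort(([a], [b]) => a.localeCompare(b)),
      );
      const entry = totals.get(key) ?? { labels, value: 0 };
      entry.value += value;
      totals.set(key, entry);
    }
  }
  return [...totals.values()];
}

console.log(sumAcrossWorkers(workerSamples));
```

Summing suits counters; a gauge such as queue depth may be better served by a different aggregation.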

For most modern deployments (one Node process per container with Kubernetes scaling), worker clustering is rare. K8s manages replicas; Prometheus scrapes each pod individually.

Fastify variant

import Fastify from 'fastify';
import promClient from 'prom-client';

const fastify = Fastify();

promClient.collectDefaultMetrics({ prefix: 'node_' });

const httpRequests = new promClient.Counter({ ... });

fastify.addHook('onResponse', (req, reply, done) => {
  httpRequests.inc({
    method: req.method,
    path: req.routerPath || 'unknown',
    status: reply.statusCode,
  });
  done();
});

fastify.get('/metrics', async (req, reply) => {
  reply.type(promClient.register.contentType);
  return promClient.register.metrics();
});

Same patterns. Use req.routerPath for route normalization; newer Fastify releases deprecate it in favor of req.routeOptions.url.

Common Pitfalls

Path labels with raw URLs. Labelling by /users/42 is a cardinality bomb: every ID creates a new time series. Use route patterns.

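Forgetting await on register.metrics(). It returns a Promise; sending that instead of the resolved string produces a useless body such as “[object Promise]” or {}, depending on how the framework serializes it. Always await.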
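
The failure mode is easy to reproduce with plain JavaScript. In this sketch fakeMetrics is a hypothetical stand-in for register.metrics(); the point is only what an un-awaited Promise turns into when something tries to serialize it.

```javascript
// Stand-in for register.metrics(): an async function returning exposition text.
// The function name and payload are hypothetical.
async function fakeMetrics() {
  return 'http_requests_total 1\n';
}

const notAwaited = fakeMetrics(); // a Promise, not a string

console.log(typeof notAwaited);          // "object"
console.log(String(notAwaited));         // "[object Promise]"
console.log(JSON.stringify(notAwaited)); // "{}"

// Awaiting (or chaining .then) yields the actual text.
fakeMetrics().then((body) => console.log(typeof body)); // "string"
```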

Multi-worker without aggregation. Each worker reports its own counters; Prometheus only sees one. Either scrape all workers or use AggregatorRegistry.

Metric names with hyphens. Prometheus expects [a-zA-Z_:][a-zA-Z0-9_:]*. Use underscores.
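
A small check like the following can catch bad names before they reach the registry. The regex is the one quoted above; isValidMetricName and toMetricName are hypothetical helpers for illustration, not part of prom-client.

```javascript
// Pattern for valid Prometheus metric names, anchored at both ends.
const METRIC_NAME = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function isValidMetricName(name) {
  return METRIC_NAME.test(name);
}

// Replace every disallowed character with an underscore and make sure
// the result does not start with a digit.
function toMetricName(raw) {
  const cleaned = raw.replace(/[^a-zA-Z0-9_:]/g, '_');
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned;
}

console.log(isValidMetricName('http-requests-total')); // false
console.log(toMetricName('http-requests-total'));      // "http_requests_total"
console.log(isValidMetricName('http_requests_total')); // true
```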

Default metrics with no prefix. Collides with metrics from other services if you scrape both. Use a prefix.

No event-loop lag metric watched. Node services degrade quietly when the event loop saturates. Set an alert on nodejs_eventloop_lag_seconds > 0.1 (node_nodejs_eventloop_lag_seconds if you kept the node_ prefix from the setup above).

Wrapping Up

Node + prom-client = ~20 lines for comprehensive instrumentation. Watch the event-loop lag metric. Monday: Grafana dashboards that actually help.