Instrumenting Node.js Services for Prometheus
TL;DR —
prom-client is the de facto Prometheus library for Node. Use the default registry, instrument HTTP via middleware, and expose /metrics. Node-specific gotchas: event loop lag, V8 heap stats, GC pause times. Instrument them explicitly.
After Go instrumentation, the Node version. Patterns translate; ergonomics differ.
Install
npm install prom-client express-prom-bundle
prom-client is the underlying lib. express-prom-bundle is a thin middleware wrapper for Express. Equivalent wrappers exist for Fastify.
Basic setup
import express from 'express';
import promClient from 'prom-client';

const app = express();

// Default metrics: CPU, memory, event loop, GC.
// The prefix is prepended to every default metric name,
// e.g. nodejs_eventloop_lag_seconds becomes node_nodejs_eventloop_lag_seconds.
promClient.collectDefaultMetrics({ prefix: 'node_' });

// Custom metrics
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
});

const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'path'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Middleware: record count and duration once the response has been sent
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    // Use the matched route pattern, never the raw URL
    const route = req.route?.path || 'unknown';
    httpRequests.inc({ method: req.method, path: route, status: res.statusCode });
    httpDuration.observe({ method: req.method, path: route }, duration);
  });
  next();
});

app.get('/users/:id', (req, res) => res.json({ id: req.params.id }));

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.send(await promClient.register.metrics());
});

app.listen(8080);
Route normalization is again critical. req.route.path gives /users/:id not /users/42. Same cardinality problem as Go; same fix.
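One wrinkle worth sketching: for routes defined on a mounted express.Router(), req.route.path is relative to the router, so the mount path has to be added back from req.baseUrl. The helper below is a hypothetical routeLabel function (not part of prom-client or Express), and it assumes the handler responded from inside the router so that req.baseUrl still holds the mount path when the label is read.

```javascript
// Hypothetical helper: build a low-cardinality route label from an Express request.
function routeLabel(req) {
  // req.route is only set once a route handler has matched.
  // Regex or array route paths are treated as unknown to keep labels predictable.
  if (!req.route || typeof req.route.path !== 'string') return 'unknown';
  // req.baseUrl is the router's mount path ('' for routes defined on the app itself).
  return (req.baseUrl || '') + req.route.path;
}
```

In the middleware above, this would replace the req.route?.path || 'unknown' expression.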
Using express-prom-bundle (less code)
import promBundle from 'express-prom-bundle';

const metricsMiddleware = promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: [
    ['^/users/[^/]+', '/users/:id'],
    ['^/orders/[^/]+', '/orders/:id'],
  ],
});

app.use(metricsMiddleware);
Handles the middleware + path normalization + /metrics endpoint. ~5 lines for full HTTP instrumentation.
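To see what rules of that shape do to a path, here is a rough sketch that applies [pattern, replacement] pairs in order. It is an illustration only, not express-prom-bundle's actual implementation; applyPathRules is a hypothetical name.

```javascript
// Illustrative only: apply [pattern, replacement] rules, in order, to a raw path.
function applyPathRules(rawPath, rules) {
  return rules.reduce(
    (path, [pattern, replacement]) => path.replace(new RegExp(pattern), replacement),
    rawPath
  );
}

// Same rule shape as the normalizePath option above.
const pathRules = [
  ['^/users/[^/]+', '/users/:id'],
  ['^/orders/[^/]+', '/orders/:id'],
];
```

Because the patterns are anchored only at the start, a nested path such as /users/42/orders keeps its suffix and only the id segment is collapsed.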
Node-specific metrics worth knowing
collectDefaultMetrics exposes:
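process_cpu_user_seconds_total: CPU time
process_resident_memory_bytes: RSS
nodejs_heap_size_used_bytes: V8 heap
nodejs_eventloop_lag_seconds: event loop lag (key health signal)
nodejs_gc_duration_seconds: GC pauses
nodejs_active_handles: open file handles, sockets
nodejs_active_requests: pending I/O operations
These are the unprefixed names. With prefix: 'node_' as in the setup above, each name gains a node_ prefix (for example node_nodejs_eventloop_lag_seconds).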
The event loop lag metric is the most diagnostic. As a rule of thumb, a healthy Node service stays under 10 ms; sustained lag above 100 ms means the event loop is saturated.
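For intuition about what "lag" means here, a simplified sketch: schedule a timer and measure how much later than requested it fires. This is not how prom-client computes its metric; timerLagSeconds and sampleEventLoopLag are hypothetical names used only for illustration.

```javascript
// Lag = how much later than requested a timer actually fired, in seconds.
// startNs and firedNs are BigInt nanosecond timestamps; delayMs is the requested delay.
function timerLagSeconds(startNs, firedNs, delayMs) {
  const elapsedMs = Number(firedNs - startNs) / 1e6;
  // Timers never fire early in a meaningful way, so clamp at zero.
  return Math.max(0, elapsedMs - delayMs) / 1000;
}

// Take one lag sample: a busy event loop delays the callback beyond delayMs.
function sampleEventLoopLag(delayMs, onSample) {
  const startNs = process.hrtime.bigint();
  setTimeout(() => {
    onSample(timerLagSeconds(startNs, process.hrtime.bigint(), delayMs));
  }, delayMs);
}
```

A timer requested for 50 ms that fires after 150 ms therefore counts as 0.1 s of lag, which is exactly the saturation threshold mentioned above.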
Business metrics
const ordersCreated = new promClient.Counter({
  name: 'orders_created_total',
  help: 'Orders created',
  labelNames: ['product'],
});

const queueDepth = new promClient.Gauge({
  name: 'queue_depth',
  help: 'Pending items in queue',
  labelNames: ['queue'],
});

// Periodically update gauge
setInterval(() => {
  queueDepth.set({ queue: 'email' }, getEmailQueueDepth());
}, 5000);

// In business code
ordersCreated.inc({ product: 'pro_monthly' });
Same shape as Go. Slightly less verbose.
Multi-process clustering
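If you run Node via cluster or PM2 with multiple workers, each worker keeps its own metrics in its own process. When the workers share a port, a scrape is answered by whichever worker accepts the connection, so Prometheus sees only one worker’s view at a time.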
Two patterns:
Per-worker scrape: each worker listens on a different port; Prometheus scrapes all. Simple; many endpoints to manage.
Aggregated: use prom-client’s AggregatorRegistry. Master process aggregates worker metrics; one endpoint exposes the sum.
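import cluster from 'cluster';
import os from 'os';
import express from 'express';
import { AggregatorRegistry } from 'prom-client';

if (cluster.isPrimary) {
  // One worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  const aggregator = new AggregatorRegistry();
  const metricsServer = express();

  // The primary collects metrics from every worker and serves the aggregate
  metricsServer.get('/metrics', async (req, res) => {
    res.set('Content-Type', aggregator.contentType);
    res.send(await aggregator.clusterMetrics());
  });
  metricsServer.listen(9100);
} else {
  // worker code (workers must also load prom-client so they can report to the primary)
}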
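What "exposes the sum" means can be sketched with plain data: counter samples from each worker are merged by metric name and label set. This is an illustration of the idea only, not AggregatorRegistry's implementation, and the sample shape ({ name, labels, value }) is an assumption made for the example. Gauges may need a different strategy than summing.

```javascript
// Illustrative only: sum counter samples from several workers by name + labels.
// workerSnapshots is an array (one entry per worker) of arrays of samples.
function sumCounterSamples(workerSnapshots) {
  const totals = new Map();
  for (const samples of workerSnapshots) {
    for (const { name, labels, value } of samples) {
      // Sort label keys so { a, b } and { b, a } map to the same series.
      const sorted = Object.entries(labels).sort(([a], [b]) => a.localeCompare(b));
      const key = name + JSON.stringify(sorted);
      const entry = totals.get(key) || { name, labels, value: 0 };
      entry.value += value;
      totals.set(key, entry);
    }
  }
  return [...totals.values()];
}
```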
For most modern deployments (one Node process per container with Kubernetes scaling), worker clustering is rare. K8s manages replicas; Prometheus scrapes each pod individually.
Fastify variant
import Fastify from 'fastify';
import promClient from 'prom-client';
const fastify = Fastify();
promClient.collectDefaultMetrics({ prefix: 'node_' });
const httpRequests = new promClient.Counter({ ... });
fastify.addHook('onResponse', (req, reply, done) => {
  httpRequests.inc({
    method: req.method,
    // Route pattern such as /users/:id, not the raw URL
    path: req.routeOptions?.url || 'unknown',
    status: reply.statusCode,
  });
  done();
});

fastify.get('/metrics', async (req, reply) => {
  reply.type(promClient.register.contentType);
  return promClient.register.metrics();
});
Same patterns. Use req.routeOptions.url for route normalization; older Fastify versions expose the same value as req.routerPath, which has since been deprecated.
Common Pitfalls
Path labels with raw URL. /users/42 cardinality bomb. Use route patterns.
Forgetting await on register.metrics(). It returns a Promise; passing that Promise to res.send() sends a serialized Promise object (typically an empty {} in Express) instead of the metrics text. Always await.
Multi-worker without aggregation. Each worker reports its own counters; Prometheus only sees one. Either scrape all workers or use AggregatorRegistry.
Metric names with hyphens. Prometheus expects [a-zA-Z_:][a-zA-Z0-9_:]*. Use underscores.
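Default metrics with no prefix. Every prom-client service exports the same default names (process_cpu_user_seconds_total and so on), so an unqualified query mixes services together. Use a prefix, or always filter by the job label.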
No event-loop lag metric watched. Node services degrade quietly when the event loop saturates. Set an alert on nodejs_eventloop_lag_seconds > 0.1 (node_nodejs_eventloop_lag_seconds with the node_ prefix used in the setup above).
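The metric-name rule from the list above is easy to check in code before a bad name reaches the registry. A minimal sketch, using the pattern quoted above; isValidMetricName and toMetricName are hypothetical helpers, not prom-client functions.

```javascript
// Pattern for valid Prometheus metric names, as quoted in the pitfalls list.
const METRIC_NAME = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function isValidMetricName(name) {
  return METRIC_NAME.test(name);
}

// Replace every disallowed character with an underscore and
// guard against a leading digit, which the pattern also forbids.
function toMetricName(raw) {
  const cleaned = raw.replace(/[^a-zA-Z0-9_:]/g, '_');
  return /^[0-9]/.test(cleaned) ? '_' + cleaned : cleaned;
}
```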
Wrapping Up
Node + prom-client = ~20 lines for comprehensive instrumentation. Watch the event-loop lag metric. Monday: Grafana dashboards that actually help.