Instrumenting Go Services for Prometheus
TL;DR: Use the prometheus/client_golang package. Register metrics globally, increment counters and observe histograms in handlers, and expose a /metrics endpoint. HTTP middleware instruments every request automatically. The default go_* runtime metrics are included for free.
With the Prometheus basics covered, here is how to actually get a Go service to expose metrics.
Setup
github.com/prometheus/client_golang v1.13.0
A Counter and a Histogram
package main
import (
"fmt"
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
httpRequests = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests",
},
[]string{"method", "path", "status"},
)
httpDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "path"},
)
)
func init() {
prometheus.MustRegister(httpRequests, httpDuration)
}
func main() {
mux := http.NewServeMux()
mux.HandleFunc("/users", handleUsers)
mux.Handle("/metrics", promhttp.Handler())
instrumented := instrument(mux)
http.ListenAndServe(":8080", instrumented)
}
func instrument(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
sw := &statusWriter{ResponseWriter: w, status: 200}
next.ServeHTTP(sw, r)
duration := time.Since(start).Seconds()
httpRequests.WithLabelValues(r.Method, r.URL.Path, fmt.Sprint(sw.status)).Inc()
httpDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
})
}
type statusWriter struct {
http.ResponseWriter
status int
}
func (sw *statusWriter) WriteHeader(code int) {
sw.status = code
sw.ResponseWriter.WriteHeader(code)
}
Three things:
- Define metrics globally; register in init
- Middleware records on every request
- Expose /metrics via promhttp.Handler()
After this, curl localhost:8080/metrics shows your metrics + Go runtime metrics.
Path normalization — critical
The naive middleware above uses r.URL.Path as a label value, so /users/42 and /users/43 create different series. With 1M distinct user IDs that is at least 1M series per metric, and Prometheus falls over.
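Fix: use the route pattern, not the URL. With chi, install the middleware through the router's Use method so it runs inside the router and can read the route context: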
import "github.com/go-chi/chi/v5"
func instrument(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
sw := &statusWriter{ResponseWriter: w, status: 200}
next.ServeHTTP(sw, r)
duration := time.Since(start).Seconds()
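// chi's route pattern, e.g., "/users/{id}"; it is only complete after the router has run
routePattern := "unknown"
// RouteContext returns nil when the request did not pass through a chi router
if rctx := chi.RouteContext(r.Context()); rctx != nil && rctx.RoutePattern() != "" {
routePattern = rctx.RoutePattern()
}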
httpRequests.WithLabelValues(r.Method, routePattern, fmt.Sprint(sw.status)).Inc()
httpDuration.WithLabelValues(r.Method, routePattern).Observe(duration)
})
}
Now /users/{id} is one series for all users.
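If your router does not expose a route pattern, a cruder fallback is to rewrite the path yourself before using it as a label. The sketch below is one possible approach and not something chi or client_golang provides: normalizePath and isNumeric are hypothetical helpers that replace all-digit path segments with {id}.

```go
package main

import (
	"fmt"
	"strings"
)

// isNumeric reports whether a path segment consists only of ASCII digits.
// Assumption: IDs are numeric. UUIDs or slugs would need their own rule.
func isNumeric(seg string) bool {
	if seg == "" {
		return false
	}
	for _, c := range seg {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// normalizePath replaces every numeric segment with "{id}" so that
// /users/42 and /users/43 collapse into a single label value.
func normalizePath(p string) string {
	segs := strings.Split(p, "/")
	for i, s := range segs {
		if isNumeric(s) {
			segs[i] = "{id}"
		}
	}
	return strings.Join(segs, "/")
}

func main() {
	fmt.Println(normalizePath("/users/42"))        // /users/{id}
	fmt.Println(normalizePath("/users/43/orders")) // /users/{id}/orders
	fmt.Println(normalizePath("/health"))          // /health
}
```

This only bounds numeric IDs. Arbitrary unmatched paths (scanners probing random URLs, for example) still produce new label values, which is why the router's own pattern is the better source when it is available.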
Custom histogram buckets
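The default buckets (prometheus.DefBuckets) are tuned for HTTP latencies, from 5ms to 10s. For other distributions, pass your own slice; the values below are the defaults written out, ready to adjust: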
httpDuration := prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration",
Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10},
},
[]string{"method", "path"},
)
Or use prometheus.ExponentialBuckets(0.001, 2, 15) for an exponential range: 15 buckets that start at 1ms and double each time, topping out at about 16.4s.
Buckets are tradeoffs: more buckets = finer percentile resolution + more storage + more bytes in scrape. 10-15 buckets is typical.
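To see what a bucket layout costs before committing to it, you can compute the bounds and the resulting series count by hand. The sketch below uses a hand-rolled exponentialBuckets function as a stand-in, assuming the same semantics as prometheus.ExponentialBuckets (first bound is start, each later bound is the previous one times factor). seriesPerLabelSet is a hypothetical helper that counts one series per bucket plus the +Inf bucket, _sum, and _count.

```go
package main

import "fmt"

// exponentialBuckets is an illustrative stand-in, not the library function:
// it returns count upper bounds, starting at start and multiplying by factor.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// seriesPerLabelSet returns how many time series one label combination of a
// classic histogram exposes: one per bucket, plus le="+Inf", _sum and _count.
func seriesPerLabelSet(bucketCount int) int {
	return bucketCount + 3
}

func main() {
	b := exponentialBuckets(0.001, 2, 15)
	fmt.Printf("%d buckets, smallest %gs, largest %gs\n", len(b), b[0], b[len(b)-1])
	fmt.Println("series per label combination:", seriesPerLabelSet(len(b)))
}
```

Multiply the per-combination figure by the number of method and path combinations you expect to get a rough series count for the whole histogram.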
Application-specific metrics
Business metrics, not just HTTP:
ordersCreated := prometheus.NewCounterVec(
prometheus.CounterOpts{Name: "orders_created_total", Help: "Orders created"},
[]string{"product"},
)
queueDepth := prometheus.NewGaugeVec(
prometheus.GaugeOpts{Name: "queue_depth", Help: "Pending items in queue"},
[]string{"queue"},
)
paymentAmount := prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "payment_amount_cents",
Help: "Payment amounts processed",
Buckets: []float64{100, 500, 1000, 5000, 10000, 50000, 100000},
},
[]string{"currency"},
)
In your business logic:
ordersCreated.WithLabelValues(productID).Inc()
queueDepth.WithLabelValues("email").Set(float64(len(emailQueue)))
paymentAmount.WithLabelValues("USD").Observe(float64(amountCents))
These business-level metrics are often more useful than infrastructure ones. “Orders/min” is more actionable than “CPU%”.
Default Go runtime metrics
The default registry in prometheus/client_golang comes with a Go collector already registered, so go_* metrics appear automatically:
- go_goroutines: current goroutine count
- go_memstats_alloc_bytes: currently allocated heap
- go_gc_duration_seconds: GC pause times
- go_threads: OS threads in use
Plus process-level:
- process_cpu_seconds_total
- process_resident_memory_bytes
- process_open_fds
All free. Useful for debugging “why is this service slow?”
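For intuition about where these numbers come from, the sketch below reads similar values straight from the Go runtime package. It is only an approximation of what the collector reports: the runtimeSnapshot type and snapshot function are hypothetical, and the exact sources the library uses vary by version.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot (hypothetical type) holds a few values that roughly
// correspond to go_goroutines and go_memstats_alloc_bytes.
type runtimeSnapshot struct {
	Goroutines int    // live goroutines right now
	AllocBytes uint64 // heap bytes allocated and still in use
	NumGC      uint32 // completed GC cycles since start
}

// snapshot reads the current values from the runtime.
func snapshot() runtimeSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeSnapshot{
		Goroutines: runtime.NumGoroutine(),
		AllocBytes: m.Alloc,
		NumGC:      m.NumGC,
	}
}

func main() {
	s := snapshot()
	fmt.Printf("goroutines=%d alloc_bytes=%d gc_cycles=%d\n",
		s.Goroutines, s.AllocBytes, s.NumGC)
}
```

A goroutine count that climbs steadily between scrapes is a common first sign of a leak, which is the kind of question these free metrics answer.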
Using gauges for status
buildInfo := prometheus.NewGaugeVec(
prometheus.GaugeOpts{Name: "build_info", Help: "Service build info"},
[]string{"version", "commit"},
)
buildInfo.WithLabelValues(Version, Commit).Set(1)
Set once at boot. Gives you a way to query “which version is running where?” via build_info{job="api"} in Grafana.
Metric naming conventions
Follow Prometheus naming:
- Lowercase, snake_case
- Units in name: _seconds, _bytes, _total (for counters)
- Prefix with the namespace if helpful: myapp_requests_total
Examples: http_requests_total, cache_hits_total, db_query_duration_seconds. Consistent naming makes dashboards reusable.
Common Pitfalls
Path labels with full URL. Cardinality explosion. Use route patterns.
User ID labels. Same. Bound your label values.
Forgetting to register. Metric exists in code but doesn’t show in /metrics. Always MustRegister.
Bucket count too high. 50-bucket histograms inflate cardinality and scrape size. 10-15 buckets typical.
Mutating metric names. Renaming http_requests_total to api_requests_total breaks every dashboard. Don’t rename casually.
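Calling Inc() directly on a vector. httpRequests.Inc() does not compile, because a CounterVec has no Inc method; select a child with WithLabelValues() first. Passing the wrong number of label values to WithLabelValues() panics at runtime.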
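Creating metrics per request. Building and registering a metric vector inside a handler or worker, instead of once at startup, makes MustRegister panic on the second registration because the collector already exists. Create and register once at startup.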
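One way to act on the "bound your label values" advice is to pass every candidate label through an allowlist and collapse everything else into a single bucket. The sketch below assumes a hypothetical plan label with three known values; boundedLabel and the "other" fallback are illustrative choices, not a library feature.

```go
package main

import "fmt"

// boundedLabel returns value unchanged if it is in the allowlist and "other"
// otherwise, so the label can never take more than len(allowed)+1 values.
func boundedLabel(value string, allowed map[string]struct{}) string {
	if _, ok := allowed[value]; ok {
		return value
	}
	return "other"
}

func main() {
	// Hypothetical set of known label values.
	plans := map[string]struct{}{"free": {}, "pro": {}, "enterprise": {}}

	fmt.Println(boundedLabel("pro", plans))         // pro
	fmt.Println(boundedLabel("user-8f3a2c", plans)) // other
}
```

The result is what you would pass to WithLabelValues in place of the raw input, which keeps the series count fixed no matter what callers send.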
Wrapping Up
client_golang + HTTP middleware + business metrics = comprehensive instrumentation in ~50 lines. Friday: the same for Node.