eBPF Plus OpenTelemetry, The Observability Pairing for 2024
June 17, 2024 · 7 min read · by Muhammad Amal · programming

TL;DR — eBPF captures kernel-truth signals (syscalls, packets, scheduler events) without touching application code. OpenTelemetry collector 0.100 normalizes and routes them. Together they replace half your monitoring stack and lower the bar for getting useful traces out of legacy services.

For a decade observability meant “instrument your code, ship to a vendor.” Cheap if you started in 2024, expensive if you have ten years of services in five languages. eBPF flipped that. You can now get latency histograms, syscall counts, and connection traces from small programs running inside the kernel, which don’t care what language your service is written in.

The pairing with OpenTelemetry is what makes it production-grade. eBPF without OTel is a pile of metrics you can’t route. OTel without eBPF is a great pipeline starved of signals for services nobody can instrument. Put them together and you have something that competes with the commercial observability suites at a fraction of the cardinality bill.

This post is the architecture I’d ship today on Kubernetes 1.30. It assumes you’re already a few rungs up the Digital Immune System ladder and want to do observability properly, not just turn on a vendor agent.

What eBPF actually gives you

eBPF programs run in the kernel, triggered by hooks (kprobes, tracepoints, XDP, sockets). For observability the useful surfaces are:

  • Network: packet drops, TCP retransmits, connection latency, DNS errors.
  • Syscall latency: how long read, write, epoll_wait take per process.
  • Scheduling: run-queue latency, CPU steal time.
  • File IO: per-file open/read/write rates.

You get all of this without recompiling or restarting your app. That’s the headline. The catch is that interpreting it requires understanding what your service is doing at the syscall level, which not everyone is comfortable with. But the basic metrics (HTTP latency from the socket layer, retransmits per pod) are usable on day one.

The current dominant deployments are Cilium 1.15 (CNI plus observability via Hubble), Pixie, and the newer Tetragon for security-flavored eBPF. We’ll use Cilium and the OTel collector here because that’s the stack with the smoothest GitOps story.

The architecture

[ kernel events ]
       |  eBPF (Cilium / Tetragon / Pixie)
       v
[ Hubble / DaemonSet exporters ]
       |  OTLP
       v
[ OpenTelemetry collector 0.100 ]
       |  filter, batch, tail-sample
       v
[ Prometheus / Tempo / Loki / vendor ]

The collector is where the real engineering happens. eBPF generates a flood of signals; the collector decides which ones are worth storing.

Cilium Hubble metrics into OTel

If you’re already running Cilium 1.15 as your CNI, Hubble exports flow data and L7 (HTTP/gRPC/DNS) metrics. Wire it into the OTel collector with the Prometheus receiver:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: ebpf-pipeline
  namespace: observability
spec:
  mode: deployment
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: "hubble"
              kubernetes_sd_configs:
                - role: pod
                  namespaces:
                    names: ["kube-system"]
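              relabel_configs:
                # Hubble metrics are served by the Cilium agent pods (k8s-app=cilium).
                - source_labels: [__meta_kubernetes_pod_label_k8s_app]
                  action: keep
                  regex: "cilium"
                # Point the scrape at the Hubble metrics port.
                # "$$" escapes "$" so the collector does not treat "$1" as an env var.
                - source_labels: [__meta_kubernetes_pod_ip]
                  target_label: __address__
                  replacement: "$$1:9965"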

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024
      filter/drop_high_cardinality:
        metrics:
          metric:
            - 'name == "hubble_flows_processed_total" and resource.attributes["k8s.namespace.name"] == "kube-system"'
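      attributes/strip_ip:
        # Label names follow Hubble's labelsContext option (source_ip,
        # destination_ip); adjust the keys if your Hubble config differs.
        actions:
          - key: source_ip
            action: delete
          - key: destination_ip
            action: delete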

    exporters:
      prometheusremotewrite:
        endpoint: "http://prometheus.observability:9090/api/v1/write"

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [filter/drop_high_cardinality, attributes/strip_ip, batch]
          exporters: [prometheusremotewrite]

Two things to call out. First, the filter processor drops noise from the kube-system namespace because nobody wants to alert on kube-proxy’s flow counts. Second, the attributes processor strips raw IPs, which would otherwise blow up your label cardinality as soon as pod churn cycles through 10,000 ephemeral IPs.
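To make the cardinality point concrete, here is a back-of-the-envelope sketch in Python. The `series_count` helper and the workload and verdict counts are illustrative assumptions, not measurements; only the 10,000 ephemeral IPs figure comes from the paragraph above.

```python
def series_count(label_cardinalities: dict) -> int:
    """Worst-case time series for one metric: the product of the number
    of distinct values each label can take."""
    total = 1
    for distinct_values in label_cardinalities.values():
        total *= distinct_values
    return total


# Hypothetical label sets for a single flow metric.
workload_labels = {"source_workload": 40, "destination_workload": 40, "verdict": 3}
with_raw_ips = {**workload_labels, "source_ip": 10_000}

print(series_count(workload_labels))  # 4800
print(series_count(with_raw_ips))     # 48000000
```

One extra IP label turns a few thousand series into tens of millions in the worst case, which is why it is cheaper to delete the label in the collector than to pay for it in the backend.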

Traces from eBPF, not from your code

The interesting one. Pixie and eBPF auto-instrumentation agents that export OTLP (Grafana Beyla is one example) can synthesize HTTP traces from socket-level eBPF without code instrumentation. The traces are coarse-grained but they’re real, with correct latency.

You won’t see internal spans. You’ll see “ingress span, outbound HTTP to payments span, total latency.” That’s enough to identify a slow downstream without instrumenting a 12-year-old PHP service.

A trace pipeline with tail-sampling:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow
        type: latency
        latency:
          threshold_ms: 500
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 1

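Three policies, and a trace is kept if any one of them matches. Always keep errors. Always keep slow traces. Sample 1% of the rest. This is the cheapest way to keep a useful trace database without paying for everything. One caveat if you run several collector replicas: tail sampling only works when every span of a trace reaches the same replica, so route by trace ID (the load-balancing exporter does this) in front of the sampling tier.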
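If the policy logic feels abstract, the following Python sketch models the decision. It is a simplified model, not the processor's implementation: `TraceSummary` and `keep_trace` are hypothetical names, the threshold boundary is an assumption, and a random draw stands in for the baseline sampling decision.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    has_error: bool     # at least one span ended with status ERROR
    duration_ms: float  # end-to-end latency of the whole trace


def keep_trace(trace: TraceSummary, rng: random.Random,
               threshold_ms: float = 500,
               sampling_percentage: float = 1.0) -> bool:
    """Return True if any of the three policies would keep the trace."""
    if trace.has_error:                    # policy "errors"
        return True
    if trace.duration_ms >= threshold_ms:  # policy "slow" (boundary assumed)
        return True
    # policy "baseline": keep roughly sampling_percentage % of the rest
    return rng.random() * 100 < sampling_percentage


rng = random.Random(42)
traces = [TraceSummary(False, 40.0) for _ in range(10_000)]
traces += [TraceSummary(True, 40.0), TraceSummary(False, 900.0)]
kept = [t for t in traces if keep_trace(t, rng)]
print(f"kept {len(kept)} of {len(traces)} traces")
```

The error trace and the slow trace always survive; the healthy, fast traces are thinned to about one in a hundred.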

A realistic deployment shape

For a 50-node cluster:

  • Cilium as the CNI (you’d run it for networking anyway).
  • Tetragon or Pixie DaemonSets, optional, for deeper signals.
  • One OTel collector deployment per cluster region (3 replicas), behind a service.
  • DaemonSet collectors only for logs (host journals, kubelet).

Resource costs roughly:

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "2Gi"

That’s per collector replica handling about 10-20k metrics/sec and a few thousand spans/sec. eBPF agents are typically under 100m CPU per node at idle, more if you turn on per-syscall tracing. The Cilium performance docs have the numbers if you want to plan for capacity.
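For a first sizing pass, a rough calculation like the one below is usually enough. It is a sketch under stated assumptions: `replicas_needed`, the 300 metric points/sec per node, and the 30% headroom are all hypothetical; only the 50 nodes and the low end of the 10-20k per-replica figure come from the text.

```python
import math


def replicas_needed(points_per_sec: float, per_replica_capacity: float,
                    headroom: float = 0.3, minimum: int = 2) -> int:
    """Replicas required so each runs at or below (1 - headroom) of capacity."""
    usable = per_replica_capacity * (1 - headroom)
    return max(minimum, math.ceil(points_per_sec / usable))


# Hypothetical load: 50 nodes, each producing 300 metric points/sec.
total_points_per_sec = 50 * 300
print(replicas_needed(total_points_per_sec, per_replica_capacity=10_000))  # 3
```

Measure your real ingest rate before trusting the result; the per-node figure is the input most likely to be wrong.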

What this replaces

A common stack in 2022 was: a vendor agent on every pod, a sidecar for L7, a separate log shipper, a kube-state-metrics deployment, and a hodgepodge of exporters. eBPF plus OTel collapses most of this:

  • Vendor agent → unnecessary for network and syscall metrics.
  • Sidecar L7 → Cilium Hubble does it at the CNI.
  • Log shipper → OTel collector with filelog receiver.
  • kube-state-metrics → still useful, but you’re not depending on it for everything.

The savings come from cardinality control and fewer agents. The cost is that you now own the pipeline. The collector is your problem.

A migration story

A team I worked with last year had a typical 2022 stack: vendor agent on every pod, sidecar L7 proxy, separate log shipper, and a $40k/month observability bill. The migration to eBPF plus OTel took about six weeks and looked like this:

  • Week 1: Stand up the OTel collector. Mirror metrics from the vendor agent into Prometheus.
  • Week 2: Deploy Cilium 1.15 (replacing the previous CNI in a maintenance window). Enable Hubble metrics.
  • Week 3: Wire Hubble into the collector. Confirm parity with vendor agent for the metrics actually used in alerts.
  • Week 4: Add OTel SDK instrumentation to the top five services for internal spans.
  • Week 5: Roll out the eBPF-based trace synthesis for everything else.
  • Week 6: Decommission the vendor agent on a quarter of the fleet. Measure cost and signal quality.

Outcomes: signal quality improved on three of five “we never had data for that” axes, the bill dropped to about $11k/month, and the team learned more about the kernel in six weeks than they had in the prior three years. Cost-wise, it paid back the engineering investment in four months.
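The payback arithmetic is worth spelling out. The three inputs below come from the story above; the implied investment is an inference from them, assuming payback landed at exactly four months, not a number the team reported.

```python
bill_before = 40_000  # USD per month, from the text
bill_after = 11_000   # USD per month, from the text
payback_months = 4    # from the text

monthly_saving = bill_before - bill_after
# Inferred, not reported: the investment that four months of savings would cover.
implied_investment = monthly_saving * payback_months

print(monthly_saving)                         # 29000
print(implied_investment)                     # 116000
print(f"{monthly_saving / bill_before:.1%}")  # 72.5%
```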

The migration wasn’t risk-free. The collector once dropped 10 minutes of metrics during the rollout because we underprovisioned memory. That’s the kind of mistake you make once.

Logs in the same pipeline

Often overlooked: the OTel collector handles logs too. The filelog receiver tails container logs, applies parsers (regex, JSON, multiline), and routes them. No separate Fluent Bit needed.

receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    start_at: end
    operators:
      - type: container
      - type: regex_parser
        regex: '^(?P<timestamp>\S+)\s+(?P<level>\S+)\s+(?P<message>.*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'

One agent, three signal types (metrics, logs, traces). The reduction in moving parts is real.

Gotchas

  • Cardinality from pod IPs and labels. Strip what you don’t need at the source. Re-adding it downstream is expensive.
  • Kernel version mismatch. eBPF feature support varies by kernel. Cilium 1.15 documents 4.19.57 as its minimum, and newer features need newer kernels. Tetragon wants 5.10+. Check before you upgrade.
  • DNS lookups in eBPF metrics. eBPF sees DNS on the wire, not the way your service’s resolver sees it, so search-domain expansion, retries, and local caching show up as extra or missing lookups and ghost spans. Pin the resolver behavior.
  • Trace correlation gaps. Without code instrumentation you can’t reliably tie an inbound request to the outbound calls it triggers inside the same process. For internal spans you still need application-level OTel SDK instrumentation.
  • Security review pushback. eBPF runs in the kernel. Some security teams will (reasonably) want a review. Bring documentation, mention that Cilium is a CNCF graduated project, and don’t try to sneak it in.
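  • Collector underprovisioned. A starved collector drops data silently. Set up an alert on otelcol_processor_dropped_metric_points, and watch otelcol_processor_refused_metric_points and otelcol_exporter_send_failed_metric_points as well, since memory pressure and backend failures show up there.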

Wrapping Up

eBPF plus OpenTelemetry is the cheapest, broadest observability you can stand up in 2024. It won’t replace targeted, code-level instrumentation for your hot paths, but it will give you a baseline of kernel-truth signals across every workload, including the ones nobody can or will instrument. Start with Cilium Hubble metrics into an OTel collector, get the cardinality discipline right, then layer in eBPF-derived traces.

Next week I’ll cover service mesh resilience and how Istio Ambient and Linkerd 2.15 compare for the kinds of failure modes that eBPF makes visible. The mesh is what does something about the failures eBPF surfaces.