Falco 0.35 in Production, Runtime Detection Without the Alert Fatigue
September 18, 2023 · 6 min read · by Muhammad Amal

TL;DR: Use the modern eBPF driver (officially supported from Falco 0.35 on kernels 5.8+, but you have to select it explicitly); treat the kmod driver as legacy. The default ruleset is too noisy for prod: disable or scope the macros that flag normal sysadmin behaviour. Route alerts through Falcosidekick to Slack for triage and to your SIEM for retention; do not pipe stdout directly anywhere.

A scanner tells you what is in your image. An admission policy tells you what gets in. Neither tells you what is actually happening inside the container right now. That is the runtime detection problem, and Falco is the open-source answer I trust to solve it.

The reputation Falco has for noise is partly deserved and partly outdated. On a default install, yes, you will get an alert every time someone exec’s into a pod to debug. With a few hours of rule curation and the modern eBPF driver, the signal-to-noise ratio is good enough to wake up a human only when something genuinely matters. Here is what that setup looks like in 2023.

Driver Choice Matters

Falco needs to see syscalls. It has three drivers.

Kernel module (kmod). The original. Requires building or pulling a matching .ko for every kernel. Reliable when it works. Painful on managed clusters where kernel upgrades happen without warning.

Legacy eBPF. The original eBPF probe. It is not CO-RE: like the kmod, it is compiled (or downloaded prebuilt) for each kernel. It helps where loading kernel modules is not allowed, but it has known issues with newer kernels.

Modern eBPF. Introduced as experimental in Falco 0.34 and officially supported in 0.35 on kernels 5.8+. It is not the default driver in 0.35, so set it explicitly. Built on libbpf and CO-RE, no per-kernel artefacts. This is what you want.

# values.yaml for falcosecurity/falco Helm chart
driver:
  kind: modern-bpf
falco:
  json_output: true
  json_include_output_property: true
  http_output:
    enabled: true
    url: http://falcosidekick:2801

The chart version I am running is falco-3.8.4 which packages Falco 0.35.1. Pin both.

The Default Ruleset and What to Turn Off

Falco ships with falco_rules.yaml and k8s_audit_rules.yaml. They are reasonable starting points and aggressively unsuitable for production as-is. The first hour after install is figuring out which built-in rules to disable, scope, or override.

The biggest offenders in a typical cluster:

Terminal shell in container. Triggers every time anyone exec’s into a pod. In an environment with active SRE, this is constant. Either disable it, or leave it on at low priority and filter downstream: Falco cannot tell whether the executing identity is in your approved SRE group, so that check belongs in the SIEM.

Write below etc / binary dir. Fires on legitimate package installs during image init, sidecar bootstrap, and a surprising number of helm hooks. Scope to running containers, not init.

Read sensitive file untrusted. Fires on monitoring agents reading /etc/shadow. Add an exception for your monitoring DaemonSet’s binary or container image; syscall rules see process and container metadata, not the Kubernetes service account.

Outbound connection to non-RFC1918. Fires on every legitimate external API call. This rule needs a custom allowlist or it is pure noise.

Here is the override file I keep as a baseline for new clusters.

- macro: user_expected_terminal_shell_in_container_conditions
  condition: >
    (k8s.ns.name in (kube-system, monitoring) and
     proc.pname in (kubelet, containerd-shim, runc))

- macro: user_known_write_below_etc_activities
  condition: >
    (container.image.repository startswith "ghcr.io/myorg/init-")

- list: user_known_sensitive_files_readers
  items: ["/usr/bin/node_exporter", "/usr/bin/falco"]

- rule: Outbound Connection to C2 Server
  desc: Outbound to known C2 IPs from threat feed
  condition: >
    evt.type=connect and evt.dir=< and fd.sip in (c2_ips)
  output: >
    Outbound to C2 IP (user=%user.name container=%container.id
    image=%container.image.repository connection=%fd.name)
  priority: CRITICAL
  source: syscall
  tags: [network, mitre_command_and_control]

The c2_ips list gets populated by a sidecar that pulls a threat-intel feed. That is the kind of rule that justifies a runtime detector — generic enough to write once, specific enough not to fire on legitimate traffic.
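The rule above only works if a c2_ips list is defined in a rules file that loads before it. As a minimal sketch of what the sidecar might write, assuming it renders the feed into a Falco list: the addresses below are placeholders from the RFC 5737 documentation ranges, not real indicators, and the nested quoting follows the style the upstream ruleset uses for IP lists.

```yaml
# Hypothetical file generated by the threat-feed sidecar.
# Must be loaded before the rule that references c2_ips.
# The addresses are documentation-range placeholders, not real C2 hosts.
- list: c2_ips
  items: ['"203.0.113.10"', '"198.51.100.23"']
```

Regenerating this one file and letting Falco reload its rules keeps the feed separate from the hand-written overrides, so a bad feed update cannot break the curated macros.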

Routing Output Through Falcosidekick

Falco can emit JSON to stdout, a file, syslog, a program pipe, an HTTP endpoint, or gRPC. None of those are where you want alerts to land. Falcosidekick is the fan-out service: take Falco’s stream, route by priority and rule, send to multiple destinations with format conversion.

My standard config:

# falcosidekick values
config:
  slack:
    webhookurl: ${SLACK_WEBHOOK}
    minimumpriority: warning
    channel: "#sec-alerts"
  elasticsearch:
    hostport: http://elastic:9200
    index: falco
    minimumpriority: debug
  pagerduty:
    routingkey: ${PD_KEY}
    minimumpriority: critical
  policyreport:
    enabled: true

Three destinations, three thresholds. Slack for triage at warning and above. Elasticsearch for retention and forensic search at all levels. PagerDuty only for critical. The PolicyReport output is a nice touch — Falco events show up as Kubernetes PolicyReport resources that other tools (Kyverno, policy dashboards) can consume natively.

The pattern I will not budge on: never PagerDuty straight from a generic rule. Only rules I have specifically validated as low-false-positive go to PD. Everything else is Slack at most.

What to Detect, Practically

The rule categories that have actually paid off in incidents I have been involved with:

Crypto miners. xmrig and friends have distinctive process names, library loads, and CPU patterns. The built-in Detect crypto miners using the Stratum protocol rule catches the network side. Add a process-name match for xmrig, kinsing, kdevtmpfsi (the three I have actually seen in the wild).
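The process-name match can be sketched as a small custom rule. This is an illustration rather than an upstream rule: the list name and rule name are made up here, and it assumes the file loads after the default ruleset so that the spawned_process and container macros are already defined.

```yaml
# Hypothetical list name; the three entries come from the paragraph above.
- list: known_miner_process_names
  items: [xmrig, kinsing, kdevtmpfsi]

# Sketch only: process names are trivial to change, so treat this as a
# cheap tripwire that complements the network-side Stratum rule.
- rule: Known Crypto Miner Process Launched
  desc: A process named like a known miner or dropper started in a container
  condition: >
    spawned_process and container
    and proc.name in (known_miner_process_names)
  output: >
    Known miner process started (user=%user.name command=%proc.cmdline
    parent=%proc.pname container=%container.id
    image=%container.image.repository)
  priority: CRITICAL
  source: syscall
  tags: [process, mitre_execution]
```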

Reverse shells. A process whose stdin/stdout is a network socket is rarely legitimate. Falco’s Reverse Shell rule catches the common cases.

Privilege escalation inside the container. setuid calls from non-root processes, or unexpected su/sudo from a webserver UID, are interesting.

Drift detection. A binary executed from a path that did not exist in the container image. This requires baseline state, which Falco does not do natively but Tetragon does. Mentioning it because Tetragon is increasingly worth a look alongside Falco for behavioural detection — different model, complementary coverage.

Mount of sensitive host paths. A pod mounting /var/run/docker.sock or /etc from the host is either a Falco DaemonSet (legitimate, exempt by name) or trouble.

Common Pitfalls

Running Falco on every node without limits. Falco runs as a privileged DaemonSet. Set CPU and memory requests/limits. I see 200m CPU and 1Gi memory on busy nodes with the modern eBPF driver. Bursting will happen during process-fork storms.
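In the Helm chart, requests and limits go under the top-level resources key. The numbers below are a starting point derived from the figures above, not measured values for your cluster; the request size in particular is an assumption.

```yaml
# values.yaml for the falcosecurity/falco chart (sketch; tune per cluster)
resources:
  requests:
    cpu: 200m        # steady-state figure quoted above
    memory: 512Mi    # assumption: request below the observed peak
  limits:
    cpu: "1"         # headroom for process-fork storms
    memory: 1Gi      # observed ceiling on busy nodes
```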

Dropping events. If Falco cannot keep up with the syscall stream, it drops events and logs a warning. The default ring buffer is modest; its size is controlled by syscall_buf_size_preset in falco.yaml. The chart’s falco.syscall_event_drops.threshold defaults to .1 (10%) before it alerts. Watch that metric; if you see drops, raise syscall_buf_size_preset or scope rules to reduce the syscall surface.
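A hedged sketch of the relevant values, assuming the chart passes everything under falco: straight through to falco.yaml. The preset-to-size mapping in the comment is from the falco.yaml comments as I recall them (each step doubles the buffer), so verify it against your version before relying on it.

```yaml
# values.yaml fragment (sketch)
falco:
  # Default is 4 (8 MB per ring buffer); 5 doubles it to 16 MB.
  # Larger buffers trade node memory for fewer dropped events.
  syscall_buf_size_preset: 5
  syscall_event_drops:
    threshold: .1     # alert once 10% of events are dropped
    actions:
      - log
      - alert
```

Raise the preset one step at a time and watch the drop counters between changes; memory use grows with both the preset and the number of buffers.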

Not testing rules. falco --dry-run -r myrule.yaml validates syntax but not behaviour. The falco-tests repo has a harness, but the practical answer is to write a small test script that performs the action and confirms the alert fires. Without this, rules silently break after Falco upgrades.
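One way to build such a test without extra tooling is a throwaway pod that performs the action. The manifest below is a sketch under assumptions: the pod name and namespace are hypothetical, and it targets the built-in Read sensitive file untrusted rule, which only fires if you have not excepted this image or process.

```yaml
# Hypothetical one-shot pod that performs an action a rule should catch.
apiVersion: v1
kind: Pod
metadata:
  name: falco-rule-smoke-test
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: trigger
      image: busybox:1.36
      # Reads /etc/shadow, then lingers briefly so the container
      # metadata is still resolvable when Falco emits the event.
      command: ["sh", "-c", "cat /etc/shadow > /dev/null; sleep 10"]
```

After applying it, look for an event whose rule field matches the rule name in Falco's JSON output or in the Elasticsearch index. Running the same pod after every Falco or ruleset upgrade turns a silent breakage into a visible missing alert.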

k8s audit rules without k8s audit logs. The k8s_audit_rules.yaml set requires you to wire your Kubernetes API server’s audit log to Falco. If you skip the wiring, those rules never fire and you do not get an error — they just never see input. Either disable that ruleset or wire the audit webhook.
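On clusters where you control the API server flags, the wiring is an audit webhook kubeconfig passed via --audit-webhook-config-file, alongside an audit policy passed via --audit-policy-file. The sketch below assumes a hypothetical Service name for the Falco instance running the k8saudit plugin; the port and path are what I understand the plugin to listen on by default, so check them against your plugin configuration.

```yaml
# Audit webhook kubeconfig for kube-apiserver (sketch)
apiVersion: v1
kind: Config
clusters:
  - name: falco
    cluster:
      # Hypothetical Service name; port and path assumed to be the
      # k8saudit plugin defaults.
      server: http://falco-k8saudit-webhook:9765/k8s-audit
contexts:
  - name: default-context
    context:
      cluster: falco
      user: ""
current-context: default-context
preferences: {}
users: []
```

Managed control planes usually do not expose these flags, in which case the audit rules need a provider-specific log source or should be disabled.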

No retention for forensics. Slack is not forensics storage. The point of sending to Elasticsearch or S3 is so that, three months later, you can answer “did this rule ever fire before today”. If you only have Slack, you cannot.

Wrapping Up

Falco gets dismissed because of the default ruleset, which is unfair to the tool. The work is rule curation, not installation, and after a week of tuning the alert volume is manageable. Once runtime detection is in place, the natural next layer is preventing the misconfiguration that makes runtime alerts necessary in the first place — which is the job of OPA and Gatekeeper at admission time.