Running IIoT Workloads on K3s at the Edge: A Field Pattern
TL;DR
- K3s is the right answer for “Kubernetes at the edge” in 2023 if your gateways have 2+ cores and 4+ GB RAM.
- Run a tiny Mosquitto, a local Telegraf, and an outbound MQTT bridge on every gateway.
- GitOps with Argo CD is non-negotiable when you’re managing 50+ sites.
Edge compute for IIoT is one of those topics where the marketing got way ahead of what actually works in a plant with three-year-old gateways and a flaky 4G connection. After running K3s across a couple hundred industrial sites this year, I have a strong opinion: keep it boring, keep it tiny, and never let the cluster control plane depend on the WAN.
This post is the pattern I run today — what’s on every gateway, how it’s deployed, and where the bodies are buried. It’s the next layer down from the broker and storage posts. If you want broker context first, the Mosquitto tuning post covers what runs centrally.
What Belongs at the Edge
Three things, in order of impact:
- Local broker and store-and-forward. When the WAN drops, sensors keep publishing locally, and a bridge replays upstream when connectivity returns.
- Preprocessing and filtering. Drop quality-bad readings at the edge, downsample high-frequency streams, run threshold logic without a round trip.
- Local control loops. When response time matters more than central visibility — emergency shutoffs, PID loops, safety interlocks.
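To make the preprocessing item concrete, here is a minimal sketch of edge-side filtering in Python. The `Reading` shape, the `quality` flag values, and the function names are all assumptions for illustration, not something Telegraf or Mosquitto provides; in practice the same logic would live in whatever processor you run on the gateway.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    sensor: str
    ts: float      # seconds since epoch
    value: float
    quality: str   # hypothetical quality flag: "good" or "bad"


def filter_and_downsample(readings: Iterable[Reading],
                          min_interval_s: float) -> Iterator[Reading]:
    """Drop bad-quality readings, then keep at most one reading
    per sensor per min_interval_s seconds."""
    last_emitted: Dict[str, float] = {}
    for r in readings:
        if r.quality != "good":
            continue  # never forward readings flagged as bad
        prev = last_emitted.get(r.sensor)
        if prev is not None and r.ts - prev < min_interval_s:
            continue  # too soon after the last forwarded reading
        last_emitted[r.sensor] = r.ts
        yield r


def over_threshold(reading: Reading, limit: float) -> bool:
    """Local threshold check that needs no round trip to the hub."""
    return reading.value > limit
```

The downsampling here is the simplest possible kind (first reading in each interval wins). If you need averages or min/max per window, that is a different aggregation and this sketch does not cover it.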
Things that don’t belong at the edge unless you have a strong reason:
- Long-term storage. The gateway disk fills, the disk fails, the data is gone. Forward to the central time-series DB.
- ML training. Inference, sure. Training, almost never.
- Complex orchestration. If your edge needs more than a handful of pods, you’re doing too much.
The Stack on Every Gateway
On each edge node I run K3s as a single-node “cluster” (or HA pair for safety-critical sites), with this workload set:
# edge-stack.yaml - abridged, applied per-site by Argo CD
apiVersion: v1
kind: Namespace
metadata:
  name: iot-edge
---
# Local MQTT broker
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mosquitto
  namespace: iot-edge
spec:
  replicas: 1
  strategy: { type: Recreate }
  selector: { matchLabels: { app: mosquitto } }
  template:
    metadata: { labels: { app: mosquitto } }
    spec:
      containers:
        - name: mosquitto
          image: eclipse-mosquitto:2.0.18
          ports:
            - { name: mqtt, containerPort: 1883 }
          resources:
            requests: { cpu: 100m, memory: 64Mi }
            limits: { cpu: 500m, memory: 256Mi }
          volumeMounts:
            - { name: config, mountPath: /mosquitto/config }
            - { name: data, mountPath: /mosquitto/data }
      volumes:
        - name: config
          configMap: { name: mosquitto-config }
        - name: data
          persistentVolumeClaim: { claimName: mosquitto-data }
---
# Local Telegraf - reads local broker, forwards filtered data upstream
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telegraf
  namespace: iot-edge
spec:
  replicas: 1
  selector: { matchLabels: { app: telegraf } }
  template:
    metadata: { labels: { app: telegraf } }
    spec:
      containers:
        - name: telegraf
          image: telegraf:1.27.4
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 500m, memory: 512Mi }
          volumeMounts:
            - { name: config, mountPath: /etc/telegraf, readOnly: true }
          env:
            - name: SITE_ID
              valueFrom: { configMapKeyRef: { name: site-info, key: site_id } }
      volumes:
        - name: config
          configMap: { name: telegraf-config }
The Mosquitto bridge config (in the ConfigMap referenced above) handles store-and-forward:
# /mosquitto/config/bridge.conf
connection upstream
address hub.iot.example.com:8883
bridge_protocol_version mqttv50
bridge_attempt_unsubscribe false
notifications false
try_private false
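# Sensor data flows out
# Note: Mosquitto does not expand environment variables in its config,
# so ${SITE_ID} must be substituted before this file is loaded.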
topic plant/+/data/# out 1 "" hub/${SITE_ID}/
# Commands flow in
topic hub/${SITE_ID}/cmd/# in 1 "" cmd/
# Critical: bounded queue; once it is full, further messages are dropped
queue_qos0_messages false
max_queued_messages 100000
# TLS to hub
bridge_cafile /mosquitto/certs/ca.crt
bridge_certfile /mosquitto/certs/client.crt
bridge_keyfile /mosquitto/certs/client.key
bridge_insecure false
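max_queued_messages 100000 at the edge is the WAN-outage budget. At 30 msg/sec from this gateway, that’s about 55 minutes of buffer (100,000 / 30 ≈ 3,333 seconds). Beyond that, messages are dropped: Mosquitto discards new messages once the queue is full, so what survives is the start of the outage, not the end. The alternative, an unbounded queue, fills the disk during an extended outage and the gateway becomes a brick.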
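The buffer arithmetic is worth redoing for each site’s message rate rather than copying one number fleet-wide. A small sketch, with hypothetical helper names, that goes both ways: how long a given queue depth lasts, and how deep the queue must be to ride out a given outage.

```python
import math


def buffer_seconds(max_queued_messages: int, msgs_per_second: float) -> float:
    """How long a bounded queue lasts at a steady publish rate."""
    if msgs_per_second <= 0:
        raise ValueError("msgs_per_second must be positive")
    return max_queued_messages / msgs_per_second


def required_queue_depth(outage_seconds: float, msgs_per_second: float) -> int:
    """Queue depth needed to buffer an outage of the given length."""
    if outage_seconds < 0 or msgs_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(outage_seconds * msgs_per_second)


if __name__ == "__main__":
    # The numbers from this post: 100,000 messages at 30 msg/sec.
    minutes = buffer_seconds(100_000, 30) / 60
    print(f"buffer lasts {minutes:.1f} minutes")
    # A four-hour outage at the same rate.
    print(f"4h outage needs {required_queue_depth(4 * 3600, 30)} messages")
```

This assumes a steady rate and counts messages only. Memory and disk use also depend on payload size, which you would need to measure per site.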
Why K3s and Not Just Docker Compose
I shipped Docker Compose at the edge for two years and migrated to K3s in 2022. The reasons:
- GitOps. Argo CD pointing at a per-site overlay manages 200 sites without me touching each gateway.
- Workload health. Liveness/readiness probes, restart policies, resource limits — all the things Compose has approximations of but K8s does seriously.
- Local container registry mirror. K3s ships with containerd and integrates cleanly with a registry mirror, which matters when your gateway has a 10 Mbps uplink and you don’t want to pull images from the public registry on every restart.
- Single binary install. A 60 MB binary, curl | sh install, and you have a working cluster. Compose still requires Docker, which means managing two things.
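K3s with embedded SQLite is fine for single-node sites. For HA pairs at safety-critical sites, use an external datastore such as Postgres; embedded etcd needs three or more server nodes to keep quorum, so it only fits sites that can host a third node. The K3s docs cover the trade-offs.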
GitOps with Argo CD: One Cluster Per Plant
Central management cluster runs Argo CD 2.8. Each edge K3s registers as a managed cluster. Per-site overlays in Git look like:
clusters/
  plant-east-01/
    kustomization.yaml
    site-info.yaml      # SITE_ID, region, machine list
    secrets/            # sealed-secrets, sealed against this site's key
  plant-east-02/
    ...
base/
  mosquitto/
  telegraf/
  node-exporter/
The ApplicationSet then generates one Argo Application per directory:
# argo-edge-fleet.yaml - Argo CD 2.8
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: edge-fleet
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: git@github.com:example/iot-edge-config.git
        revision: main
        directories:
          - path: clusters/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: edge
      source:
        repoURL: git@github.com:example/iot-edge-config.git
        targetRevision: main
        path: '{{path}}'
      destination:
        name: '{{path.basename}}'
        namespace: iot-edge
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        retry:
          limit: 10
          backoff:
            duration: 30s
            factor: 2
            maxDuration: 30m
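It helps to see how long those retry settings actually cover. The sketch below assumes the delay before retry n is duration × factor^n, capped at maxDuration; that is my reading of the Argo CD retry options, so check it against the docs for your version. The function name is hypothetical.

```python
from typing import List


def backoff_schedule(limit: int, duration_s: float, factor: float,
                     max_duration_s: float) -> List[float]:
    """Delay before each retry, assuming exponential growth
    (duration * factor**attempt) capped at max_duration_s."""
    return [min(duration_s * factor ** attempt, max_duration_s)
            for attempt in range(limit)]


if __name__ == "__main__":
    # Values from the ApplicationSet: limit 10, 30s base, factor 2, 30m cap.
    schedule = backoff_schedule(10, 30, 2, 30 * 60)
    print(schedule)
    print(f"total wait: {sum(schedule) / 3600:.2f} hours")
```

Under that model the ten retries span roughly two and a half hours, which is shorter than a multi-day outage. It is worth confirming what your Argo CD version does once retries are exhausted and the site comes back.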
Sealed Secrets per site is the part nobody likes setting up but everyone is glad they did once a gateway is decommissioned and you need to revoke its credentials cleanly.
Connectivity: Plan for the WAN to Be Bad
The single biggest mistake I see is treating the WAN as reliable. It isn’t. Plan for:
- Outage from minutes to days. Local buffer for at least your worst observed outage, with bounded drop behavior.
- High latency. A 500ms RTT between edge and hub is not unusual on cellular. Set MQTT 5 and TCP keepalive intervals with enough headroom that a slow link isn’t mistaken for a dead one.
- Backhaul cost. Cellular data costs real money. Filter at the edge, compress payloads (CBOR or Protobuf over JSON for high-rate streams), batch where you can.
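To get a feel for the backhaul point, the sketch below compares a JSON batch against a fixed-width binary encoding of the same readings. Python’s struct module stands in for CBOR or Protobuf here so the example needs only the standard library; the record layout (uint16 sensor id, uint32 timestamp, float32 value) and the function names are assumptions for illustration.

```python
import json
import struct
import zlib
from typing import List, Tuple

# Hypothetical record layout: sensor id (uint16), unix ts (uint32), value (float32).
# "<" means little-endian with no padding, so each record is 10 bytes.
RECORD = struct.Struct("<HIf")

ReadingTuple = Tuple[int, int, float]


def encode_json(readings: List[ReadingTuple]) -> bytes:
    """Batch of readings as a JSON array of objects."""
    return json.dumps(
        [{"sensor_id": s, "ts": t, "value": v} for s, t, v in readings]
    ).encode("utf-8")


def encode_binary(readings: List[ReadingTuple]) -> bytes:
    """Same batch as back-to-back fixed-width records."""
    return b"".join(RECORD.pack(s, t, v) for s, t, v in readings)


def decode_binary(payload: bytes) -> List[ReadingTuple]:
    """Inverse of encode_binary."""
    return [RECORD.unpack_from(payload, offset)
            for offset in range(0, len(payload), RECORD.size)]


if __name__ == "__main__":
    batch = [(7, 1_700_000_000 + i, 20.0 + i * 0.5) for i in range(100)]
    as_json = encode_json(batch)
    as_binary = encode_binary(batch)
    print("json bytes:        ", len(as_json))
    print("json+zlib bytes:   ", len(zlib.compress(as_json)))
    print("binary bytes:      ", len(as_binary))
```

A hand-rolled struct layout has no schema evolution, which is the main reason to reach for CBOR or Protobuf in a real deployment. The size comparison is the part that carries over.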
WireGuard between the edge and the management VPC for SSH and Argo CD traffic is non-negotiable. Don’t expose the K3s API on the public internet, ever.
Observability for Disconnected Sites
You can’t scrape Prometheus across the WAN reliably. The pattern that works: run a Prometheus Agent at each site, remote-write to a central Mimir or Cortex, with retries and a local buffer.
# Per-site prometheus-agent config
remote_write:
  - url: https://mimir.example.com/api/v1/push
    queue_config:
      capacity: 100000
      max_shards: 10
      max_samples_per_send: 2000
      batch_send_deadline: 5s
      retry_on_http_429: true
    metadata_config:
      send: true
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*|process_.*'
        action: drop
Drop the noisy Go internals at the agent — you don’t want them eating your backhaul budget.
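Before shipping a drop rule to 200 sites, it’s worth checking which metric names it actually catches. Prometheus anchors relabel regexes at both ends, so the sketch below uses Python’s re.fullmatch to approximate that; the helper name and the sample metric names are just for illustration.

```python
import re

# Same pattern as the write_relabel_configs rule above.
DROP_REGEX = re.compile(r"go_gc_.*|process_.*")


def is_dropped(metric_name: str) -> bool:
    """True if the drop rule would match this metric name.
    fullmatch mirrors Prometheus anchoring the regex as ^(...)$."""
    return DROP_REGEX.fullmatch(metric_name) is not None


if __name__ == "__main__":
    for name in ("go_gc_duration_seconds", "process_cpu_seconds_total",
                 "go_goroutines", "mqtt_process_latency_seconds"):
        print(f"{name}: {'drop' if is_dropped(name) else 'keep'}")
```

Note that this pattern only covers the go_gc_ and process_ prefixes. Other Go runtime series such as go_goroutines still get through, so extend the regex if those are also noise for you.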
Common Pitfalls
- Updating K3s in place over a slow link. K3s upgrades are reasonably safe but the image pull is large. Pre-stage upgrades through a local registry mirror, or use the K3s system-upgrade-controller with bandwidth caps.
- PVCs on the K3s default local-path provisioner with no backup. That data is on one gateway, one disk, one set of failure modes. Either accept that edge state is ephemeral or take periodic backups upstream.
- Forgetting NTP at the edge. Sensor timestamps are wrong, your downstream queries are wrong, your aggregations are wrong. Run chrony against a reliable source on every gateway and alert on drift.
- Pulling images from Docker Hub at the edge. Rate limits will bite you the first time a fleet of 200 gateways restarts simultaneously. Mirror your images in a registry you control.
- Running stateful workloads at the edge without a “this is ephemeral” mental model. The gateway will fail, get stolen, get flooded, or get accidentally re-imaged. Anything irreplaceable belongs upstream.
Wrapping Up
Edge compute earns its place when the WAN is unreliable, response time is local, or backhaul cost matters. K3s plus Argo CD gives you a manageable fleet at scale; Mosquitto plus Telegraf gives you the data plane. Resist the urge to put more on the gateway than you have to. Next post I’ll bridge between the OT side of the house and the IT pipeline by getting into OPC UA to MQTT translation.