Tuning Eclipse Mosquitto 2.0 for 50k Industrial Sensors on One Box
August 5, 2023 · 6 min read · by Muhammad Amal

TL;DR — Mosquitto’s default config is sized for hobby use, not industrial fan-in / The wins come from kernel sockets, TLS session reuse, persistence tuning, and ACL hygiene / On modern hardware a single Mosquitto 2.0.18 node will handle 50k connected sensors with sub-50ms p99 publish latency.

I keep getting pulled into “Mosquitto can’t scale, we need to migrate” conversations that end with us not migrating. The default config ships with conservative limits because it has to run on a Raspberry Pi. If you’re putting it on a real server and not changing those defaults, the broker is going to fall over and people are going to blame the broker.

This is the runbook I hand to teams before they start architecting their way out of a problem they don’t have. We’re targeting roughly 50k concurrent MQTT 5 clients, ~30k msg/sec aggregate publish rate, mostly QoS 1, average payload 256 bytes. That’s an industrial site, not a global cloud, and Mosquitto handles it on a single 8-core box if you let it.

For the broader broker selection rationale, see my MQTT broker comparison from earlier this week.

Step One: Stop Lying to the Kernel

Mosquitto is a single-threaded broker built around one epoll event loop; TLS, when enabled, runs in that same loop. Most of what’s “wrong” at scale isn’t Mosquitto. It’s the kernel still pretending you’re on a desktop.

# /etc/sysctl.d/99-mosquitto.conf — Linux 5.15
fs.file-max = 2097152
fs.nr_open = 2097152

net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535

net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535

# Big receive/send buffers for many idle conns
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Keep connections honest; sensors disappear without TCP RST
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 4

Then lift the per-process file descriptor limit with a systemd drop-in for mosquitto.service:

# /etc/systemd/system/mosquitto.service.d/limits.conf
[Service]
LimitNOFILE=1048576
LimitNPROC=65535

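Run systemctl daemon-reload, apply the sysctl file with sysctl --system (a bare sysctl -p only reads /etc/sysctl.conf, not files under /etc/sysctl.d/), and restart Mosquitto. The single most common reason a Mosquitto instance “stops accepting connections” at around 1024 clients is that nobody touched LimitNOFILE.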
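
A quick way to sanity-check the descriptor budget before a rollout is sketched below. It assumes one descriptor per client connection plus a fixed reserve for listeners, logs, and the persistence file; the reserve of 1024 and the function names are my own illustration, not anything Mosquitto defines.

```python
import resource


def fd_headroom(soft_limit: int, expected_clients: int, reserve: int = 1024) -> int:
    """Descriptors left over after clients and a fixed reserve.

    Assumes one descriptor per MQTT connection. `reserve` is a guessed
    allowance for listeners, log files, and the persistence file.
    """
    return soft_limit - expected_clients - reserve


def current_soft_limit() -> int:
    """Soft RLIMIT_NOFILE of the process running this script (not the broker)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    soft = current_soft_limit()
    spare = fd_headroom(soft, expected_clients=50_000)
    status = "ok" if spare >= 0 else "too low"
    print(f"soft nofile limit={soft} headroom={spare} ({status})")
```

This reports the limit of the shell you run it from. For the broker itself, read /proc/<pid>/limits for the running mosquitto process after the restart.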

Step Two: Don’t Terminate TLS in Mosquitto

Mosquitto can terminate TLS, and for small deployments it should. At 50k clients, you want HAProxy or Envoy in front for three reasons: session ticket support, ALPN, and the ability to do TLS handshake offload on its own CPU budget.

A minimal HAProxy 2.8 config in front of Mosquitto:

# /etc/haproxy/haproxy.cfg — HAProxy 2.8
global
    maxconn 200000
    tune.ssl.cachesize 200000
    tune.ssl.lifetime 86400
    tune.ssl.default-dh-param 2048
    nbthread 8

defaults
    mode tcp
    timeout connect 5s
    timeout client  4h
    timeout server  4h
    option tcplog

frontend mqtt_tls
    bind :8883 ssl crt /etc/haproxy/certs/mqtt.pem alpn mqtt
    default_backend mqtt_backend

backend mqtt_backend
    balance roundrobin
    option tcp-check
    tcp-check connect port 1883
    server m1 127.0.0.1:1883 check

Mosquitto then binds on localhost:1883 with TLS off. The relevant Mosquitto block:

# /etc/mosquitto/conf.d/listener.conf
listener 1883 127.0.0.1
allow_anonymous false
password_file /etc/mosquitto/passwd
max_connections -1

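Note max_connections -1 (unlimited, so the file descriptor limit becomes the gate). The 4h HAProxy timeouts sit far above a typical 60s MQTT keepalive on purpose: a healthy client sends at least a PINGREQ every keepalive interval, so the proxy never reaps a legitimate slow sensor, and dead sessions are left for the broker to drop once 1.5x the keepalive passes in silence. The Mosquitto bridge and listener documentation covers the rest of the listener flags worth knowing.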
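
Two bits of arithmetic are worth checking for this layout; here is a minimal sketch. First, every proxied session is a loopback TCP connection from HAProxy to 127.0.0.1:1883, so each one needs its own source port out of ip_local_port_range. Second, the proxy idle timeout should exceed 1.5x the largest keepalive your clients negotiate, so the broker and not the proxy decides when a session is dead. The function names are hypothetical.

```python
def loopback_port_budget(low: int, high: int) -> int:
    """Number of source ports in an inclusive ip_local_port_range."""
    return high - low + 1


def proxy_timeout_ok(proxy_timeout_s: float, max_keepalive_s: float) -> bool:
    """True if the proxy idle timeout outlasts the broker's keepalive deadline.

    MQTT brokers drop a client after 1.5x its keepalive without traffic,
    so a longer proxy timeout leaves that decision to the broker.
    """
    return proxy_timeout_s > 1.5 * max_keepalive_s


if __name__ == "__main__":
    ports = loopback_port_budget(1024, 65535)
    print(f"loopback source ports: {ports}, enough for 50k: {ports >= 50_000}")
    print(f"4h timeout vs 60s keepalive ok: {proxy_timeout_ok(4 * 3600, 60)}")
```

The port budget is an upper bound: ports held in TIME_WAIT and other outbound connections from the same host eat into it, so treat 50k out of 64512 as workable but not roomy.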

Step Three: Persistence That Doesn’t Block the Event Loop

Mosquitto’s persistence is a periodic snapshot — mosquitto.db is rewritten every autosave_interval seconds. On a busy broker this snapshot can stall the event loop for hundreds of milliseconds. Two things help:

# /etc/mosquitto/conf.d/persistence.conf
persistence true
persistence_location /var/lib/mosquitto/
autosave_interval 1800
autosave_on_changes false

# Limit what gets persisted — QoS 0 should never touch disk
queue_qos0_messages false

# Cap per-client queues
max_queued_messages 1000
max_inflight_messages 100

Put /var/lib/mosquitto on a dedicated NVMe partition with noatime. Don’t put it on an EBS gp2 volume if you’re in AWS — you will see the snapshot stall as a periodic blip in your p99 publish latency. gp3 with provisioned IOPS or io2 is the floor.

If you genuinely need durable, low-latency offline message buffering for command/control to roaming sensors, that’s the signal to move to EMQX. Mosquitto’s persistence is fine for a working set, not for a long tail.
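
To see why the per-client queue cap matters, it helps to multiply it out. The sketch below is a rough upper bound on queued payload bytes, assuming every client has a full queue of distinct messages; it ignores per-message overhead and any sharing of message bodies between subscribers, so real numbers will differ. The function name is my own.

```python
def worst_case_queue_bytes(clients: int, max_queued: int, avg_payload_bytes: int) -> int:
    """Upper bound on queued payload bytes if every client queue is full.

    Payload only: per-message bookkeeping is not counted, and the bound
    assumes no message is shared between queues.
    """
    return clients * max_queued * avg_payload_bytes


if __name__ == "__main__":
    # Figures from this post: 50k clients, max_queued_messages 1000, 256-byte payloads.
    total = worst_case_queue_bytes(50_000, 1000, 256)
    print(f"worst case queued payload: {total / 1e9:.1f} GB")
```

With the settings above that comes to 12.8 GB of payload in the pathological case, all of which would also have to be written out at the next snapshot. If that number scares you, lower max_queued_messages for sensor-class clients.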

Step Four: ACLs Are a Performance Feature

A wildcard # subscription from a misconfigured client will fan out every message on the bus to that one socket. On a 30k msg/sec broker that’s a self-DoS in seconds.

# /etc/mosquitto/acl
# Per-device pattern: each device publishes only to its own subtree
pattern readwrite factory/%c/data/#
pattern readwrite factory/%c/cmd/#

# Dashboards (auth'd as 'dashboard') can read everything but not publish
user dashboard
topic read factory/#

# Operators publish commands
user operator
topic write factory/+/cmd/#

# Explicitly deny the global wildcard for everyone else
pattern deny #

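%c substitutes the client ID and %u the username. The client ID is whatever the device chooses to send, so %c isolates devices only if your provisioning or auth layer ties each client ID to its credentials. If Mosquitto terminates TLS itself with client certificates, use_identity_as_username true sets the username from the certificate CN and a %u pattern gives you per-device isolation for free; that option has no effect behind a TLS-terminating proxy, because Mosquitto never sees the certificate. Once an acl_file is configured, Mosquitto refuses anything the file does not grant, so the deny line at the bottom is a statement of intent rather than the mechanism. Test it against your broker version before relying on it. Writing that intent down is the bit nobody does until they’ve had an incident.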
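
If the pattern rules feel abstract, the sketch below models how a pattern expands and matches. It is a simplified illustration of MQTT topic-filter matching, not Mosquitto's implementation: it skips the special handling of $-prefixed topics and does not validate client IDs that contain wildcard characters. Function names are hypothetical.

```python
def expand_pattern(pattern: str, client_id: str, username: str = "") -> str:
    """Substitute %c (client ID) and %u (username) into an ACL pattern."""
    return pattern.replace("%c", client_id).replace("%u", username)


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match: '+' is one level, '#' is the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' matches any remaining levels, including none (the parent topic).
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    own = expand_pattern("factory/%c/data/#", client_id="sensor-17")
    print(own, topic_matches(own, "factory/sensor-17/data/temp"))
    print(own, topic_matches(own, "factory/sensor-18/data/temp"))
```

A device with client ID sensor-17 matches its own subtree and nothing under sensor-18, which is the whole point of the per-device pattern.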

Observability: $SYS Topics and Prometheus

Mosquitto exposes runtime stats on $SYS/broker/#. Wire those to Prometheus with mosquitto_exporter (the sapcc/mosquitto-exporter build is the one I use in 2023):

# docker-compose.yml fragment
services:
  mosquitto_exporter:
    image: sapcc/mosquitto-exporter:0.8.0
    environment:
      BROKER_ENDPOINT: tcp://mosquitto:1883
      MQTT_USER: prom_exporter
      MQTT_PASS: ${PROM_PASS}
    ports:
      - "9234:9234"

The metrics worth alerting on:

  • mosquitto_clients_connected plateauing well below your expected fleet size — DNS or auth issue.
  • mosquitto_messages_inflight climbing — slow consumers or backed-up retained queues.
  • mosquitto_load_messages_received_1min divergent from mosquitto_load_messages_sent_1min — fan-out is failing.
  • mosquitto_heap_current_size growing without bound — leaking offline session state, usually from clients that connect with clean_start=false and never come back.
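
As a starting point for those alerts, here is a minimal sketch that parses the exporter's text output and computes two of the signals: the fleet shortfall and the sent/received ratio. It uses the metric names exactly as listed above, which is an assumption on my part; check what your exporter version actually exposes. It handles only unlabelled samples, and the function names are hypothetical.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled Prometheus text-format samples into a dict."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blanks, '# HELP' / '# TYPE' comments, and malformed lines.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        metrics[parts[0]] = float(parts[1])
    return metrics


def fleet_shortfall(metrics: dict[str, float], expected_clients: int) -> float:
    """Fraction of the expected fleet that is not connected."""
    return 1 - metrics["mosquitto_clients_connected"] / expected_clients


def fanout_ratio(metrics: dict[str, float]) -> float:
    """Messages sent per message received over the last minute."""
    received = metrics["mosquitto_load_messages_received_1min"]
    sent = metrics["mosquitto_load_messages_sent_1min"]
    return sent / received if received else 0.0


SAMPLE = """\
# TYPE mosquitto_clients_connected gauge
mosquitto_clients_connected 42000
mosquitto_load_messages_received_1min 30000
mosquitto_load_messages_sent_1min 15000
"""

if __name__ == "__main__":
    m = parse_metrics(SAMPLE)
    print(f"shortfall={fleet_shortfall(m, 50_000):.0%} fanout={fanout_ratio(m):.2f}")
```

In practice you would express the same checks as Prometheus alert rules; the script is only meant to make the arithmetic behind them concrete.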

Common Pitfalls

  • clean_start=false and unbounded sessions. Persistent sessions are useful for command/control. For one-way telemetry from devices that may never reconnect, they’re a memory leak. Force clean_start=true at the auth plugin for sensor-class clients, and set persistent_client_expiration (for example 7d) so abandoned sessions are eventually dropped.
  • TLS in-process with default ciphers. Mosquitto’s defaults include some ciphers you don’t want anymore. If you’re terminating in Mosquitto, set ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256 and use ECDSA certs — TLS handshake CPU drops by roughly 4x compared to RSA-3072.
  • Logging at notice or anything more verbose in prod. Connect/disconnect lines on a 50k-client broker generate gigabytes of logs a day and cost you measurable CPU. Drop to log_type warning and log_type error in steady state; ship the early notice logs to a SIEM during onboarding only.
  • One mosquitto_pub smoke test as a load proxy. It isn’t. Use mqtt-stresser or a small Go program that opens N connections, publishes at jittered intervals, and reports p99 latency. Test with realistic payloads — 64-byte synthetic messages tell you nothing about how 2 KB protobuf payloads will behave.
  • No bridge backpressure plan. If you bridge Mosquitto to a cloud broker and the WAN is flaky, the bridge queue grows until memory or the persistence disk fills. Set bridge_outgoing_retain false and cap the backlog with max_queued_messages (plus max_queued_bytes), which are broker-level settings rather than per-bridge ones, even if it means dropping data. A full disk is worse than a gap.
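
For the load-testing pitfall, the measurement half is small enough to sketch here: a jittered publish schedule and a nearest-rank p99. The MQTT client side is left out on purpose, since it depends on which library you use. The function names and the 20% jitter are illustrative assumptions.

```python
import math
import random


def jittered_intervals(n: int, base_s: float, jitter_frac: float, seed=None) -> list[float]:
    """n publish intervals spread uniformly within +/- jitter_frac of base_s."""
    rng = random.Random(seed)
    low = base_s * (1 - jitter_frac)
    high = base_s * (1 + jitter_frac)
    return [rng.uniform(low, high) for _ in range(n)]


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty collection of samples."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Stand-in latencies in milliseconds; a real run records publish-to-PUBACK times.
    rng = random.Random(1)
    latencies_ms = [rng.uniform(2.0, 40.0) for _ in range(10_000)]
    print(f"p99 = {percentile(latencies_ms, 99):.1f} ms")
    print(jittered_intervals(3, base_s=1.0, jitter_frac=0.2, seed=1))
```

Jitter matters because 50k clients publishing on the same second boundary measure your burst handling, not your steady state.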

Wrapping Up

Mosquitto scales further than its reputation suggests, but only when you treat it as one component in a stack — kernel, TLS terminator, persistent storage, ACLs, observability. Most “we need to migrate” conversations are actually “we never tuned the defaults” conversations. Next post I’ll cover EMQX clustering for the case where you genuinely have outgrown a single-node setup.