Scaling MQTT Brokers, EMQX and HiveMQ in Production
August 9, 2024 · 6 min read · by Muhammad Amal

TL;DR: Most MQTT scaling problems are session problems, not throughput problems. Get persistent sessions, retained stores and shared subscriptions right and you’ll handle a surprising amount on modest hardware.

I get asked “which broker scales better, EMQX or HiveMQ” a lot. The honest answer in 2024 is that both scale further than 95 percent of industrial workloads need, and the difference is operational, not architectural. EMQX 5.7 is open source friendly with a strong cluster story on Mria, and HiveMQ 4.28 is commercially polished with first-class Kubernetes operators and an extension SDK that’s hard to beat. I’ve shipped both. Both are good.

What I want to do in this post is share the tuning and topology that actually matters when your fleet grows past a single plant. The throughput numbers in vendor benchmarks are real but they’re not your bottleneck. Sessions are. Retained messages are. ACL evaluation is. Let’s go through each.

A quick stack note. I assume you’re running on Linux, with tcp_keepalive_time lowered, somaxconn raised, and NVMe storage for any persistent session data.

Where Brokers Actually Slow Down

Throughput, in the sense of messages per second per node, is rarely the constraint. A single EMQX node with 8 vCPUs comfortably does 100k+ messages per second for telemetry-sized payloads. HiveMQ is in the same range. You hit other limits first.

The real constraints are connection count, session state size, retained message store, and rule engine or extension overhead. Industrial deployments tend to push the first two harder than they push throughput, because devices stay connected for days at a time and carry per-session subscription state.

Connection Count

Each TCP connection costs file descriptors, kernel memory and a chunk of broker memory for the session. EMQX documents around 1 KB per idle session plus subscription state. HiveMQ is similar. A 100k device fleet is therefore well under 1 GB just for sessions, which is fine.
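A back-of-envelope helper makes that arithmetic explicit. The ~1 KB per idle session is the figure quoted above. The per-subscription cost is a hypothetical placeholder, not a vendor number, so replace it with what you measure on your own broker.

```python
# Back-of-envelope estimate of broker memory held by session state.
# BYTES_PER_IDLE_SESSION mirrors the ~1 KB figure in the text;
# BYTES_PER_SUBSCRIPTION is a hypothetical placeholder, not a documented value.
BYTES_PER_IDLE_SESSION = 1024
BYTES_PER_SUBSCRIPTION = 256  # hypothetical; measure on your own broker


def session_memory_bytes(devices: int, subs_per_device: int = 0) -> int:
    """Estimate total session memory for a fleet of connected devices."""
    per_session = BYTES_PER_IDLE_SESSION + subs_per_device * BYTES_PER_SUBSCRIPTION
    return devices * per_session


if __name__ == "__main__":
    for subs in (0, 4):
        mib = session_memory_bytes(100_000, subs) / 2**20
        print(f"100k devices, {subs} subs each: ~{mib:.0f} MiB")
```

With these placeholder numbers, 100k devices come out at about 98 MiB with no subscriptions and about 195 MiB with four each. Both are comfortably under 1 GB.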

What gets you is ulimit -n and conntrack. Set the broker user’s open file limit to at least the device count plus a generous buffer. If the broker host runs a stateful firewall (iptables or nftables with connection tracking), raise net.netfilter.nf_conntrack_max there, and watch syslog like a hawk for “nf_conntrack: table full, dropping packet”. I’ve debugged enough “broker is down” incidents that were actually conntrack table exhaustion.
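A small pre-flight check for those two limits could look like the sketch below. It is Linux-only and makes two assumptions. First, it reads the limits of the process that runs it, so run it as the broker user or compare against the broker’s /proc/<pid>/limits. Second, the headroom and base allowance are illustrative defaults, not vendor guidance. The conntrack comparison is rough, because that table is shared with all other traffic on the host.

```python
from __future__ import annotations

import resource
from pathlib import Path


def required_nofile(devices: int, headroom: float = 0.25, base: int = 4096) -> int:
    """One socket per device, plus headroom, plus a base allowance for
    listeners, cluster links and data files. Defaults are illustrative."""
    return int(devices * (1 + headroom)) + base


def read_sysctl(name: str) -> int | None:
    """Read an integer sysctl (e.g. net.netfilter.nf_conntrack_max) from /proc/sys."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None  # not Linux, conntrack module not loaded, or unreadable


def check_host(devices: int) -> list[str]:
    """Compare this process's fd limit and the host conntrack limit to the fleet size."""
    need = required_nofile(devices)
    warnings = []
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < need:
        warnings.append(f"ulimit -n is {soft}, want at least {need}")
    conntrack_max = read_sysctl("net.netfilter.nf_conntrack_max")
    if conntrack_max is not None and conntrack_max < need:
        warnings.append(f"nf_conntrack_max is {conntrack_max}, want at least {need}")
    return warnings


if __name__ == "__main__":
    for w in check_host(100_000):
        print("WARN:", w)
```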

Persistent Sessions

If your devices need MQTT 5 session expiry longer than EMQX’s default of 2 hours, the broker has to hold session state across long disconnects, and ideally across broker restarts. EMQX 5.7 has built-in durable sessions stored in its RocksDB-based durable storage layer. HiveMQ has a similar persistent session feature backed by RocksDB.

Persistent sessions are great for industrial deployments because cellular gateways drop and reconnect constantly. But they’re expensive. Each session with pending QoS 1 messages keeps those messages on disk until they are acknowledged or expire. A fleet of 50k devices with even 100 pending messages each is 5M durable rows the broker has to manage.

My rule of thumb. Use Session Expiry of one or two hours for cellular fleets, longer only for devices that genuinely need it. Set per-session inflight max conservatively, say 32 or 64. Use Message Expiry on publishes so the queue self-cleans.
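To see why an inflight cap plus Message Expiry keeps the backlog bounded, here is a toy model of one session’s outbound QoS 1 queue for a subscriber that has gone offline. It is an illustration only, not how EMQX or HiveMQ store sessions, and the class and method names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pending:
    msg_id: int
    expires_at: float  # absolute time derived from the MQTT 5 Message Expiry Interval


@dataclass
class SessionQueue:
    """Toy model of one session's QoS 1 backlog (hypothetical, for illustration)."""

    max_inflight: int = 32
    queued: deque = field(default_factory=deque)
    inflight: dict = field(default_factory=dict)

    def enqueue(self, msg_id: int, now: float, expiry_s: float) -> None:
        self.queued.append(Pending(msg_id, now + expiry_s))

    def purge_expired(self, now: float) -> int:
        """Drop queued messages whose expiry has passed; return how many were dropped."""
        before = len(self.queued)
        self.queued = deque(p for p in self.queued if p.expires_at > now)
        return before - len(self.queued)

    def deliver(self, now: float) -> list[int]:
        """On reconnect: purge expired messages, then fill the inflight window."""
        self.purge_expired(now)
        sent = []
        while self.queued and len(self.inflight) < self.max_inflight:
            p = self.queued.popleft()
            self.inflight[p.msg_id] = p
            sent.append(p.msg_id)
        return sent

    def ack(self, msg_id: int) -> None:
        self.inflight.pop(msg_id, None)  # PUBACK received, free one inflight slot
```

In the scenario exercised below, a subscriber goes offline for two hours while a message arrives every 10 s with a five-minute Message Expiry. It reconnects to 29 queued messages rather than 720. Without expiry, the inflight cap alone would only limit how much is sent at once, not how much is stored.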

Retained Messages

Retained messages are the silent killer. Every time a publisher sets retain=true, the broker stores a copy keyed by topic, replacing any earlier retained message on that topic. The store is queried whenever a client subscribes with a filter that matches that topic. With thousands of topics and tens of thousands of subscribers, a poorly designed retained policy can spend more CPU on retained delivery than on live messages.

EMQX 5.7 has a dedicated retained backend you can point at RocksDB. HiveMQ uses an internal store you can tune. In both cases the answer is the same. Audit your retained topics ruthlessly. Retain last-known-value on a small set of topics, not on raw telemetry. If you must retain telemetry, set a retained TTL.
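The sketch below is a toy retained store. It holds one retained message per topic, as MQTT specifies, clears a topic on an empty retained payload, and applies an optional per-message TTL standing in for a retained expiry setting. It uses a naive linear scan so the cost of each new subscription is obvious. Production brokers use indexed lookups, but they still pay for every matching retained topic. The class name and `clock` parameter are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT topic-filter matching: '+' is one level, '#' is this level and below.

    Per the MQTT spec, a filter starting with a wildcard does not match
    topics that begin with '$' (e.g. $SYS/...).
    """
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # matches the parent level and everything below it
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


class RetainedStore:
    """Toy retained store: one message per topic with an optional TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._msgs: dict[str, tuple[bytes, float | None]] = {}

    def publish(self, topic: str, payload: bytes, ttl_s: float | None = None) -> None:
        if not payload:
            self._msgs.pop(topic, None)  # empty retained payload clears the topic
            return
        expires = self._clock() + ttl_s if ttl_s is not None else None
        self._msgs[topic] = (payload, expires)

    def on_subscribe(self, topic_filter: str) -> list[tuple[str, bytes]]:
        """Retained messages a new subscriber receives; cost grows with store size."""
        now = self._clock()
        out = []
        for topic, (payload, expires) in list(self._msgs.items()):
            if expires is not None and expires <= now:
                del self._msgs[topic]  # TTL elapsed: the store cleans itself
                continue
            if topic_matches(topic_filter, topic):
                out.append((topic, payload))
        return out
```

A last-known-value topic such as a line state survives indefinitely, while retained telemetry with a 60 s TTL disappears on its own. The `#` filter also shows why broad subscriptions are expensive: they match every non-`$` topic in the store.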

Cluster Topologies

For EMQX, the cluster runs on Mria, an extension of Erlang’s Mnesia database that adds a core-replicant replication mode. By default every node is a core node and the cores form a full mesh, which works to maybe 5–7 nodes before chatter becomes noticeable. Past that, use the core-replicant topology: a small set of core nodes handles writes, while replicants keep read-only copies of the data and handle client connections. This is documented and well-trodden.
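Part of why a full mesh stops scaling is plain arithmetic: every node talks to every other node, so the number of links grows quadratically. The quick check below covers only that arithmetic and says nothing about how much traffic Mria actually sends per link.

```python
def full_mesh_links(nodes: int) -> int:
    """Node-to-node links in a full mesh: n * (n - 1) / 2."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (3, 5, 7, 12):
        print(n, "nodes ->", full_mesh_links(n), "links")
```

Going from 5 to 12 all-core nodes takes you from 10 to 66 links. In a core-replicant layout only the core nodes take part in the full-mesh write replication, so the quadratic term is bounded by the core count rather than the total node count.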

For HiveMQ, nodes find each other through a configurable discovery mechanism and form a cluster that scales horizontally with the operator on Kubernetes. The operational ergonomics here are excellent if you’re already on K8s. Sizing-wise, eight to twelve nodes per cluster is comfortable, beyond which you should ask whether you actually need a single logical cluster or two regional ones bridged.

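Here’s a fragment of EMQX 5.7 emqx.conf for a core node in a six-node mixed setup (three core, three replicant). Node-specific settings such as node.role belong in each node’s emqx.conf, not in the cluster-wide cluster.hocon:

# emqx.conf on each core node; node.name is set per host and omitted here
cluster {
  name = emqx-prod
  discovery_strategy = static
  # Static discovery: the three core nodes act as seeds
  static.seeds = [
    "emqx@10.0.1.10","emqx@10.0.1.11","emqx@10.0.1.12"
  ]
  # Core nodes a replicant connects to (only used when node.role = replicant)
  core_nodes = ["emqx@10.0.1.10","emqx@10.0.1.11","emqx@10.0.1.12"]
}
# Replicant nodes use the same cluster block with node.role = replicant
node.role = core
listeners.tcp.default {
  bind = "0.0.0.0:1883"
  # Per-listener connection cap; keep the broker's ulimit -n above this
  max_connections = 200000
  acceptors = 16
}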

The three core nodes hold the source of truth. Replicants join with node.role = replicant and handle client connections, scaling out without adding members to the core full mesh.

ACL and Authorization

Every publish and subscribe gets authorized. If your ACL backend is a remote HTTP service, every authorization call is a network roundtrip. That’ll cap you long before CPU does. Cache ACL decisions with a sane TTL, use a database backend such as PostgreSQL with prepared statements, or use a file backend if your rules are static enough.

EMQX has a multi-backend ACL chain. HiveMQ has its Enterprise Security Extension. In both, the recipe is the same: cache aggressively. Authorize on connect for connect-level rules, then trust the session unless tokens are short-lived.
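The cache-aggressively recipe fits in a few lines. The sketch below assumes a synchronous backend callable, for example a hypothetical function that calls your HTTP authz service. EMQX already ships its own authorization cache, so treat this only as an illustration of the trade-off: the TTL bounds how long a revoked permission keeps working.

```python
from __future__ import annotations

import time
from typing import Callable


class CachedAuthorizer:
    """TTL cache in front of a slow authorization backend (hypothetical names)."""

    def __init__(
        self,
        backend: Callable[[str, str, str], bool],
        ttl_s: float = 60.0,
        max_entries: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._backend = backend
        self._ttl = ttl_s
        self._max = max_entries
        self._clock = clock
        # (client_id, action, topic) -> (decision, expires_at)
        self._cache: dict[tuple[str, str, str], tuple[bool, float]] = {}

    def allowed(self, client_id: str, action: str, topic: str) -> bool:
        key = (client_id, action, topic)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # cache hit: no network roundtrip
        decision = self._backend(client_id, action, topic)  # the expensive call
        self._cache.pop(key, None)  # re-insert so dict order tracks refresh time
        if len(self._cache) >= self._max:
            # Evict the entry refreshed longest ago (dicts keep insertion order).
            self._cache.pop(next(iter(self._cache)))
        self._cache[key] = (decision, now + self._ttl)
        return decision
```

Negative decisions are cached too, which protects the backend from clients hammering topics they are not allowed to use.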

Bridging to Kafka

The MQTT broker should not be your durable store. EMQX 5.7 has a built-in Kafka producer integration (earlier 5.x releases called it a bridge), and HiveMQ has its Enterprise Extension for Kafka. Both let you forward topics to Kafka with at-least-once semantics. Configure batch size and linger to match your Kafka cluster. Don’t run the broker as a Kafka producer at full speed without monitoring the producer buffer, because back-pressure from Kafka fills it, and an overflowing buffer means delayed or dropped deliveries to Kafka.

A minimal EMQX 5.7 bridge config:

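# Legacy bridges syntax; recent 5.x releases express this as a connector plus an action.
# A rule (not shown) routes MQTT messages into this bridge.
bridges.kafka_producer.tele_to_kafka {
  bootstrap_hosts = "kafka-1:9092,kafka-2:9092,kafka-3:9092"
  kafka {
    topic = "iiot.telemetry"
    message {
      # Kafka record key and value, filled from the MQTT message
      key = "${topic}"
      value = "${payload}"
    }
    # In-memory buffer; once a partition exceeds its limit, the oldest messages are dropped
    buffer { mode = memory, per_partition_limit = 256MB }
  }
}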
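The buffer limit above decides what happens when Kafka slows down. EMQX’s documentation describes per_partition_limit as discarding the oldest buffered messages once a partition exceeds it. The toy model below reproduces that behavior so you can reason about loss under a stalled Kafka cluster. It is not EMQX’s implementation, and the class name is hypothetical.

```python
from __future__ import annotations

from collections import deque


class PartitionBuffer:
    """Toy per-partition producer buffer with a byte limit and drop-oldest overflow."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes
        self.size = 0
        self.queue: deque[bytes] = deque()
        self.dropped = 0

    def append(self, msg: bytes) -> None:
        """Buffer a message; evict from the head while over the byte limit."""
        self.queue.append(msg)
        self.size += len(msg)
        while self.size > self.limit and self.queue:
            old = self.queue.popleft()
            self.size -= len(old)
            self.dropped += 1

    def drain(self, max_msgs: int) -> list[bytes]:
        """Kafka accepted a batch: remove up to max_msgs from the head."""
        out = []
        while self.queue and len(out) < max_msgs:
            msg = self.queue.popleft()
            self.size -= len(msg)
            out.append(msg)
        return out
```

With a 1,000-byte limit and 100-byte messages, fifteen appends while Kafka is stalled leave the ten newest messages and a dropped count of five. That dropped counter is the number worth alerting on.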

Common Pitfalls

  • One enormous wildcard subscription. A consumer subscribing to # looks innocent and will pin a CPU core. Always make subscribers as specific as possible.
  • Retain on every publish. Some SDKs make it easy to leave the retain flag switched on. Audit and lock it down.
  • Single broker behind an L4 load balancer with sticky sessions. You’re not load balancing, you’re just adding a hop. Either go full cluster or accept a single broker with HA failover.
  • Forgetting TLS termination cost. TLS 1.3 with modern ciphers is cheap but not free. Hardware-accelerated NICs help. Terminate at the broker, not at an L7 proxy, to keep the connection identity intact.
  • Mixing rule engine logic with high-volume topics. Every rule evaluation costs. Move heavy logic out to a stream processor.

Wrapping Up

Brokers are remarkably capable in 2024. EMQX 5.7 and HiveMQ 4.28 will both serve a large industrial fleet on a few well-tuned nodes. The art is in session policy, retained discipline, ACL caching, and getting messages off the broker into a durable log as soon as you can.

If you haven’t read the earlier post on MQTT 5 features for industrial, the shared subscription and session expiry sections there pair directly with what’s above. For the canonical EMQX scaling reference, the EMQX clustering documentation is the place to start.

Boring broker, exciting plant. That’s the goal.