Choosing an MQTT 5.0 Broker for Industrial IoT in 2023
August 2, 2023 · 7 min read · by Muhammad Amal

TL;DR — Mosquitto 2.0.x is still the right call for single-site plants under ~50k connected sensors / EMQX 5.1 wins if you need clustering, rule engine, and shared subscriptions out of the box / HiveMQ 4 is the safer commercial choice when you’re already paying for an enterprise SLA.

Every six months a customer asks me which broker they should standardize on, and every six months I have to remind them the answer depends less on the broker and more on what they’re modeling. An MQTT broker is a router with persistence — picking one without knowing your topic cardinality, message retention requirements, and disaster-recovery story is like picking a database by GitHub stars.

This post is the short version of a decision matrix I’ve been refining since I shipped my first MQTT 3.1.1 deployment in 2017. We’ll stick to the three brokers I actually keep deploying in 2023: Eclipse Mosquitto 2.0.x, EMQX 5.1, and HiveMQ 4. Everything assumes MQTT 5.0, because if you’re greenfielding IIoT in 2023 and not using MQTT 5, you’re trading short-term convenience for a migration bill later.

I’ll skip benchmark theater. Numbers from vendor blogs are useless without your topic tree, your QoS distribution, and your TLS handshake load. What matters is operational fit.

What Actually Changes Between Brokers

The MQTT spec is the easy part. Every conforming broker handles PUBLISH/SUBSCRIBE the same way. What differs is everything around it:

  • Authentication and authorization backends. Mosquitto ships with a plugin ABI; EMQX and HiveMQ ship with batteries-included LDAP, JWT, HTTP hook, and database auth.
  • Bridging. Mosquitto’s bridge config is simple, but every bridge shares the broker’s single-threaded event loop. EMQX bridges to Kafka, Pulsar, and 30+ data sinks via the Rule Engine. HiveMQ has the Extension SDK.
  • Clustering. Mosquitto has no native cluster (the Pro fork does). EMQX clusters via Mria, its extension of Erlang’s Mnesia. HiveMQ clusters via its own gossip protocol.
  • Shared subscriptions. All three implement $share/<group>/<topic> from MQTT 5, but only EMQX and HiveMQ give you the load-balancing knobs you’ll want.
  • Observability. EMQX has a dashboard and Prometheus endpoint. HiveMQ has a Control Center. Mosquitto has $SYS/ topics and you wire Prometheus yourself.

Look at that list and ask: how much of this would I rather buy than build?
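
Of the items above, shared subscriptions are the easiest to misread, so here is a minimal Python sketch of what $share/<group>/<topic> means: the broker strips the prefix, matches the remaining filter as usual, and hands each message to one member of the group instead of all of them. The round-robin strategy and the names parse_shared_filter and SharedGroup are assumptions for illustration. Real brokers offer other dispatch strategies, and none of this is any broker’s actual code.

```python
from itertools import cycle
from typing import Optional, Tuple


def parse_shared_filter(topic_filter: str) -> Optional[Tuple[str, str]]:
    """Split '$share/<group>/<filter>' into (group, filter); None if not shared."""
    parts = topic_filter.split("/", 2)
    if len(parts) == 3 and parts[0] == "$share" and parts[1] and parts[2]:
        return parts[1], parts[2]
    return None


class SharedGroup:
    """Hypothetical round-robin dispatcher for the members of one share group."""

    def __init__(self, members):
        self._next_member = cycle(members)

    def dispatch(self, message: str) -> Tuple[str, str]:
        # Each message goes to exactly one member, not to every subscriber.
        return next(self._next_member), message


group, topic_filter = parse_shared_filter("$share/historians/factory/+/temp")
workers = SharedGroup(["worker-a", "worker-b"])
deliveries = [workers.dispatch(f"reading-{i}") for i in range(4)]
print(group, topic_filter)
print(deliveries)
```

The "load-balancing knobs" are what replaces the fixed round-robin here: how the broker picks the next member, and what it does when that member is slow or offline.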

Mosquitto 2.0.x: Boring, Small, Honest

Mosquitto is the broker I still recommend for sites with up to roughly 50k concurrent clients, modest message rates (under 30k msg/sec), and a single-process or active/passive failover topology. It’s a 2 MB binary, it has been audited, and you can read the entire source in an afternoon.

A working production config for a single-site SCADA gateway looks like this:

# /etc/mosquitto/mosquitto.conf — Mosquitto 2.0.18
listener 8883
protocol mqtt
cafile /etc/mosquitto/certs/ca.crt
certfile /etc/mosquitto/certs/server.crt
keyfile /etc/mosquitto/certs/server.key
require_certificate true
use_identity_as_username true

persistence true
persistence_location /var/lib/mosquitto/
autosave_interval 60

max_inflight_messages 200
max_queued_messages 10000
queue_qos0_messages false
# 268435455 bytes is the MQTT protocol maximum; lower this to your real payload ceiling
message_size_limit 268435455

log_dest syslog
log_type error
log_type warning
log_type notice
sys_interval 10

per_listener_settings true
allow_anonymous false
acl_file /etc/mosquitto/acl

A few production gotchas the official Mosquitto docs cover but most blogs don’t: max_queued_messages is per-client, not global, and persistence is a periodic snapshot — not a write-ahead log. If the process is SIGKILL’d between snapshots you lose what’s queued for offline clients. For deeper config patterns the Mosquitto documentation is the authoritative reference.
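
Both gotchas are easier to reason about with numbers. The sketch below is back-of-envelope arithmetic only: the offline-client count, payload size, and queueing rate are assumed figures, not measurements, and the backlog counts payload bytes without per-message overhead.

```python
def worst_case_backlog_bytes(offline_clients: int,
                             max_queued_messages: int,
                             avg_message_bytes: int) -> int:
    """Upper bound on queued payload bytes if every offline client fills its queue."""
    return offline_clients * max_queued_messages * avg_message_bytes


def worst_case_lost_messages(autosave_interval_s: int,
                             queued_msgs_per_second: float) -> float:
    """Messages queued since the last snapshot, lost if the broker is killed."""
    return autosave_interval_s * queued_msgs_per_second


# Assumed: 5,000 clients offline, max_queued_messages 10000, 200-byte payloads.
backlog = worst_case_backlog_bytes(5_000, 10_000, 200)
print(f"worst-case backlog: {backlog / 1e9:.1f} GB")

# Assumed: 500 msg/s are being queued for offline clients; autosave_interval is 60 s.
at_risk = worst_case_lost_messages(60, 500)
print(f"at risk between snapshots: {at_risk:.0f} messages")
```

With those assumptions the per-client limit of 10,000 allows a 10 GB backlog, which is why the limit needs sizing against the number of clients that can be offline at once.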

Where Mosquitto starts hurting:

  • You need horizontal scale. The community version doesn’t cluster.
  • You want rule-engine routing (e.g. “republish factory/+/temp over 80 °C to a Kafka topic”). You’ll write a sidecar.
  • You need fan-out to multiple cold and hot stores. Again, sidecar.

EMQX 5.1: The Pragmatic Default for New Builds

EMQX 5.1 dropped earlier this year, building on the Mria-based replication and reworked Rule Engine that arrived with 5.0. For most new IIoT builds I’m starting in 2023, this is what I reach for first. The reason is simple: the rule engine eliminates an entire class of glue code.

Here’s a rule that mirrors high-priority temperature alerts from devices into both InfluxDB and Kafka, with one SQL-like statement:

-- EMQX 5.1 Rule Engine
SELECT
  payload.device_id AS device_id,
  payload.temp_c    AS temp_c,
  payload.ts        AS ts,
  topic             AS source_topic
FROM
  "factory/+/sensors/temp"
WHERE
  payload.temp_c > 80.0

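You bind that rule to two actions in the dashboard (one InfluxDB 2.7 sink, one Kafka 3.5 producer) and you have a fan-out you’d otherwise have built with Telegraf and a custom consumer. Check the edition matrix before you commit: as far as I know, in the 5.x line the Kafka and InfluxDB bridges ship with EMQX Enterprise rather than the open-source build.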
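
To make the rule’s semantics concrete, here is a rough Python equivalent of what the statement does for one incoming message: match the topic against the filter, decode the JSON payload, apply the WHERE clause, and project the selected fields. The function names are hypothetical, the matcher ignores the special handling of $-prefixed topics, and this is a sketch of the behaviour, not of how EMQX implements it.

```python
import json
from typing import Optional


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT filter match: '+' spans one level, a trailing '#' spans the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def apply_rule(topic: str, raw_payload: bytes) -> Optional[dict]:
    """Return the projected row, or None when the message is filtered out."""
    if not topic_matches("factory/+/sensors/temp", topic):   # FROM clause
        return None
    payload = json.loads(raw_payload)
    temp_c = payload.get("temp_c")
    if not isinstance(temp_c, (int, float)) or temp_c <= 80.0:  # WHERE clause
        return None
    return {                                                  # SELECT clause
        "device_id": payload.get("device_id"),
        "temp_c": temp_c,
        "ts": payload.get("ts"),
        "source_topic": topic,
    }


hot = apply_rule("factory/line1/sensors/temp",
                 b'{"device_id": "t-17", "temp_c": 91.5, "ts": 1690934400}')
cold = apply_rule("factory/line1/sensors/temp",
                  b'{"device_id": "t-18", "temp_c": 72.0, "ts": 1690934400}')
print(hot)
print(cold)
```

Everything in apply_rule is the glue code the rule engine saves you from writing, deploying, and monitoring as a separate consumer.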

EMQX clustering is genuinely easy. A three-node cluster on Kubernetes via the EMQX Operator looks like:

# emqx-cluster.yaml — EMQX Operator 2.2, EMQX 5.1
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
spec:
  image: emqx:5.1.5
  coreTemplate:
    spec:
      replicas: 3
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
  replicantTemplate:
    spec:
      replicas: 2
  listenersServiceTemplate:
    spec:
      type: LoadBalancer

The split between core and replicant nodes is the part most engineers miss. Core nodes hold the routing metadata; replicants are stateless and where you push most of the connection load. For a plant with 200k sensors I’d run 3 cores and 6–10 replicants, and let the Kubernetes service handle TLS termination at the LB. See the EMQX 5.1 architecture docs for the replication guarantees.
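
The 6–10 replicant range is easier to judge once you work out what each node carries. This is a small arithmetic sketch, assuming the load balancer spreads clients evenly and that cores take no client connections; both are simplifications, and the function name is hypothetical.

```python
import math


def connections_per_replicant(total_clients: int, replicants: int,
                              failed: int = 0) -> int:
    """Clients each surviving replicant carries, assuming an even spread."""
    surviving = replicants - failed
    if surviving <= 0:
        raise ValueError("no replicants left to take connections")
    return math.ceil(total_clients / surviving)


for replicants in (6, 10):
    normal = connections_per_replicant(200_000, replicants)
    degraded = connections_per_replicant(200_000, replicants, failed=1)
    print(f"{replicants} replicants: {normal} each, {degraded} with one down")
```

Size the replicant resource requests for the degraded figure, not the normal one, because a node loss is exactly when every surviving node also absorbs a reconnect storm.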

EMQX’s downside is operational surface area. You’re now running Erlang, Mnesia/Mria, a dashboard, and a rule engine. When something goes wrong at 3am, “restart Mosquitto” is a much shorter rollback than “diagnose Mria split-brain.”

HiveMQ 4: When You Need to Buy the SLA

HiveMQ 4 is what I deploy when the customer has procurement authority but not 24/7 ops headcount. The Control Center is the most polished of the three, the extensions ecosystem is mature (Kafka, Google Pub/Sub, Snowflake, Distributed Tracing), and support response times have been the best of the three vendors I’ve worked with.

Three things that justify the license cost for industrial customers:

  1. Hot rolling upgrades. The cluster will run mixed versions during an upgrade. Mosquitto and community EMQX both want you to take maintenance windows.
  2. Per-client message expiry that actually behaves the way the MQTT 5 spec says.
  3. The Kafka extension. It’s the cleanest way I’ve found to project MQTT topics into Kafka topics with key extraction from the payload.

If you’re already standardized on HiveMQ for a previous deployment, don’t switch off it for new sites just to save list price. The operational consistency is worth more than the license delta.

A Decision Tree That Survives Contact With Reality

Single site, <50k devices, no cluster needed
  -> Mosquitto 2.0.x. Don't overthink it.

Multi-site, need rule engine + Kafka/Influx sinks, k8s shop
  -> EMQX 5.1 with the Operator.

Procurement allows commercial, ops team is thin
  -> HiveMQ 4 with the Kafka extension and the Control Center.

You're storing 90 days of payloads inside the broker
  -> You're wrong. Use a broker + time-series DB.

The last branch matters more than the first three. Brokers are not databases. Set retained-message hygiene aggressively, set message_expiry_interval on every publish, and offload anything past the working set into a time-series store you can actually query.
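
If you want the tree in a form you can drop into a sizing script, a minimal sketch follows. The Site fields, the function name, and above all the order in which the branches are checked are my assumptions for illustration; the tree itself does not define a precedence when several branches apply.

```python
from dataclasses import dataclass


@dataclass
class Site:
    devices: int
    multi_site: bool = False
    needs_rule_engine: bool = False
    commercial_ok: bool = False
    thin_ops_team: bool = False
    stores_history_in_broker: bool = False


def recommend_broker(site: Site) -> str:
    # Checked first because it invalidates every other answer.
    if site.stores_history_in_broker:
        return "Move history to a time-series DB, then decide"
    if site.commercial_ok and site.thin_ops_team:
        return "HiveMQ 4"
    if site.multi_site or site.needs_rule_engine:
        return "EMQX 5.1"
    if site.devices < 50_000:
        return "Mosquitto 2.0.x"
    # Assumption: a single site past ~50k devices wants clustering anyway.
    return "EMQX 5.1"


print(recommend_broker(Site(devices=20_000)))
print(recommend_broker(Site(devices=200_000, multi_site=True, needs_rule_engine=True)))
print(recommend_broker(Site(devices=80_000, commercial_ok=True, thin_ops_team=True)))
```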

Common Pitfalls

  • TLS at the broker without session tickets. Every reconnect becomes a full handshake. On a 100k-device deployment that’s a CPU bonfire. Terminate TLS at a load balancer that supports session resumption (HAProxy, Envoy) or enable session tickets in the broker config.
  • Wildcard topic explosions. A subscriber on # will receive every message on the bus. I’ve seen this take down a Mosquitto instance in under a minute when a misconfigured dashboard connected to the broker. Use ACLs to forbid # for non-admin principals.
  • QoS 2 by default. QoS 2 is a four-packet exchange (PUBLISH, PUBREC, PUBREL, PUBCOMP) against two for QoS 1 (PUBLISH, PUBACK). If your sensor publishes once per second and you don’t need exactly-once, QoS 1 is fine and roughly 2x cheaper. Reserve QoS 2 for command/control where duplication is genuinely harmful.
  • No Client ID discipline. Two devices with the same client ID will play ping-pong with each other forever. Make client IDs deterministic from a hardware identifier and reject duplicates at the auth layer.
  • Treating the broker as durable storage. Retained messages and offline message queues are convenience features, not a database. They’ll outgrow RAM and start to swap, and the next thing you notice is publish latency tripling.
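
For the client ID pitfall, one way to get deterministic IDs and duplicate rejection is sketched below. The scheme (an HMAC of the MAC address under a per-site key), the dev- prefix, and the DuplicateGuard class are all hypothetical, and the identity argument stands in for whatever your auth layer trusts, such as a client certificate fingerprint.

```python
import hashlib
import hmac


def client_id_for(mac_address: str, site_key: bytes) -> str:
    """Derive a stable client ID from a MAC address (hypothetical scheme)."""
    normalized = mac_address.replace(":", "").replace("-", "").lower()
    digest = hmac.new(site_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()
    return f"dev-{digest[:16]}"  # 20 characters in total


class DuplicateGuard:
    """Auth-layer check: refuse a client ID already held by another identity."""

    def __init__(self):
        self._owners = {}

    def on_connect(self, client_id: str, identity: str) -> bool:
        owner = self._owners.get(client_id)
        if owner is not None and owner != identity:
            return False  # a second device is claiming the same ID
        self._owners[client_id] = identity
        return True

    def on_disconnect(self, client_id: str) -> None:
        self._owners.pop(client_id, None)


key = b"example-site-key"  # assumption: provisioned per site, kept secret
cid = client_id_for("AA:BB:CC:00:11:22", key)
guard = DuplicateGuard()
print(cid)
print(guard.on_connect(cid, "cert-fp-1"))  # first claim is accepted
print(guard.on_connect(cid, "cert-fp-2"))  # a different identity is refused
```

Refusing the second claimant at connect time turns the endless takeover loop into one auth failure you can alert on.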

Wrapping Up

The broker decision is rarely the bottleneck — your topic design, payload format, and downstream consumers are. Pick the broker that minimizes the operational footprint your team can actually carry, not the one with the prettiest benchmark chart. Next post I’ll get into the Mosquitto-specific tuning I’ve used to push single-node deployments past their default ceilings.