April 21, 2026 · 9 min read · by Muhammad Amal · programming

TL;DR — QoS is a contract, not a reliability dial / QoS 2 is rarely the right answer for sensors / Durable sessions plus a tuned RocksDB write path give you reliable delivery without paying the QoS 2 tax.

There’s a recurring mistake I see in MQTT deployments: someone reads that QoS 2 means “exactly once” and sets every publisher to QoS 2 because more reliability sounds better. Six months later the broker is exchanging four packets per message where one or two would do, the disk has become the bottleneck, and delivery is less reliable than before because the system can’t keep up under load.

QoS is not a quality knob you turn up. It’s a contract that describes how many times a message may be delivered and how many acknowledgement round trips that costs. Reliability — the message actually arriving and surviving a broker restart — comes from a different mechanism entirely: persistence. Conflating the two is the root cause of most “MQTT is dropping my data” tickets.


This post separates the two cleanly. We’ll cover when each QoS level is correct for sensor telemetry, how MQTT 5.0 durable sessions actually store inflight messages, and how to tune the RocksDB write path under EMQX 5.8 so persistence doesn’t become the bottleneck. It builds on the MQTT cluster optimization piece — clustering gives you fault tolerance, but only correct QoS and persistence settings turn that into reliable delivery.

QoS Levels, What They Actually Cost

MQTT defines three QoS levels, and the difference is purely the handshake.

  • QoS 0: fire and forget. One PUBLISH, no acknowledgement. The message is delivered at most once. If the connection drops mid-flight or the subscriber is offline, it’s gone.
  • QoS 1: at least once. PUBLISH, then PUBACK. If the PUBACK never arrives, the publisher resends the message with the DUP flag set (in MQTT 5.0, when it reconnects), so a message can arrive more than once. The subscriber must be idempotent.
  • QoS 2: exactly once. A four-packet handshake: PUBLISH, PUBREC, PUBREL, PUBCOMP. Sender and receiver both keep per-message state until the handshake completes, which guarantees single delivery on that hop.

The cost difference is stark. QoS 0 is one packet. QoS 1 is two. QoS 2 is four, plus per-message state the broker must persist to survive a restart mid-handshake. For a fleet pushing thousands of messages per second, QoS 2 roughly doubles broker write load versus QoS 1 for a guarantee most sensor workloads don’t need.
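To put numbers on that, here is a small sketch that counts packets on a single hop. The 5,000 messages per second fleet rate is a made-up figure, and the count covers packets only, not the extra handshake state QoS 2 persists to disk.

```python
# Packets exchanged per application message on ONE hop (publisher->broker or
# broker->subscriber), following the handshakes listed above.
PACKETS_PER_MESSAGE = {
    0: 1,  # PUBLISH
    1: 2,  # PUBLISH, PUBACK
    2: 4,  # PUBLISH, PUBREC, PUBREL, PUBCOMP
}


def packets_per_second(qos: int, messages_per_second: int) -> int:
    """Packets one hop carries per second at the given QoS level."""
    return PACKETS_PER_MESSAGE[qos] * messages_per_second


if __name__ == "__main__":
    rate = 5_000  # hypothetical fleet-wide publish rate, messages per second
    for qos in (0, 1, 2):
        print(f"QoS {qos}: {packets_per_second(qos, rate):>6} packets/s per hop")
```

Every subscriber hop pays the same multiplier again, which is where the broker-side load piles up.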

Here’s the decision rule I apply:

Message type                     | QoS | Reasoning
-------------------------------- | --- | ---------------------------------------------------------
Routine periodic reading         | 0   | Next reading is seconds away; a stale sample is worthless
Threshold-crossing alert         | 1   | Must arrive; consumer dedupes on event ID
Actuator command / config change | 2   | Double-applying a command is unsafe
Aggregated batch upload          | 1   | Must arrive; batch carries a sequence number for dedup

The honest summary: QoS 0 for the firehose, QoS 1 for things that matter, QoS 2 only when duplicate delivery itself causes harm. Most sensor systems never legitimately need QoS 2.
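The rule boils down to two yes/no questions per message type. A minimal sketch; the function name and its two flags are invented for illustration, and the table rows are encoded as those flags.

```python
from enum import IntEnum


class QoS(IntEnum):
    AT_MOST_ONCE = 0
    AT_LEAST_ONCE = 1
    EXACTLY_ONCE = 2


def choose_qos(must_arrive: bool, duplicate_is_harmful: bool) -> QoS:
    """QoS 2 only when a duplicate itself causes harm; QoS 1 when the message
    must arrive and the consumer can dedup; QoS 0 for the firehose."""
    if duplicate_is_harmful:
        return QoS.EXACTLY_ONCE
    if must_arrive:
        return QoS.AT_LEAST_ONCE
    return QoS.AT_MOST_ONCE


# The decision table rows as (must_arrive, duplicate_is_harmful)
DECISION_TABLE = {
    "Routine periodic reading": (False, False),
    "Threshold-crossing alert": (True, False),
    "Actuator command / config change": (True, True),
    "Aggregated batch upload": (True, False),
}

if __name__ == "__main__":
    for message_type, flags in DECISION_TABLE.items():
        print(f"{message_type}: QoS {int(choose_qos(*flags))}")
```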

QoS 1 Plus Idempotency Beats QoS 2

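If QoS 1 can deliver twice, how do you get exactly-once behavior without QoS 2? You make the consumer idempotent. This is the standard pattern in high-throughput systems, and it usually beats QoS 2 for two reasons. The dedup happens at the consumer instead of as broker-side per-message state, and it catches duplicates QoS 2 can’t see: QoS 2 deduplicates per hop by packet identifier, so a publisher that restarts and republishes the same reading produces a brand-new message, while a stable event ID still marks it as a repeat.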

Every meaningful message carries a stable event ID. MQTT 5.0 makes this clean with user properties:

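# paho-mqtt 2.1.0: publishing a threshold alert with a dedup key
import json
import uuid

import paho.mqtt.client as mqtt
from paho.mqtt.packettypes import PacketTypes
from paho.mqtt.properties import Properties

# MQTT 5.0 client; the broker hostname is a placeholder
client = mqtt.Client(
    mqtt.CallbackAPIVersion.VERSION2,
    client_id="dev-0a3f",
    protocol=mqtt.MQTTv5,
)
client.connect("broker.example.internal", 1883)
client.loop_start()  # run the network loop in a background thread

props = Properties(PacketTypes.PUBLISH)
event_id = str(uuid.uuid4())
props.UserProperty = [("event-id", event_id), ("schema", "alert.v2")]
props.MessageExpiryInterval = 3600  # stale alerts self-destruct after 1h
props.ContentType = "application/json"

payload = json.dumps({
    "device_id": "dev-0a3f",
    "metric": "pm25",
    "value": 187.4,
    "threshold": 150.0,
    "ts": "2026-04-21T02:14:09Z",
})

info = client.publish(
    "env/jawa-barat/site-12/pm25/dev-0a3f/alert",
    payload=payload, qos=1, retain=False, properties=props,
)
info.wait_for_publish(timeout=10)  # blocks until the PUBACK arrives or 10 s pass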

The consumer keeps a short-lived seen-set keyed on event-id and drops repeats. MessageExpiryInterval is an underrated MQTT 5.0 feature: a threshold alert that’s been stuck in a queue for an hour is no longer actionable, so the broker discards it instead of delivering stale noise. Set it deliberately per message type.
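The seen-set can be as small as this. It is a sketch, not a production dedup store: it lives in one process’s memory, so a restart forgets everything, and the class and helper names are hypothetical. With paho-mqtt, getattr(msg.properties, "UserProperty", []) yields the (key, value) pairs that event_id_from expects. The 3600 s default simply mirrors the alert’s MessageExpiryInterval; size it to the longest window in which you expect redelivery.

```python
import time
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


def event_id_from(user_properties: List[Tuple[str, str]]) -> Optional[str]:
    """Pull the dedup key out of MQTT 5.0 user properties, if present."""
    for key, value in user_properties:
        if key == "event-id":
            return value
    return None


class SeenSet:
    """Short-lived set of event IDs; entries older than ttl_s are forgotten."""

    def __init__(self, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._first_seen: "OrderedDict[str, float]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Entries are never updated, so insertion order is first-seen order
        # and every expired entry sits at the front.
        while self._first_seen:
            _, seen_at = next(iter(self._first_seen.items()))
            if now - seen_at < self.ttl_s:
                break
            self._first_seen.popitem(last=False)

    def first_time(self, event_id: str) -> bool:
        """True the first time an ID shows up within the TTL window."""
        now = self.clock()
        self._evict_expired(now)
        if event_id in self._first_seen:
            return False
        self._first_seen[event_id] = now
        return True


def handle_alert(seen: SeenSet, user_properties: List[Tuple[str, str]],
                 payload: bytes) -> bool:
    """Process an alert at most once per event ID; True if it was processed."""
    event_id = event_id_from(user_properties)
    if event_id is not None and not seen.first_time(event_id):
        return False  # QoS 1 redelivery of an alert already handled
    # ... act on the alert here (hypothetical application logic) ...
    return True
```

This marks the ID before acting on it. If the action can fail, record the ID only after it succeeds, or a failed first attempt will cause the redelivery to be dropped.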

Durable Sessions, Where Reliability Actually Lives

QoS governs the handshake. Persistence governs survival. In MQTT 5.0 with EMQX 5.8, a durable session stores subscriptions and inflight QoS 1/2 messages on disk, so they survive both a client disconnect and a broker node restart.

The client opts in through the CONNECT packet: clean_start=False plus a non-zero SessionExpiryInterval. The broker keeps the session for that interval after disconnect, queuing messages the whole time.
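Those two CONNECT fields interact in a way that is easy to get backwards. The sketch below models the MQTT 5.0 rules as plain functions rather than calling a client library: Clean Start decides whether an existing session is resumed, and Session Expiry Interval decides how long the new one outlives the connection (0 or absent means it ends at disconnect, 0xFFFFFFFF means it never expires). The broker_cap_s parameter is a hypothetical stand-in for an operator-side limit; the spec lets a broker use a different value as long as it reports it back in CONNACK.

```python
from dataclasses import dataclass
from typing import Optional

NEVER_EXPIRES = 0xFFFFFFFF  # MQTT 5.0: this Session Expiry Interval never expires


@dataclass
class ConnectRequest:
    clean_start: bool
    session_expiry_interval: int = 0  # seconds; an absent property means 0


def resumes_existing_session(req: ConnectRequest) -> bool:
    """Clean Start = 1 discards any stored session before the new one begins."""
    return not req.clean_start


def retention_after_disconnect(req: ConnectRequest,
                               broker_cap_s: Optional[int] = None) -> Optional[int]:
    """Seconds the session (and its queued QoS 1/2 messages) outlives the
    connection; None means it never expires. broker_cap_s is a hypothetical
    operator limit standing in for whatever cap your broker applies."""
    requested = req.session_expiry_interval
    if requested == NEVER_EXPIRES:
        return broker_cap_s  # None when uncapped, i.e. never expires
    if broker_cap_s is not None:
        return min(requested, broker_cap_s)
    return requested


if __name__ == "__main__":
    durable = ConnectRequest(clean_start=False, session_expiry_interval=7200)
    print(resumes_existing_session(durable),
          retention_after_disconnect(durable, broker_cap_s=3600))
```

A client that reconnects with clean_start=True gets resumes_existing_session() == False every time, no matter how long a session expiry it asks for.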

# emqx.conf — durable session storage
durable_sessions {
  enable = true
  batch_size = 1000
  idle_poll_interval = "10s"
  # How often the broker sweeps expired durable sessions
  session_gc_interval = "10m"
  # Minimum time stored messages are kept before they may be deleted
  message_retention_period = "4h"
}

durable_storage {
  messages {
    backend = builtin_raft
    n_shards = 16
    replication_factor = 3
    # RocksDB-backed local storage per shard
    rocksdb {
      cache_size = "256MB"
      write_buffer_size = "64MB"
      max_open_files = 1024
    }
  }
}

Two things to understand. First, n_shards = 16 partitions the durable message store; more shards mean more write parallelism but more files and more memory. Sixteen is a good fit for a few-thousand-sensor fleet on three cores. Second, the durable store is the same one used for cluster session failover — a sensor whose broker node dies reconnects elsewhere and finds its queued QoS 1 alerts intact.

Tuning the RocksDB Write Path

EMQX 5.8’s durable storage is backed by RocksDB. Out of the box it’s tuned for general use, and for a write-heavy sensor ingest path you’ll want to tune it. The failure mode of an untuned write path is sneaky: everything works in testing, then under sustained load write stalls appear, latency spikes, and QoS 1 PUBACKs slow down enough that publisher timeouts fire and unacknowledged messages get resent, adding still more write load. It’s a feedback loop into the floor.

The levers that matter:

durable_storage.messages.rocksdb {
  # Larger write buffer = fewer, bigger flushes to L0
  write_buffer_size = "128MB"
  max_write_buffer_number = 4
  min_write_buffer_number_to_merge = 2

  # Compaction: more background threads keep L0 from piling up
  max_background_jobs = 6

  # Stall thresholds: 20/36 match RocksDB's stock defaults; raise both if bursts still stall writes
  level0_slowdown_writes_trigger = 20
  level0_stop_writes_trigger = 36

  # Block cache for read-back during session resume
  cache_size = "512MB"

  # WAL: the durability boundary for an individual write
  wal_recycle = true
  bytes_per_sync = "1MB"
}

The single most important pair is level0_slowdown_writes_trigger and level0_stop_writes_trigger. RocksDB writes land in L0 first; compaction merges L0 into L1. If writes outrun compaction, L0 files pile up, and at the slowdown trigger RocksDB artificially throttles writes — your QoS 1 latency just spiked. At the stop trigger it halts writes entirely. Raising these with enough max_background_jobs to keep compaction ahead is the difference between a smooth burst and a stall cascade.
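The trigger rule itself is simple enough to write down. A sketch of the decision RocksDB makes from the L0 file count, using the trigger values from the config above; the real engine also stalls on too many unflushed memtables and on pending compaction bytes, which this ignores, so read it as the shape of the rule rather than its exact implementation.

```python
from enum import Enum


class WriteState(Enum):
    NORMAL = "normal"
    SLOWDOWN = "slowdown"  # writes are delayed so compaction can catch up
    STOP = "stop"          # writes block until compaction drains L0


def l0_write_state(l0_files: int,
                   slowdown_trigger: int = 20,
                   stop_trigger: int = 36) -> WriteState:
    """Map the number of L0 files to the write behavior described above."""
    if stop_trigger <= slowdown_trigger:
        raise ValueError("stop trigger must be above the slowdown trigger")
    if l0_files >= stop_trigger:
        return WriteState.STOP
    if l0_files >= slowdown_trigger:
        return WriteState.SLOWDOWN
    return WriteState.NORMAL


if __name__ == "__main__":
    # L0 file count climbing during a burst where flushes outrun compaction
    for count in (4, 12, 20, 28, 36):
        print(count, l0_write_state(count).value)
```

Raising both triggers widens the burst the store can absorb before throttling; only more compaction throughput (max_background_jobs, faster disks) brings the count back down once it is climbing.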

write_buffer_size is the second lever. Larger buffers mean each flush writes a bigger, more sequential SST file, which is friendlier to disk and to compaction. The trade is memory and a longer recovery replay if the node crashes before a flush. On a node with 32 GB of RAM, 128 MB buffers with four of them is a comfortable setting.
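The memory side of that trade is worth checking as arithmetic before copying the numbers. A sketch, assuming the rocksdb block applies to every shard separately and that one node holds a replica of all 16 shards (which is what three nodes with replication_factor = 3 gives you); if the buffers are shared across shards, the real total is far smaller. Block cache is not counted.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def worst_case_memtable_bytes(write_buffer_size: int,
                              max_write_buffer_number: int,
                              shards_on_node: int) -> int:
    """Upper bound on memtable memory: every shard replica on the node holding
    its full set of write buffers at the same time."""
    return write_buffer_size * max_write_buffer_number * shards_on_node


if __name__ == "__main__":
    total = worst_case_memtable_bytes(128 * MiB, 4, 16)
    print(f"worst-case memtables: {total / GiB:.1f} GiB")
```

Under those assumptions the worst case is 8 GiB, roughly a quarter of a 32 GB node before the block cache and the broker itself, so re-run the numbers whenever n_shards or write_buffer_size changes.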

Watch the write path directly:

# Durable storage metrics via the Prometheus endpoint
curl -s http://localhost:18083/api/v5/prometheus/stats \
  | grep -E 'emqx_ds_store|emqx_ds_egress|emqx_ds_buffer'

# RocksDB internal stats from the EMQX console
emqx ctl ds info

# Watch for write stalls — this counter should stay flat
emqx ctl ds info | grep -i stall

If the stall counter climbs, compaction is losing. Add background jobs, raise the L0 triggers, or move the data directory to faster storage. On NVMe this is rarely a problem; on a network-attached volume it’s almost guaranteed without tuning.
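To automate the “stays flat” check, scrape the Prometheus endpoint twice and diff the counters. A minimal stdlib sketch of that diff: it parses the Prometheus text exposition format line by line and keeps emqx_ds_* samples. Which counter actually tracks stalls depends on your EMQX version, so the metric names in the test are invented.

```python
import re
from typing import Dict

# One sample per line: metric_name{optional="labels"} value [timestamp]
# (label values containing '}' are not handled by this simple pattern)
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)"
)


def parse_samples(text: str, prefix: str = "emqx_ds_") -> Dict[str, float]:
    """Map 'name{labels}' -> value for every sample whose name has the prefix."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match and match.group("name").startswith(prefix):
            key = match.group("name") + (match.group("labels") or "")
            samples[key] = float(match.group("value"))
    return samples


def increased(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Counters that grew between two scrapes, with how much they grew."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] > before[key]
    }
```

Feed it the body of two curl calls a minute apart; any stall-related key that shows up in the result means compaction is falling behind.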

Retained Messages and Last Will

Two more MQTT features matter for sensor reliability. Retained messages let a newly subscribing consumer immediately get the last known value of a topic without waiting for the next sample — essential for a dashboard that starts cold. Set retain=True on the latest-value topic, not on the alert stream:

# Latest reading retained; dashboards get current state on subscribe
client.publish(
    "env/jawa-barat/site-12/pm25/dev-0a3f/latest",
    payload=payload, qos=1, retain=True,
)
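On the broker side, retained messages follow two rules from the MQTT spec: a new retained message on a topic replaces the previous one, and a retained message with an empty payload deletes it. The toy store below models just that, plus simple + and # wildcard matching on subscribe. It is not how EMQX stores retained messages, and it skips details such as $-prefixed topics.

```python
from typing import Dict, List, Tuple


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT filter matching: '+' is one level, '#' is the rest (incl. parent)."""
    filter_levels, topic_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels) or (level != "+" and level != topic_levels[i]):
            return False
    return len(filter_levels) == len(topic_levels)


class RetainedStore:
    """Toy model of broker-side retained messages: at most one per topic."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # forwarded to current subscribers only, never stored
        if payload == b"":
            self._by_topic.pop(topic, None)  # empty retained payload clears the topic
        else:
            self._by_topic[topic] = payload  # newest retained message wins

    def on_subscribe(self, topic_filter: str) -> List[Tuple[str, bytes]]:
        """What a brand-new subscriber receives immediately."""
        return [(t, p) for t, p in self._by_topic.items()
                if topic_matches(topic_filter, t)]
```

A dashboard subscribing to env/jawa-barat/site-12/+/+/latest gets one current value per sensor and metric at once, while the non-retained alert stream contributes nothing to that snapshot.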

Last Will and Testament lets the broker publish a message when a sensor disconnects ungracefully — a power loss, a cut network link. Configure it at connect time so a dead sensor announces itself:

# Set the Will before client.connect(): it travels in the CONNECT packet
will_props = Properties(PacketTypes.WILLMESSAGE)
will_props.MessageExpiryInterval = 600
client.will_set(
    "env/jawa-barat/site-12/status/dev-0a3f",
    payload=json.dumps({"device_id": "dev-0a3f", "state": "offline"}),
    qos=1, retain=True, properties=will_props,
)

Combine the two: a retained LWT means a dashboard that subscribes after a sensor died still sees it as offline, instead of showing a stale last reading as if it were live.
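The sequence is easier to see as a toy event log. This sketch is not EMQX code. It models the MQTT 5.0 rule that a normal DISCONNECT (reason code 0x00) discards the Will while an ungraceful drop publishes it, and it ignores the Will Delay Interval. It also assumes one step not shown above: a retained “online” status published after each connect, so a sensor that comes back overwrites its own offline marker.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Will = Tuple[str, str]  # (topic, payload); modeled as always retained


@dataclass
class ToyBroker:
    retained: Dict[str, str] = field(default_factory=dict)
    wills: Dict[str, Will] = field(default_factory=dict)

    def connect(self, client_id: str, will: Optional[Will] = None) -> None:
        if will is not None:
            self.wills[client_id] = will

    def publish_retained(self, topic: str, payload: str) -> None:
        self.retained[topic] = payload

    def connection_lost(self, client_id: str, normal_disconnect: bool) -> None:
        will = self.wills.pop(client_id, None)
        # DISCONNECT with reason code 0x00 deletes the Will; an ungraceful
        # drop (power loss, dead link, keepalive timeout) publishes it.
        if will is not None and not normal_disconnect:
            self.publish_retained(*will)


STATUS = "env/jawa-barat/site-12/status/dev-0a3f"


def status_of(broker: ToyBroker) -> Optional[str]:
    """What a dashboard subscribing right now would see on the status topic."""
    raw = broker.retained.get(STATUS)
    return json.loads(raw)["state"] if raw else None


def sensor_connects(broker: ToyBroker) -> None:
    offline = json.dumps({"device_id": "dev-0a3f", "state": "offline"})
    online = json.dumps({"device_id": "dev-0a3f", "state": "online"})  # assumed birth message
    broker.connect("dev-0a3f", will=(STATUS, offline))
    broker.publish_retained(STATUS, online)
```

The clean-shutdown case is worth noticing: a normal disconnect discards the Will, so a sensor that powers down deliberately should publish its own retained offline status first.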

Common Pitfalls

  • QoS 2 everywhere. Doubles broker write load for a guarantee most sensor data doesn’t need. Use QoS 1 plus consumer idempotency.
  • QoS without persistence. QoS 1 only guarantees delivery while both ends are up. Without durable sessions, a broker restart still loses inflight messages.
  • Retaining the alert stream. A retained alert delivers a possibly stale event to every new subscriber. Retain latest-value topics only.
  • No MessageExpiryInterval. Without it, a message stuck in a queue for hours gets delivered as if fresh. Set an expiry that matches the message’s useful lifetime.
  • Ignoring RocksDB L0 triggers. Default triggers stall writes under sustained sensor load. Raise them and add compaction threads.
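  • clean_start=True on every reconnect with a session expiry. Self-defeating: the broker keeps the session after disconnect, then the client discards it on the next connect, along with every message queued while it was offline.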
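The MessageExpiryInterval pitfall comes down to one rule in the MQTT 5.0 spec: when the broker finally forwards a queued message, it sends the interval minus the time the message spent waiting, and once that reaches zero it drops the message instead. A small sketch of that decision in whole seconds; the function name is made up.

```python
from typing import Optional, Tuple


def forward_decision(expiry_interval_s: Optional[int],
                     waited_s: int) -> Tuple[bool, Optional[int]]:
    """Return (deliver?, MessageExpiryInterval to put on the outgoing PUBLISH).

    expiry_interval_s is None when the publisher set no expiry.
    """
    if expiry_interval_s is None:
        return True, None  # no expiry: delivered however late, as if fresh
    remaining = expiry_interval_s - waited_s
    if remaining <= 0:
        return False, None  # expired while queued: the broker discards it
    return True, remaining  # the subscriber sees how much lifetime is left


if __name__ == "__main__":
    # A threshold alert (expiry 3600 s) queued for a subscriber that was offline
    for waited in (30, 1800, 3600, 4 * 3600):
        print(waited, forward_decision(3600, waited))
```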

Troubleshooting

Symptom: QoS 1 publish latency climbs steadily under sustained load. Cause: RocksDB write stalls — L0 files outrunning compaction. Fix: Check emqx ctl ds info for stall counts; raise level0_slowdown_writes_trigger and max_background_jobs, or move data to NVMe.

Symptom: Messages lost after a broker restart despite QoS 1. Cause: Durable sessions disabled, so inflight messages were memory-only. Fix: Set durable_sessions.enable = true and ensure clients use clean_start=False with a non-zero SessionExpiryInterval.

Symptom: Consumers process the same alert twice. Cause: Expected with QoS 1 — at-least-once allows redelivery. Fix: Make the consumer idempotent; dedup on the event-id user property.

Symptom: Dashboard shows a stale reading for an offline sensor. Cause: No LWT, or LWT not retained. Fix: Configure a retained Last Will on the status topic so disconnect publishes an offline state.

Symptom: Broker memory grows unbounded over days. Cause: Durable sessions accumulating with no expiry cap; dead clients never garbage-collected. Fix: Set session_gc_interval and a sane message_retention_period; cap the client-requested expiry.

What’s Next

QoS picks the delivery contract; persistence makes that contract survive failure; a tuned RocksDB write path keeps persistence from becoming the bottleneck. Get all three right and “MQTT is dropping data” stops being a ticket you ever file. With the broker side solid, the next step is feeding real inference results into it: see connecting edge vision to an MQTT backbone.

The MQTT 5.0 specification is the canonical reference for QoS semantics, session expiry, and message expiry properties.
