Industrial IoT in 2024, A Reference Architecture That Scales
August 5, 2024 · 6 min read · by Muhammad Amal

TL;DR — Edge does protocol translation and buffering, the broker is your contract, the stream is your fan-out, and storage is split between hot time-series and warm columnar. Everything else is decoration.

Every IIoT project I’ve shipped in the last decade ends up at roughly the same shape once the dust settles. The vendor pitches differ. The marketecture diagrams differ. The way data actually flows, once you have ten thousand tags pulsing at 1 Hz and a control room that screams when a screen freezes, converges. This article is the version of that shape I’d recommend in August 2024, with concrete components and the reasoning behind each.

I’ll keep it grounded. No “platform of platforms” nonsense. We’re going to look at five layers, what each one owns, and where the seams need to be sharp so you can replace pieces later without ripping the whole thing apart. If you’ve read the Purdue model and felt it was a little too 1990s, this is what I use instead.

One opinion up front. The biggest architectural mistake I see in 2024 is putting business logic in the broker. The broker is plumbing. Treat it that way.

The Five Layers

There’s an edge layer running on gateways or industrial PCs. A messaging layer, almost always MQTT. A streaming layer, usually Kafka or Redpanda. A storage layer split between time-series and analytical. And an application layer on top, which is where dashboards, alerts and twins live.

The trick is the contract between each layer. Topics for messaging. Schemas for streaming. Hypertables and continuous aggregates for storage. Get those three right and the rest is hiring.

Edge

The edge is where physics meets software. You’re reading from PLCs over Modbus TCP, S7, EtherNet/IP, or you’re reading from a DCS via OPC UA. You normalize the tag namespace, you apply unit conversions, and you publish to MQTT. The gateway also buffers locally when the uplink fails, which it will.

A minimal edge worker in Python, using asyncua and aiomqtt, looks like this. Real production code has more error handling, but the shape is right:

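import asyncio

from aiomqtt import Client as Mqtt
from asyncua import Client

OPC_URL = "opc.tcp://192.168.10.20:4840"
MQTT_HOST = "broker.plant.local"
NODE_IDS = ["ns=2;s=Line1.Press1.Bar", "ns=2;s=Line1.Temp1.C"]


def topic_for(identifier: str) -> str:
    # "Line1.Press1.Bar" -> "plant/l3/line1/press1/bar".
    # Keeping the device level stops two sensors with the same metric colliding.
    return "plant/l3/" + identifier.lower().replace(".", "/")


async def main():
    # aiomqtt 2.x names the client ID argument `identifier`; 1.x called it `client_id`.
    async with Mqtt(MQTT_HOST, port=1883, identifier="gw-l3-01") as mqtt, \
               Client(url=OPC_URL) as opc:
        nodes = [opc.get_node(node_id) for node_id in NODE_IDS]
        while True:
            # Poll every node once per second and publish the raw value.
            for n in nodes:
                val = await n.read_value()
                topic = topic_for(n.nodeid.Identifier)
                await mqtt.publish(topic, payload=f"{val}", qos=1, retain=False)
            await asyncio.sleep(1)

asyncio.run(main())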

That snippet is intentionally boring. The edge should be boring. Push complexity north.

Messaging

MQTT is the right choice for north-south telemetry in 2024. The protocol is mature, the brokers are excellent, and almost every PLC, gateway and SCADA vendor speaks it. Use MQTT 5.0. Use shared subscriptions to spread load across consumer instances. Use topic aliases to cut per-message overhead when you have devices on cellular links. I’ll write more about MQTT 5 features in a follow-up, but for now treat the broker as the place where producers and consumers don’t know about each other. That decoupling is what lets you change everything else.

Topic hygiene matters more than people admit. I use four or five levels under a fixed plant prefix, all lowercase, and never a wildcard character (+ or #) in a published topic name:

plant/<site>/<area>/<line>/<device>/<metric>

So plant/jkt1/packaging/line3/scale4/weight_kg. Predictable, sortable, easy to ACL.
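A topic spec is easier to enforce when it exists as code that every gateway imports. Here is a minimal sketch, assuming all five levels are always present and that each level is limited to lowercase letters, digits and underscores. The function names and the character rule are my own illustration, not part of any MQTT library.

```python
import re

# Assumed rule: a level is one or more lowercase letters, digits, or underscores.
# This also rejects "/", "+", and "#", so wildcards can never reach a publish.
_LEVEL = re.compile(r"[a-z0-9_]+")
_NAMES = ("site", "area", "line", "device", "metric")


def build_topic(site: str, area: str, line: str, device: str, metric: str) -> str:
    """Build plant/<site>/<area>/<line>/<device>/<metric>, rejecting bad levels."""
    levels = [site, area, line, device, metric]
    for level in levels:
        if not _LEVEL.fullmatch(level):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(["plant", *levels])


def parse_topic(topic: str) -> dict:
    """Split a conforming topic back into its named levels."""
    parts = topic.split("/")
    if len(parts) != 6 or parts[0] != "plant":
        raise ValueError(f"topic does not match the spec: {topic!r}")
    if not all(_LEVEL.fullmatch(p) for p in parts[1:]):
        raise ValueError(f"topic contains an invalid level: {topic!r}")
    return dict(zip(_NAMES, parts[1:]))


print(build_topic("jkt1", "packaging", "line3", "scale4", "weight_kg"))
```

The same parse function is handy on the consumer side, where the named levels usually become columns or labels.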

Streaming

The broker is not a durable log. Kafka is. The bridge between MQTT and Kafka is where you do schema validation, deduplication, and fan-out. EMQX has a built-in data integration bridge for Kafka in 5.7. HiveMQ has the Kafka Extension. Both work. Pick whichever you’re already paying for.

Once data lands in Kafka, you have a durable log you can replay, branch into multiple consumers, and connect to anything downstream. This is non-negotiable for production. Without it, the only way to backfill a new analytics consumer is to re-read history out of your storage layer.
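If you run your own bridge workers rather than a broker-native bridge, the validation and deduplication step can be sketched roughly as follows. Everything here is an assumption for illustration: the JSON envelope with ts, value and seq fields is hypothetical (the edge snippet above publishes a bare value), and produce stands in for whatever Kafka producer call you actually use.

```python
import json
from collections import OrderedDict
from typing import Callable, Optional

# Hypothetical envelope: field name -> accepted Python types.
REQUIRED_FIELDS = {"ts": (int, float), "value": (int, float), "seq": (int,)}


class Deduplicator:
    """Remember the most recent keys in a bounded, LRU-evicted set."""

    def __init__(self, max_keys: int = 100_000):
        self._seen = OrderedDict()
        self._max_keys = max_keys

    def is_new(self, key) -> bool:
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency for a repeated key
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True


def validate(raw: bytes) -> Optional[dict]:
    """Return the decoded message, or None if it does not match the envelope."""
    try:
        msg = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(msg, dict):
        return None
    for field, types in REQUIRED_FIELDS.items():
        if not isinstance(msg.get(field), types):
            return None
    return msg


def handle(topic: str, raw: bytes, dedup: Deduplicator,
           produce: Callable[[str, dict], None]) -> bool:
    """Validate, drop duplicates by (topic, seq), then forward. True if forwarded."""
    msg = validate(raw)
    if msg is None or not dedup.is_new((topic, msg["seq"])):
        return False
    produce(topic, msg)
    return True
```

The bounded set only catches duplicates that arrive close together, which is the usual QoS 1 redelivery case. Anything older has to be handled by idempotent writes downstream.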

Storage

Hot path goes to a time-series database. TimescaleDB 2.16 if you want SQL and joins to relational data, InfluxDB 3.x if you want columnar performance for pure metrics. I lean Timescale for industrial because you’re joining telemetry against asset hierarchies, work orders, and shift schedules all day long. SQL wins there.

Warm path goes to object storage in Parquet, queried with DuckDB or whatever lakehouse you have. Don’t put six months of 1 Hz tags in your hot DB. Roll them up with continuous aggregates and ship the raw rows to S3.
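To make the rollup concrete, here is roughly what a one-minute aggregate computes, written in plain Python. This is only a sketch of the idea, not how TimescaleDB implements continuous aggregates, and the row shape (epoch seconds, tag, value) is an assumption.

```python
from collections import defaultdict
from statistics import fmean


def rollup(rows, bucket_seconds=60):
    """Group (epoch_seconds, tag, value) rows into per-tag time buckets."""
    buckets = defaultdict(list)
    for ts, tag, value in rows:
        # Floor the timestamp to the start of its bucket.
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets[(bucket_start, tag)].append(value)
    return [
        {
            "bucket": bucket_start,
            "tag": tag,
            "min": min(values),
            "max": max(values),
            "avg": fmean(values),
            "count": len(values),
        }
        for (bucket_start, tag), values in sorted(buckets.items())
    ]


raw = [(0, "press1_bar", 1.0), (30, "press1_bar", 3.0), (60, "press1_bar", 5.0)]
for row in rollup(raw):
    print(row)
```

Sixty raw rows per tag per minute collapse into one, which is why the hot database can keep rollups for much longer than it keeps raw data.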

Application

Dashboards, alerts, twins, ML. This is the layer everyone wants to talk about and the one that benefits most from the discipline below it. If the lower four layers are clean, your application layer is mostly Grafana plus a small services tier.

Sizing It Honestly

A common starter cluster for a single large plant, say 30k tags at 1 Hz with bursts to 10 Hz on critical loops:

  • Two MQTT brokers in a cluster, 8 vCPU and 16 GB each. EMQX 5.7 handles this with room to spare.
  • A three-node Kafka 3.7 cluster, 8 vCPU and 32 GB each, with 2 TB NVMe per node.
  • A Timescale 2.16 primary plus one read replica, 16 vCPU and 64 GB, 4 TB NVMe.
  • A small Kubernetes cluster for the bridge workers, edge sync, and dashboards.

That’s roughly USD 1.5–2.5k per month on a major cloud, less in a colo. It scales horizontally from there. The most common mistake in sizing is under-provisioning disk I/O. Always go NVMe. Spinning disk and even SATA SSD will bite you when a retention job or a backfill kicks in.
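A quick back-of-envelope check shows why the raw data can’t live in the hot database for long. The 40 bytes per row is purely an assumption for illustration (uncompressed, no indexes), so measure your own schema before trusting the result.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_ROW = 40  # assumption: uncompressed row, no index overhead


def rows_per_day(tags: int, hz: float) -> int:
    """Raw rows written per day for a given tag count and sample rate."""
    return int(tags * hz * SECONDS_PER_DAY)


def days_until_full(disk_bytes: float, tags: int, hz: float, bytes_per_row: int) -> float:
    """Days of raw data a disk holds at a steady ingest rate."""
    return disk_bytes / (rows_per_day(tags, hz) * bytes_per_row)


print(rows_per_day(30_000, 1))                                  # 2592000000
print(round(days_until_full(4e12, 30_000, 1, BYTES_PER_ROW), 1))  # 38.6
```

Under that assumption, 30k tags at 1 Hz fill a 4 TB volume in under six weeks before compression, which is the argument for rolling up and shipping raw rows to object storage.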

Common Pitfalls

A short list, all of which I’ve seen in the wild this year.

  • Putting transformations in the broker. EMQX rules and HiveMQ extensions are powerful. Resist using them for anything beyond simple tag enrichment. Logic in the broker is invisible to your data team and hard to test.
  • Skipping the streaming layer. “We’ll just write from MQTT straight to the database.” Then you onboard a second consumer and discover you have to backfill from Postgres into Kafka anyway. Build it once.
  • Ignoring tag governance. If every gateway picks its own topic shape, you’ll spend year two writing transformation code instead of features. Publish a topic spec on day one.
  • Treating QoS 2 as default. It’s expensive and rarely justified. QoS 1 with idempotent consumers is the right answer for almost everything.
  • No buffering at the edge. Networks fail. Plants run two shifts on a bad WAN link. Buffer locally, replay in order.
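For the last pitfall, the buffer can be as simple as a SQLite outbox on the gateway’s disk. Here is a minimal sketch using only the standard library. The class name, table layout and file path are hypothetical, and publish stands in for your MQTT client’s publish call (an async client would need an async variant of drain).

```python
import sqlite3


class EdgeBuffer:
    """Disk-backed FIFO: append while offline, drain oldest-first when online."""

    def __init__(self, path: str = "edge-buffer.db"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "topic TEXT NOT NULL, payload BLOB NOT NULL)"
        )
        self._db.commit()

    def append(self, topic: str, payload: bytes) -> None:
        """Persist one message; the autoincrement id records arrival order."""
        self._db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)", (topic, payload)
        )
        self._db.commit()

    def drain(self, publish, batch: int = 500) -> int:
        """Publish up to `batch` rows in order; delete each only after publish returns."""
        rows = self._db.execute(
            "SELECT id, topic, payload FROM outbox ORDER BY id LIMIT ?", (batch,)
        ).fetchall()
        sent = 0
        for row_id, topic, payload in rows:
            publish(topic, payload)  # if this raises, the row stays queued
            self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self._db.commit()
            sent += 1
        return sent
```

A crash between publish and delete replays that one message on the next drain, so this gives at-least-once delivery. That is the same guarantee as QoS 1, and the reason the consumers need to be idempotent.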

Wrapping Up

This architecture is unglamorous on purpose. The wins come from clean seams between layers, not from any single fancy component. If you’re starting an IIoT program in 2024, copy this shape, write down your topic and schema contracts, and spend your creative energy on the application layer where it actually moves the business.

For deeper dives into the messaging layer, I’ll cover MQTT 5 features for industrial and scaling EMQX and HiveMQ in the next two posts. If you want the canonical reference for the Purdue-style level hierarchy this architecture quietly replaces, the ISA-95 standard, which formalizes those levels, is worth a read on the ISA website.

Build boring infrastructure. Save your weird for the application layer.