Securing the OT/IT Boundary in Industrial Networks
August 21, 2024 · 6 min read · by Muhammad Amal · programming

TL;DR — One-way data flows where possible, mTLS everywhere, a hardened bridge as the single crossing point, and full monitoring on both sides. The threat model isn’t APT, it’s accidents and lateral movement.

The OT/IT boundary is the most consequential security line in any industrial system. Get it wrong and either the plant is unsafe or the data is useless. Get it right and you can move on to features. In August 2024, the patterns are well understood, the tooling is mature, and there’s no excuse for shipping something fragile. This post is the playbook I use.

I want to set the threat model straight first. For most industrial deployments, the realistic threat is not a state actor. It’s an IT-side compromise drifting into OT through a flat network or a shared admin account, or a misconfiguration that turns a query into a write. Defense against those is mostly about segmentation, least privilege, and observability. The nation-state stuff is a different conversation and usually involves people with badges.

Two principles drive everything below. Data should flow OT to IT, not the other way around, except for narrowly defined and audited control paths. And the boundary should be a single, documented, monitored crossing, not a dozen ad-hoc tunnels.

Segmentation

The Purdue model gets some flak for being old, but the segmentation idea is right. Whatever you call your zones, you want at least three. A control zone (PLCs, DCS, HMIs) on its own VLAN. A bridge zone where the OPC UA bridge and MQTT broker live. An IT zone for analytics, dashboards, and the rest of the data platform.

Firewalls between zones enforce an explicit allowlist and deny everything else. Between the control zone and the bridge zone, only OPC UA on the negotiated ports is allowed, and only the bridge may initiate the connection. Bridge zone to IT zone allows MQTT and Kafka egress. IT to bridge is generally denied except for narrow administrative paths through a jump host with MFA.
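
The allowlist above can be written down as data before it becomes firewall rules. The following is a minimal sketch, assuming hypothetical zone names, protocol labels, and host names (`opcua-bridge`, `mqtt-broker`, `jump-host`) that are not from any real deployment. It only illustrates the default-deny shape, not a firewall implementation.

```python
# Hypothetical allowlist keyed by (initiating zone, target zone, protocol).
# The value is the set of hosts allowed to open that connection.
ALLOWLIST = {
    ("bridge", "control", "opcua"): {"opcua-bridge"},
    ("bridge", "it", "mqtt"): {"opcua-bridge", "mqtt-broker"},
    ("bridge", "it", "kafka"): {"opcua-bridge"},
    ("it", "bridge", "ssh"): {"jump-host"},
}


def is_allowed(src_zone: str, dst_zone: str, protocol: str, initiator: str) -> bool:
    """Return True only if the flow is explicitly listed; everything else is denied."""
    allowed_hosts = ALLOWLIST.get((src_zone, dst_zone, protocol))
    return allowed_hosts is not None and initiator in allowed_hosts


if __name__ == "__main__":
    print(is_allowed("bridge", "control", "opcua", "opcua-bridge"))  # listed flow
    print(is_allowed("it", "control", "opcua", "analytics-1"))       # never listed
```

Keeping the table in version control next to the firewall config makes it easy to review a new crossing the same way as any other change.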

For high-assurance setups, a data diode in the OT-to-IT direction is the gold standard for one-way flow. Diodes are real hardware, they work, and they’re expensive. For most deployments, a properly configured firewall with explicit deny on inbound from IT to OT is sufficient.

Authentication Everywhere

Inside the bridge zone, every link is mutually authenticated. OPC UA with SignAndEncrypt and certificates from your internal CA. MQTT over TLS with client certificates, ideally short-lived. Kafka with mTLS, or SASL/SCRAM-SHA-512 over TLS at minimum.

Password-based auth on industrial gear is a fact of life for legacy stuff, but anything new should be cert-based. Rotate every certificate annually at minimum, more often for high-value paths. Have a documented rotation procedure that doesn’t require touching every PLC.
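
Rotation is easier to keep on schedule when something checks expiry dates for you. Here is a minimal sketch using only the Python standard library. It assumes you already have the certificate's `notAfter` string in the format Python's `ssl` module reports (for example from `SSLSocket.getpeercert()["notAfter"]`). The 30-day threshold is an arbitrary illustration, not a recommendation from any standard.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate notAfter string like 'Sep 20 00:00:00 2024 GMT'."""
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def needs_rotation(not_after: str, now: datetime, threshold_days: float = 30) -> bool:
    """True when the certificate expires within the (assumed) alert threshold."""
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_rotation("Sep 20 00:00:00 2024 GMT", now))
```

Run a check like this from the monitoring side, so a forgotten certificate becomes an alert weeks ahead rather than an outage.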

A minimum MQTT broker listener config for the boundary, EMQX 5.7:

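# Boundary listener: MQTT over TLS with mandatory client certificates.
listeners.ssl.boundary {
  bind = "0.0.0.0:8883"
  ssl_options {
    cacertfile = "/etc/emqx/certs/ca.pem"
    certfile = "/etc/emqx/certs/server.pem"
    keyfile = "/etc/emqx/certs/server-key.pem"
    # Validate the client certificate chain against cacertfile.
    verify = verify_peer
    # Reject clients that present no certificate at all.
    fail_if_no_peer_cert = true
    # Refuse anything older than TLS 1.3.
    versions = ["tlsv1.3"]
  }
  max_connections = 50000
}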

verify_peer and fail_if_no_peer_cert are the two settings that turn this from “encrypted” into “authenticated”. Both must be on.
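
The client side of that handshake needs the matching settings. This is a minimal sketch with Python's standard `ssl` module. The function name and any file paths you pass in are hypothetical; the resulting context is what you would hand to whichever MQTT client library you use.

```python
import ssl
from typing import Optional


def build_client_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """TLS 1.3-only client context that verifies the broker and presents a client cert."""
    # PROTOCOL_TLS_CLIENT enables hostname checking and CERT_REQUIRED by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if ca_file:
        # Trust only the internal CA that signed the broker certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Client certificate and key presented to the broker (the mTLS half).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


if __name__ == "__main__":
    ctx = build_client_context()  # pass real paths in a deployment
    print(ctx.minimum_version, ctx.verify_mode)
```

If the client context skips either the CA or the client certificate, the broker settings above will reject the connection, which is the behavior you want.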

The Bridge as the Crossing

The OPC UA to MQTT/Kafka bridge is the single crossing point between OT and IT zones. That makes it both the most security-sensitive component and the easiest to monitor, because everything passes through it.

The bridge should:

  • Run on a dedicated host or VM, not shared with other services.
  • Have only the necessary outbound connections, enforced by host firewall.
  • Use a service account with read-only OPC UA permissions where possible.
  • Drop privileges after binding ports.
  • Log every connection event, every authn/authz failure, every mapping change.

The mapping file is also a security artifact. A change to the mapping can quietly expose tags that weren’t previously visible to IT. Review mapping changes through your normal code review process and log the active mapping version with every published message.
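
One way to log the active mapping version with every message is to derive the version from the mapping content itself. The sketch below assumes the mapping is a plain dictionary from OPC UA node IDs to MQTT topics. The envelope field names are hypothetical and not taken from any particular bridge product.

```python
import hashlib
import json


def mapping_version(mapping: dict) -> str:
    """Short content hash of the mapping; changes whenever any tag or topic changes."""
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def envelope(topic: str, value, mapping: dict) -> dict:
    """Wrap a published value with the version of the mapping that produced it."""
    return {"topic": topic, "value": value, "mapping_version": mapping_version(mapping)}


if __name__ == "__main__":
    mapping = {"ns=2;s=Line1.Temp": "plant/jkt1/area1/line1/temp"}
    print(envelope("plant/jkt1/area1/line1/temp", 71.3, mapping))
```

Because the version is a hash of the content, a mapping change that skipped review still shows up downstream as a version nobody approved.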

Authorization

ACLs on the MQTT broker should be tight. Each device or bridge instance gets publish rights to its own topic subtree and nothing else. Each consumer gets subscribe rights to exactly what it needs. No wildcards at the top level. No “everyone can publish to anything”.

A simple EMQX ACL file pattern:

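%% The bridge may publish only under its own subtree.
{allow, {clientid, "bridge-jkt1-area1"}, publish, ["plant/jkt1/area1/#"]}.
%% The ingest consumer may subscribe only through its shared subscription.
{allow, {clientid, "ingest-1"}, subscribe, ["$share/ingest/plant/+/+/+/+/+/+"]}.
%% Default deny for everything not matched above.
{deny, all, publish, ["#"]}.
{deny, all, subscribe, ["#"]}.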

The two deny rules at the end are the safety net. Explicit allow, default deny.
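
To test an ACL file before it reaches the broker, it helps to model the evaluation order. The following is a simplified sketch of first-match-wins evaluation with MQTT wildcard matching. It is not EMQX's implementation: it ignores shared-subscription prefixes and the special handling of `$`-prefixed topics, and the rule tuples are a hypothetical in-memory form of the file above.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style match: '+' matches one level, '#' matches the rest (including none)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# (permission, client id or "all", action, topic filters), evaluated top to bottom.
RULES = [
    ("allow", "bridge-jkt1-area1", "publish", ["plant/jkt1/area1/#"]),
    ("deny", "all", "publish", ["#"]),
    ("deny", "all", "subscribe", ["#"]),
]


def check(client_id: str, action: str, topic: str, rules=RULES) -> bool:
    """First matching rule decides; no match at all is treated as deny."""
    for permission, who, rule_action, filters in rules:
        if who not in ("all", client_id) or rule_action != action:
            continue
        if any(topic_matches(f, topic) for f in filters):
            return permission == "allow"
    return False


if __name__ == "__main__":
    print(check("bridge-jkt1-area1", "publish", "plant/jkt1/area1/line1/temp"))
    print(check("bridge-jkt1-area1", "publish", "plant/jkt1/area2/line1/temp"))
```

A handful of cases like these in CI catch the classic mistake of placing a broad rule above a narrow one.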

For Kafka, define ACLs per topic and per principal with the standard kafka-acls.sh. Producers can’t read. Consumers can’t write. Admin operations are tightly scoped to a small ops group.
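
If the ACLs are generated from an inventory rather than typed by hand, a small helper can build the `kafka-acls.sh` arguments. This is a sketch only: the principal, topic, group, and bootstrap names are hypothetical, and it relies on the tool's `--producer` and `--consumer` convenience options, so check them against the Kafka version you run before using it.

```python
from typing import List


def producer_acl_args(bootstrap: str, principal: str, topic: str) -> List[str]:
    """Arguments granting write-side access on a single topic to one principal."""
    return [
        "kafka-acls.sh", "--bootstrap-server", bootstrap,
        "--add", "--allow-principal", f"User:{principal}",
        "--producer", "--topic", topic,
    ]


def consumer_acl_args(bootstrap: str, principal: str, topic: str, group: str) -> List[str]:
    """Arguments granting read-side access on one topic and one consumer group."""
    return [
        "kafka-acls.sh", "--bootstrap-server", bootstrap,
        "--add", "--allow-principal", f"User:{principal}",
        "--consumer", "--topic", topic, "--group", group,
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    print(" ".join(producer_acl_args("kafka-1:9093", "bridge-jkt1", "plant.jkt1.telemetry")))
    print(" ".join(consumer_acl_args("kafka-1:9093", "ingest-1", "plant.jkt1.telemetry", "ingest")))
```

Printing the commands first keeps a human in the loop, which fits the small, tightly scoped admin group described above.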

Monitoring

You cannot secure what you don’t see. The boundary needs three kinds of telemetry feeding a SIEM or at least an alerted log store.

Network logs from the firewall between zones. Connection attempts, denied flows, throughput by direction. Spikes and shape changes are interesting.

Application logs from the bridge, broker, and Kafka. Authn failures, ACL denials, certificate expiry warnings, configuration changes. All of these are alertable.

Behavioral metrics. Message rates per topic, per device, per direction. A device that suddenly publishes 100x its usual rate, or a consumer that subscribes to wildcards it didn’t yesterday, is worth a look.

I’ve caught more real issues with rate anomalies than with signature-based detection. The boring stuff finds the boring problems, which is most of them.
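
A rate check does not need to be clever to be useful. Here is a minimal sketch that compares the current message rate against the median of recent history. The factor of 10 and the minimum sample count are arbitrary assumptions to tune per site, not values from any standard.

```python
from statistics import median
from typing import Sequence


def is_rate_anomaly(
    history: Sequence[float],
    current: float,
    factor: float = 10.0,
    min_samples: int = 5,
) -> bool:
    """Flag a rate that is at least `factor` times the median of recent rates."""
    if len(history) < min_samples:
        return False  # not enough baseline to judge
    baseline = median(history)
    if baseline == 0:
        return current > 0  # a normally silent device started talking
    return current / baseline >= factor


if __name__ == "__main__":
    recent = [100, 98, 103, 101, 99, 102]  # messages per minute for one device
    print(is_rate_anomaly(recent, 10_000))
    print(is_rate_anomaly(recent, 110))
```

The median keeps one earlier spike from dragging the baseline up and hiding the next one.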

Patching

OT teams hate patching because patches break things. IT teams hate not patching because of CVEs. The reconciliation is to patch the IT-facing components of the boundary (broker, bridge, Kafka) on a normal IT cadence, and leave the OT-side gear on whatever conservative cadence the vendor recommends.

This requires the bridge to be tolerant of restarts on its IT-facing side. A graceful shutdown, persistent state for inflight messages, and reconnect with backoff. Most modern brokers and bridges support this. Test it.
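
Reconnect with backoff is the piece that most often gets hand-rolled, so here is a minimal sketch of exponential backoff with full jitter. The `connect` callable stands in for whatever your bridge uses to open its broker session, and the base, cap, and attempt limits are illustrative assumptions.

```python
import random
import time
from typing import Callable, Optional


def connect_with_backoff(
    connect: Callable[[], object],
    sleep: Callable[[float], None] = time.sleep,
    base: float = 1.0,
    cap: float = 60.0,
    max_attempts: int = 8,
    rng: Optional[random.Random] = None,
):
    """Retry `connect` on ConnectionError, waiting a random delay up to an exponentially growing ceiling."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Ceiling doubles each attempt and is capped; jitter spreads out reconnect storms.
            ceiling = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

The jitter matters during a rolling broker restart: without it, every bridge instance retries at the same instant and the freshly restarted node takes the full load at once.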

For the broker, EMQX 5.7 and HiveMQ 4.28 both support rolling restarts in a cluster. Use them. Don’t patch by taking the whole cluster down.

A Note on Remote Access

Vendor remote access is one of the most common breach vectors in industrial environments. Always-on VPNs from a vendor laptop to a PLC are exactly the wrong shape. The right shape is a request-approved, time-limited, recorded session through a jump host with MFA. Tools like Tailscale ACLs, Cloudflare Access for SSH, or commercial OT-specific remote access products like Claroty’s offering work fine.

The rule is, no inbound persistent connection from outside. Every remote session is initiated by a request and approved by a human.
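
The approval rule can be expressed as a small check that a jump host or access broker runs before opening a session. This is a sketch under stated assumptions: the `SessionGrant` fields and the rule that requesters cannot approve themselves are illustrative and not taken from any specific remote access product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SessionGrant:
    requester: str
    target: str
    approved_by: Optional[str]  # None until a human approves the request
    starts_at: datetime
    duration: timedelta


def is_session_valid(grant: SessionGrant, now: datetime) -> bool:
    """A session is valid only if someone else approved it and `now` is inside its window."""
    if not grant.approved_by or grant.approved_by == grant.requester:
        return False
    return grant.starts_at <= now < grant.starts_at + grant.duration


if __name__ == "__main__":
    start = datetime(2024, 8, 21, 9, 0, tzinfo=timezone.utc)
    grant = SessionGrant("vendor-tech", "plc-line1", "ops-lead", start, timedelta(hours=2))
    print(is_session_valid(grant, start + timedelta(minutes=30)))
    print(is_session_valid(grant, start + timedelta(hours=3)))
```

Expiry is enforced by the window itself, so a forgotten session closes on its own instead of turning into an always-on tunnel.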

Common Pitfalls

  • Flat networks “just for now”. Then never fixed. Segment on day one.
  • Self-signed certificates that never rotate. They expire on a Sunday. Plan rotation now.
  • Shared admin credentials. Use per-person accounts with MFA, even for OT consoles where possible.
  • Logs collected nowhere. If logs only live on the device, they may as well not exist. Ship them off the device immediately.
  • Treating MQTT as “internal” and skipping TLS. The bridge zone is still a boundary. TLS the boundary.

Wrapping Up

OT/IT boundary security in 2024 is mostly about discipline, not exotic tooling. Segment properly. Authenticate everything. Make the bridge the single crossing. Watch it carefully. Patch the IT side on a normal cadence and don’t patch the OT side casually.

For more on the components this sits on, see the earlier OPC UA to MQTT/Kafka bridging and IIoT reference architecture posts. The ISA/IEC 62443 series is the canonical reference for industrial cybersecurity if you need a deeper framework.

Boring boundary, safe plant. Make it boring on purpose.