Sensor Data Schemas: JSON, Protobuf, CBOR
August 10, 2022 · 5 min read · by Muhammad Amal programming

TL;DR — JSON for greenfield IIoT where ergonomics matter and bandwidth doesn’t. Protobuf when schemas need to evolve safely. CBOR for cellular / LoRa where bytes matter. Sparkplug B for OT-specific MQTT payloads (covered later). Don’t ship “vendor-specific binary” if you can help it.

With Mosquitto set up, the next decision is the payload itself. MQTT doesn’t care what bytes you publish, so your applications must agree on a format. There were three main contenders in the IIoT space in 2022.

JSON — the easy default

{
  "ts": "2022-08-10T09:14:33Z",
  "device": "press-42",
  "temperature": 72.5,
  "pressure": 1.03,
  "vibration_rms": 0.21
}

Pros:

  • Human-readable; eyeballs can debug
  • Tooling everywhere (curl, jq, scripting languages)
  • Schema-less; add fields without breaking subscribers (mostly)
  • Browser-friendly (WebSocket dashboards)

Cons:

  • Big. ~120 bytes for the above as formatted (about 105 minified); a binary encoding like Protobuf needs 30-40 bytes
  • Slow parse on constrained devices
  • No native binary or timestamp types (timestamps travel as strings, and every number is one generic numeric type with no integer/float distinction)
  • No enforced schema; subscribers handle malformed data gracefully or crash

For a project pushing around 1 MB/min where ergonomics are the priority: JSON. It is the default for most “we’ll figure out perf later” decisions.
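Because nothing enforces the schema, the subscriber has to do the checking. A minimal sketch of that, assuming the five fields from the example above are all required; REQUIRED_FIELDS and parse_reading are hypothetical names, not part of any library:

```python
import json
from datetime import datetime

# Hypothetical field spec for the example reading: name -> accepted types.
REQUIRED_FIELDS = {
    "ts": str,
    "device": str,
    "temperature": (int, float),
    "pressure": (int, float),
    "vibration_rms": (int, float),
}


def parse_reading(payload: bytes) -> dict:
    """Decode one MQTT payload; raise ValueError on anything malformed."""
    try:
        reading = json.loads(payload)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(reading, dict):
        raise ValueError("payload must be a JSON object")
    for name, types in REQUIRED_FIELDS.items():
        value = reading.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"field {name!r} missing or wrong type")
    # JSON has no timestamp type; convert the ISO 8601 string by hand.
    reading["ts"] = datetime.fromisoformat(reading["ts"].replace("Z", "+00:00"))
    return reading
```

A subscriber can then catch ValueError in its message callback, log the offending payload, and keep running instead of crashing on the first bad message.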

Protobuf — schema discipline

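syntax = "proto3";

// Timestamp is a well-known type and must be imported explicitly.
import "google/protobuf/timestamp.proto";

message SensorReading {
  google.protobuf.Timestamp ts = 1;
  string device = 2;
  float temperature = 3;
  float pressure = 4;
  float vibration_rms = 5;
}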

The wire format is binary, ~38 bytes for the same data. Producer and consumer both generate their code from the same .proto file.
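To see where those bytes go, here is a minimal hand-rolled sketch of the Protobuf wire encoding for this message, using only the Python standard library. It is an illustration, not a substitute for protoc-generated code; varint, key and encode_reading are hypothetical helper names, and the sketch assumes a non-negative epoch timestamp with zero nanoseconds.

```python
import struct


def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last.

    Assumes n >= 0; negative values need different handling.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field number << 3) | wire type."""
    return varint((field_number << 3) | wire_type)


def encode_reading(epoch_s: int, device: str, temperature: float,
                   pressure: float, vibration_rms: float) -> bytes:
    # google.protobuf.Timestamp { int64 seconds = 1; int32 nanos = 2; }
    # nanos is zero here, and proto3 omits zero-valued scalars.
    ts = key(1, 0) + varint(epoch_s)
    name = device.encode("utf-8")
    return b"".join([
        key(1, 2) + varint(len(ts)) + ts,            # ts: length-delimited submessage
        key(2, 2) + varint(len(name)) + name,        # device: length-delimited string
        key(3, 5) + struct.pack("<f", temperature),  # float: 4 bytes, little-endian
        key(4, 5) + struct.pack("<f", pressure),
        key(5, 5) + struct.pack("<f", vibration_rms),
    ])


msg = encode_reading(1660122873, "press-42", 72.5, 1.03, 0.21)
print(len(msg), msg.hex(" "))
```

For this reading the sketch emits 33 bytes, in the same ballpark as the figure above. Field names never appear on the wire, only tag numbers, which is where most of the saving over JSON comes from.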

Pros:

  • Compact (3-4× smaller than JSON typically)
  • Strict schema; generated code turns most shape mismatches into compile-time errors
  • Backward/forward compatibility rules built in (add fields safely)
  • Cross-language code generation

Cons:

  • Need to ship .proto files / codegen pipeline
  • Not human-readable on the wire
  • Schema evolution requires discipline (don’t change tag numbers)
  • More setup than JSON

For projects with multiple consumers, frequent schema changes, multi-language clients: Protobuf is worth it.

CBOR — JSON-shaped, binary-sized

A5                                # map(5)
   62 74 73                       # text(2) "ts"
   C1 1A 62 F3 76 F9              # tag(1) epoch-seconds 1660122873
   66 64 65 76 69 63 65           # text(6) "device"
   68 70 72 65 73 73 2D 34 32     # text(8) "press-42"
   ...

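This is the encoded representation of the same JSON object. It comes to ~50 bytes with abbreviated keys (closer to 75 with the full key names shown and 32-bit floats), parses faster than JSON, and preserves the dynamic JSON-like structure without an ahead-of-time schema.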
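The encoding rules are simple enough to sketch by hand. The following is a minimal, stdlib-only illustration covering just the CBOR types this reading needs (map, text, tagged epoch integer, float); in practice you would use a maintained CBOR library. The helper names are hypothetical, and packing every number as a 32-bit float is an assumption that trades a little precision (1.03 is not exactly representable) for size.

```python
import struct


def head(major: int, n: int) -> bytes:
    """CBOR initial byte: major type in the top 3 bits, then the argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24, n])
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def text(s: str) -> bytes:
    raw = s.encode("utf-8")
    return head(3, len(raw)) + raw  # major type 3: text string


def float32(x: float) -> bytes:
    return b"\xfa" + struct.pack(">f", x)  # 0xFA: single-precision float


def epoch(seconds: int) -> bytes:
    return head(6, 1) + head(0, seconds)  # tag 1 wrapping an unsigned int


def encode_reading(r: dict) -> bytes:
    out = head(5, len(r))  # major type 5: map with len(r) key/value pairs
    for k, v in r.items():
        out += text(k)
        if k == "ts":
            out += epoch(v)
        elif isinstance(v, str):
            out += text(v)
        else:
            out += float32(v)
    return out


reading = {"ts": 1660122873, "device": "press-42",
           "temperature": 72.5, "pressure": 1.03, "vibration_rms": 0.21}
encoded = encode_reading(reading)
print(len(encoded), encoded[:10].hex(" "))
```

With the full key names this comes to 76 bytes, and the key strings account for most of it. Shortening keys is what brings a CBOR payload down toward the ~50 bytes quoted above.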

Pros:

  • Binary; smaller and faster than JSON
  • Self-describing (like JSON); subscribers don’t need a schema
  • IETF standard (RFC 8949)
  • Used by Matter (Apple/Google smart-home), webauthn, vaccine credentials — credible adoption

Cons:

  • Less tooling than JSON or Protobuf
  • Still not as compact as Protobuf
  • Schema-less means same downsides as JSON for evolution

For LoRa, cellular, or any bandwidth-constrained edge: CBOR. Saves cost vs JSON, less ceremony than Protobuf.

Size comparison

For the example reading (5 fields, ASCII device name):

Format        Bytes (approx.)
JSON          120
CBOR          50
Protobuf      38
MessagePack   55
Sparkplug B   45

At a sustained 1000 messages/sec, JSON payloads alone come to ~7 MB/min (120 bytes × 1000 × 60) and Protobuf to ~2.3 MB/min, before MQTT and TCP/IP overhead. Cellular IIoT charges by the megabyte; the savings are real.
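The arithmetic is worth redoing with your own message rate. A small sketch, using the approximate payload sizes from the table and counting payload bytes only (MQTT topic names, packet headers and TCP/IP framing push the real figures higher); mb_per_minute is a hypothetical helper:

```python
def mb_per_minute(payload_bytes: int, msgs_per_sec: int) -> float:
    """Payload volume in MB/min (1 MB = 10**6 bytes), excluding protocol overhead."""
    return payload_bytes * msgs_per_sec * 60 / 1_000_000


# Approximate per-message sizes taken from the comparison table.
SIZES = {"JSON": 120, "CBOR": 50, "Protobuf": 38}

for fmt, size in SIZES.items():
    print(f"{fmt:<9}{mb_per_minute(size, 1000):6.2f} MB/min")
```

At 1000 messages/sec this gives 7.2 MB/min for JSON, 3.0 for CBOR and 2.28 for Protobuf. Multiply by your carrier’s per-megabyte price to see whether the difference matters.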

For most factory deployments over wired/WiFi networks, the bandwidth difference is irrelevant. Use the most maintainable format (JSON or Protobuf).

When each wins

Use JSON when:

  • Greenfield project, low device count, low bandwidth pressure
  • Multiple ad-hoc consumers (dashboards, debugging tools, scripts)
  • Schema changes often or is exploratory
  • WebSocket clients in the browser

Use Protobuf when:

  • Schema is stable enough to maintain
  • Multiple consuming services in different languages
  • You’re already using gRPC elsewhere
  • Bandwidth or parse-time matters

Use CBOR when:

  • Bandwidth-constrained (cellular, LoRa, satellite)
  • JSON’s flexibility is needed (dynamic shapes)
  • Edge device has limited CPU for parsing

Use Sparkplug B when:

  • Industrial OT specifically; you have factory engineers who expect it
  • Covered separately in the Sparkplug post

Versioning

Whatever format you pick, plan for evolution from day one.

For JSON, embed a version:

{
  "v": 1,
  "ts": "2022-08-10T09:14:33Z",
  ...
}

Subscribers check v and adapt. Bump on breaking changes; add fields freely on additive changes.
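One way a subscriber might “check v and adapt” is to upgrade old payloads step by step to the current shape before handing them to the rest of the code. A sketch under invented assumptions: the v2 change (renaming device to device_id) is made up for illustration, and treating a missing v as version 1 is a policy choice, not a rule.

```python
import json


def upgrade_v1(reading: dict) -> dict:
    # Hypothetical breaking change: v2 renamed "device" to "device_id".
    upgraded = dict(reading, v=2)
    upgraded["device_id"] = upgraded.pop("device")
    return upgraded


# Maps each old version to the function that lifts it one version up.
UPGRADES = {1: upgrade_v1}
CURRENT_VERSION = 2


def load_reading(payload: bytes) -> dict:
    reading = json.loads(payload)
    version = reading.get("v", 1)  # assumption: unversioned payloads are v1
    if version > CURRENT_VERSION:
        raise ValueError(f"payload version {version} is newer than this subscriber")
    while version < CURRENT_VERSION:
        reading = UPGRADES[version](reading)
        version = reading["v"]
    return reading
```

Downstream code then only ever sees the current shape, and each breaking change costs one small upgrade function rather than version checks scattered through every consumer.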

For Protobuf:

  • Never change a field’s tag number
  • Never reuse a deleted field’s tag number
  • Use optional for fields that might not be present
  • Bump the .proto file version (v1, v2) for major restructures
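The tag-number rules follow directly from how decoding works: a reader matches fields by number and can skip numbers it does not recognise, because the wire type says how many bytes to jump. A minimal stdlib sketch of that behaviour (hypothetical helper names, handling only the four wire types in common use; real code should use generated classes):

```python
import struct


def read_varint(buf: bytes, pos: int):
    """Return (value, next position) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_known(buf: bytes, known: dict) -> dict:
    """Decode fields whose tag number is in `known`; skip everything else."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length-delimited
            size, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        elif wire_type == 5:                    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field in known:
            out[known[field]] = value           # raw value; typing is the caller's job
    return out


# A newer producer added field 6 (a varint); this older subscriber knows only 2 and 3.
msg = (b"\x12\x08press-42"
       + b"\x1d" + struct.pack("<f", 72.5)
       + b"\x30\x2a")
old_view = decode_known(msg, {2: "device", 3: "temperature"})
print(old_view["device"], struct.unpack("<f", old_view["temperature"])[0])
```

The old subscriber reads device and temperature and silently ignores field 6. If a producer instead reused tag 3 for something else, the same subscriber would happily decode those bytes as a temperature, which is why tag numbers are never changed or recycled.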

For CBOR: same as JSON — embed a version field.

A pragmatic choice

For our 120-sensor project: JSON. Bandwidth is not an issue, easy debugging matters to the engineering team, and multiple consumers (dashboard, alerts, analytics) want easy parsing.

For a different project I’m advising on (~5000 cellular-connected devices): CBOR. Cellular costs add up; the byte savings justify CBOR over JSON.

For a third (real-time control loops, microsecond budgets): Protobuf. Parse time matters, the schema is stable, and the codegen pipeline is an acceptable cost.

Three projects, three formats. No universal answer.

Common Pitfalls

“Save bytes” optimization with no measurement. If you can’t see the bandwidth cost on a chart, you don’t need to optimize the format.

Mixing formats on the same topic. Subscribers cannot tell which decoder to apply. Pick one format per topic and document it.

Schema-less JSON with no version field. Three months in, two versions of the payload coexist and subscribers crash on the wrong one.

Protobuf without a schema registry. Generated code drifts between producer and consumer, and messages parse wrong.

CBOR for things that don’t need it. Extra parsing complexity for no bandwidth win.

Custom binary formats. “We rolled our own, it’s 12 bytes.” Three months later, nobody remembers the format. Use a standard.

Wrapping Up

Pick by bandwidth + tooling + change frequency. JSON until you have a reason; Protobuf for schema discipline; CBOR for tight bandwidth. Friday: reading sensors with Go on the edge.