Edge AI Hardware in April 2025, Jetson, Coral, and Raspberry Pi 5 AI Hat
TL;DR — In April 2025 the edge AI hardware story finally simplified. Jetson Orin Nano Super gives you 67 INT8 TOPS for serious vision workloads, Coral Edge TPU is still the cheapest sustained 4 TOPS you can solder onto anything, and the Raspberry Pi 5 AI Hat (Hailo-8L, 13 TOPS) eats most of what used to require a Jetson Nano. Pick by thermals, power budget, and the runtime you already use.
I’ve been deploying edge AI to industrial sites for a few years now, and the hardware menu has been a moving target. For most of 2023 and 2024, “edge AI” meant either a Jetson Nano running JetPack 4 (ancient TensorRT), a Coral USB stick that couldn’t load anything bigger than EfficientDet-Lite, or a Raspberry Pi 4 doing CPU inference badly. None of those are good answers anymore.
April 2025 is a better moment to pick. NVIDIA shipped the Orin Nano Super update last December, which raised memory bandwidth from 68 GB/s to 102 GB/s and lifted INT8 TOPS from 40 to 67. The Raspberry Pi 5 AI Hat with the Hailo-8L (13 TOPS) now ships in volume and runs on the Pi’s PCIe lane. Coral hasn’t seen new silicon, but the toolchain finally caught up to recent TensorFlow Lite versions. So this post is a practical tour of those three, written from the perspective of someone who actually has to make the buy decision and own the firmware.
If you’ve read my earlier note on self-hosting n8n for engineering teams, this is the hardware that does the heavy lifting before n8n ever sees the event. We’re at the bottom of the stack, where amps and thermals matter more than YAML.
1. The three contenders, honestly compared
Let’s get the spec table out of the way. Numbers are vendor-published as of April 2025, prices are USD MSRP.
+----------------------+------------------+------------------+----------------------+
| Board                | Jetson Orin Nano | Coral Dev Board  | RPi 5 + AI Hat       |
|                      | Super 8GB        | Mini             | (Hailo-8L)           |
+----------------------+------------------+------------------+----------------------+
| Peak INT8 TOPS       | 67               | 4                | 13                   |
| RAM                  | 8 GB LPDDR5      | 2 GB LPDDR3      | 4 or 8 GB LPDDR4X    |
| Power (typical)      | 15-25 W          | 2-3 W            | 8-12 W (combined)    |
| Runtime              | TensorRT, ONNX   | TFLite (edgetpu) | HailoRT, ONNX        |
| Price (April 2025)   | $249             | $99              | ~$140 (Pi5 4GB + Hat)|
| Camera input         | 2x MIPI CSI-2    | 1x MIPI CSI-2    | 2x MIPI CSI-2        |
+----------------------+------------------+------------------+----------------------+
The TOPS column is misleading on its own. 67 TOPS on the Orin Nano Super only matters if your model is large enough and your batch size is high enough to saturate it. For a YOLOv8n at 640x640 doing single-frame inference, the Hailo-8L often lands within 30 to 40% of it, at roughly half the power.
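To put rough numbers on that, here is a back-of-envelope sketch in Python. It divides the YOLOv8n throughput figures quoted later in this post (165 FPS on the Orin Nano Super, 105 FPS on the Hailo-8L) by the upper end of each board’s typical power range from the table. Pairing those FPS and wattage figures is my assumption for illustration, not a measurement, so check wall power on your own workload.

```python
# Back-of-envelope efficiency comparison.
# FPS values are the YOLOv8n 640x640 figures quoted in this post; watts are the
# upper end of the "typical" range in the spec table. Pairing them is an
# assumption for illustration, not a measured result.
BOARDS = {
    "Jetson Orin Nano Super": {"fps": 165.0, "watts": 25.0},
    "RPi 5 + AI Hat (Hailo-8L)": {"fps": 105.0, "watts": 12.0},
}


def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return fps / watts


def efficiency_table(boards: dict) -> dict:
    """Map each board name to its FPS-per-watt figure, rounded to 2 places."""
    return {
        name: round(fps_per_watt(spec["fps"], spec["watts"]), 2)
        for name, spec in boards.items()
    }


if __name__ == "__main__":
    for name, value in efficiency_table(BOARDS).items():
        print(f"{name}: {value} FPS/W")
```

On these inputs the smaller accelerator comes out ahead per watt even though it loses on raw FPS, which is the whole point of not reading the TOPS column in isolation.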
1.1 Where each one wins
Jetson Orin Nano Super wins when you need:
- Multi-stream video (4 or more 1080p30 streams decoded and inferenced together).
- A model bigger than 50M parameters that you don’t want to cut apart to fit a fixed-function accelerator.
- The full PyTorch and ONNX ecosystem at the edge with CUDA.
Coral Edge TPU wins when you need:
- Sub-3W sustained, battery-powered, or solar-powered deployment.
- A known-good quantized TFLite model you’ll never change.
- Tiny form factor (the M.2 A+E variant is the size of a fingernail).
Raspberry Pi 5 AI Hat wins when you need:
- A real Linux distro you can apt install on without flashing a custom JetPack image.
- One CSI camera and one accelerator on commodity hardware your interns can replace.
- 13 TOPS at the price of an Xbox controller.
I run all three in production for different reasons. The Orin Nano Super handles the camera-dense sites where four IP cameras feed into a single board. The Hailo-8L on a Pi 5 runs single-camera quality inspection cells. Coral runs in the battery-powered remote sensors that ship to job sites and need to last a week.
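The camera-count split above falls out of simple arithmetic. A minimal sketch, assuming every frame of every stream is inferenced and that you keep some headroom for decode and post-processing (the 0.8 headroom factor is my assumption, not a vendor figure):

```python
import math


def max_streams(model_fps: float, stream_fps: float = 30.0, headroom: float = 0.8) -> int:
    """Number of camera streams one board can serve.

    model_fps  : measured single-model throughput on the board
    stream_fps : frame rate of each camera
    headroom   : fraction of model_fps you are willing to commit (assumed 0.8)
    """
    if model_fps <= 0 or stream_fps <= 0:
        raise ValueError("frame rates must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.floor(model_fps * headroom / stream_fps)


if __name__ == "__main__":
    # Throughput figures are the YOLOv8n numbers quoted later in this post.
    print("Orin Nano Super:", max_streams(165.0), "streams at 30 fps")
    print("Hailo-8L:", max_streams(105.0), "streams at 30 fps")
```

With those inputs the Orin Nano Super lands at four 30 fps streams and the Hailo-8L at two, which lines up with how I split sites between them. Swap in your own measured FPS before trusting the result.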
2. Setting up a Jetson Orin Nano Super
The Orin Nano Super isn’t a new SKU. It’s a firmware update for the existing Orin Nano 8GB that NVIDIA released in December 2024. If you bought one before, you can flash it.
2.1 Flashing JetPack 6.1
JetPack 6.1 is the April 2025 baseline. It ships Ubuntu 22.04, CUDA 12.6, TensorRT 10.3, and the “Super” power mode.
# On an x86 host with NVIDIA SDK Manager (sdkmanager) 2.2
# Connect the Orin Nano in recovery mode (jumper FC REC to GND, then power on)
sdkmanager --cli install \
  --logintype devzone \
  --product Jetson \
  --version 6.1 \
  --targetos Linux \
  --target JETSON_ORIN_NANO_TARGETS \
  --flash all
Once it boots, switch into MAXN_SUPER mode. This is what unlocks the 67 TOPS figure.
# On the Jetson, as the user you created during first boot
sudo nvpmodel -m 2 # MAXN_SUPER
sudo jetson_clocks # lock clocks high
# Verify
nvpmodel -q
# NV Power Mode: MAXN_SUPER
The Super mode pulls about 25W under load. If you’re on a 5V/4A barrel, you’ll brown out. Use the 19V barrel jack and a real PSU.
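If you provision boards with a script, it helps to fail fast when a unit is not actually in Super mode. The sketch below parses the `NV Power Mode:` line shown above. The helper names are hypothetical, and I’m assuming `nvpmodel -q` can be run by the calling user (on some images it needs sudo).

```python
import subprocess


def parse_power_mode(nvpmodel_output: str):
    """Return the mode name from `nvpmodel -q` output, or None if absent."""
    for line in nvpmodel_output.splitlines():
        line = line.strip()
        if line.startswith("NV Power Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def current_power_mode():
    """Run `nvpmodel -q` on the Jetson and return the active mode name."""
    result = subprocess.run(
        ["nvpmodel", "-q"], capture_output=True, text=True, check=True
    )
    return parse_power_mode(result.stdout)


if __name__ == "__main__":
    mode = current_power_mode()
    if mode != "MAXN_SUPER":
        raise SystemExit(f"expected MAXN_SUPER, board reports {mode!r}")
    print("power mode OK:", mode)
```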
2.2 Running a TensorRT inference
I’ll skip the toy MNIST examples. Here’s the real shape of a TensorRT inference loop in Python, using the tensorrt 10.3 API.
# infer_trt.py: single-image inference against an INT8 TensorRT engine
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context on import

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def load_engine(path):
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as rt:
        return rt.deserialize_cuda_engine(f.read())


class Inferencer:
    def __init__(self, engine_path):
        self.engine = load_engine(engine_path)
        self.ctx = self.engine.create_execution_context()
        self.stream = cuda.Stream()
        # Assume single input, single output, both float32, for clarity
        self.input_name = self.engine.get_tensor_name(0)
        self.output_name = self.engine.get_tensor_name(1)
        # Convert trt.Dims to plain tuples so NumPy accepts them as shapes
        self.in_shape = tuple(self.engine.get_tensor_shape(self.input_name))
        self.out_shape = tuple(self.engine.get_tensor_shape(self.output_name))
        # Device buffers: 4 bytes per float32 element
        self.d_in = cuda.mem_alloc(int(np.prod(self.in_shape)) * 4)
        self.d_out = cuda.mem_alloc(int(np.prod(self.out_shape)) * 4)
        self.ctx.set_tensor_address(self.input_name, int(self.d_in))
        self.ctx.set_tensor_address(self.output_name, int(self.d_out))

    def __call__(self, img_chw_float32):
        h_in = np.ascontiguousarray(img_chw_float32, dtype=np.float32)
        h_out = np.empty(self.out_shape, dtype=np.float32)
        # Copy in, run, copy out on one stream, then wait for it to drain
        cuda.memcpy_htod_async(self.d_in, h_in, self.stream)
        self.ctx.execute_async_v3(self.stream.handle)
        cuda.memcpy_dtoh_async(h_out, self.d_out, self.stream)
        self.stream.synchronize()
        return h_out


if __name__ == "__main__":
    infer = Inferencer("yolov8n_int8.engine")
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
    out = infer(dummy)
    print("output shape:", out.shape)
On the Orin Nano Super, this loop pushes about 165 FPS for YOLOv8n at 640x640. CPU stays under 30%.
3. Getting Coral Edge TPU usable in 2025
Coral has been in maintenance mode for two years. The physical chip is fine, but the official Python wheels are stuck. The community fork libedgetpu at release 16 (April 2025) is what actually works on a modern Debian.
3.1 Installation on Raspberry Pi OS Bookworm
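# Add Google's repo and its signing key (apt-key is deprecated on Bookworm)
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/coral-edgetpu.gpg
echo "deb [signed-by=/usr/share/keyrings/coral-edgetpu.gpg] https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
  | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt update
sudo apt install -y libedgetpu1-std python3-pycoral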
# Smoke test
python3 -c "from pycoral.utils.edgetpu import list_edge_tpus; print(list_edge_tpus())"
You should see something like [{'type': 'usb', 'path': '/sys/bus/usb/devices/2-1'}].
3.2 The smallest useful TFLite Edge TPU loop
# coral_classify.py
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common, classify
from PIL import Image
import time

interpreter = make_interpreter("mobilenet_v2_1.0_224_quant_edgetpu.tflite")
interpreter.allocate_tensors()

img = Image.open("cat.jpg").convert("RGB").resize(common.input_size(interpreter))
common.set_input(interpreter, img)

# Warmup
interpreter.invoke()

start = time.perf_counter()
for _ in range(1000):
    interpreter.invoke()
elapsed = time.perf_counter() - start
print(f"{1000/elapsed:.1f} FPS")

classes = classify.get_classes(interpreter, top_k=3)
for c in classes:
    print(c.id, c.score)
On a Pi 5 with Coral USB 3.0, that loop hits about 410 FPS. The USB bus, not the TPU, is the bottleneck. Use the M.2 variant if you care.
4. Raspberry Pi 5 AI Hat, the dark horse
The AI Hat (officially “Raspberry Pi AI HAT+”) attaches via the Pi 5’s single PCIe lane, which runs at Gen 2 by default and can be switched to Gen 3, and exposes a Hailo-8L (13 TOPS) or Hailo-8 (26 TOPS) accelerator. The 8L variant is what most people buy, and it’s what I’ll cover here.
4.1 Driver and runtime install
# Enable PCIe Gen 3 (default is Gen 2, AI Hat needs Gen 3 for full bandwidth)
sudo bash -c 'cat >> /boot/firmware/config.txt' <<EOF
dtparam=pciex1_gen=3
EOF
sudo reboot
# Install HailoRT (April 2025 baseline is 4.20)
sudo apt install -y hailo-all
# Smoke test
hailortcli scan
# Expected: "Hailo-8L found at <pci_address>"
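One caveat with the heredoc above: it appends the line every time you run it, so a re-run provisioning script leaves duplicates in config.txt. A minimal idempotent sketch in Python follows. The function names are hypothetical, and writing to /boot/firmware/config.txt still needs root.

```python
from pathlib import Path

GEN3_LINE = "dtparam=pciex1_gen=3"


def with_line(text: str, line: str) -> str:
    """Return `text` with `line` appended, unless an identical line exists."""
    if any(existing.strip() == line for existing in text.splitlines()):
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


def ensure_line(path: str, line: str) -> bool:
    """Apply with_line() to a file. Returns True if the file was changed."""
    target = Path(path)
    original = target.read_text() if target.exists() else ""
    updated = with_line(original, line)
    if updated == original:
        return False
    target.write_text(updated)
    return True


if __name__ == "__main__":
    changed = ensure_line("/boot/firmware/config.txt", GEN3_LINE)
    print("config.txt updated, reboot required" if changed else "already set")
```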
4.2 Running a Hailo-compiled model
Hailo’s toolchain compiles ONNX or TFLite into a .hef (Hailo Executable Format) file. You either compile yourself with the Dataflow Compiler (which is a free download but requires signup) or grab a pre-compiled model from the Hailo Model Zoo.
# hailo_infer.py: single-frame inference with HailoRT 4.20
from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)
import numpy as np

hef = HEF("yolov8n.hef")

with VDevice() as device:
    config = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_groups = device.configure(hef, config)
    ng = network_groups[0]
    in_params = InputVStreamParams.make(ng, format_type=FormatType.UINT8)
    out_params = OutputVStreamParams.make(ng, format_type=FormatType.FLOAT32)
    with ng.activate():
        with InferVStreams(ng, in_params, out_params) as pipeline:
            frame = np.random.randint(0, 255, (1, 640, 640, 3), dtype=np.uint8)
            results = pipeline.infer({list(in_params.keys())[0]: frame})
            for name, arr in results.items():
                print(name, arr.shape)
Realistic throughput on the Hailo-8L for YOLOv8n at 640x640 sits around 105 FPS, with the Pi 5 CPU at about 18%. That’s the headline: the Pi handles capture, decode, and post-processing while the Hat does the heavy lifting.
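For detection models, most of that CPU-side post-processing is non-maximum suppression. If your compiled model does not already do NMS for you, a greedy per-class version looks roughly like the sketch below. This is generic Python, not a HailoRT API, and the box format and default thresholds are assumptions you should match to your own decoder.

```python
def iou(a, b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold: float = 0.45, score_threshold: float = 0.25):
    """Greedy per-class NMS.

    detections: iterable of (box, score, class_id) tuples.
    Returns the kept detections, highest score first.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box unless a higher-scoring box of the same class overlaps it
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9, 0), ((1, 1, 11, 11), 0.8, 0), ((50, 50, 60, 60), 0.7, 0)]
    for box, score, cls in nms(dets):
        print(cls, score, box)
```

Pure Python is fine for a few hundred candidate boxes per frame. Past that, vectorize it with NumPy before blaming the accelerator for your frame rate.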
5. Picking the right board for the job
I’ve found a decision tree that’s been right more often than wrong.
                  +-------------------------+
                  | How many camera streams |
                  | per node?               |
                  +------------+------------+
                               |
            +------------------+------------------+
            | 1 stream         | 2+ streams       |
            v                  v                  v
     +-------------+    +-------------+     +-----------+
     | Power < 3W? |    | Power < 8W? |     | Need GPU? |
     +------+------+    +------+------+     +-----+-----+
            |                  |                  |
      +-----+-----+      +-----+-----+            |
      | yes       | no   | yes       | no         v
      v           v      v           v       +---------+
    Coral       RPi5    RPi5     RPi5+Hat    | Jetson  |
               +Hailo   +Hat                 | Orin    |
                                             | Nano    |
                                             | Super   |
                                             +---------+
The two questions that actually matter are camera count and power budget. Everything else is downstream.
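If you want that logic in a provisioning tool, here is one way to write it down. This is my simplified reading of the tree plus the "where each one wins" lists, not a formal rule: the four-stream cutoff and the `needs_cuda` flag are assumptions taken from section 1.1, and the function name is hypothetical.

```python
def pick_board(streams: int, power_budget_w: float, needs_cuda: bool = False) -> str:
    """Suggest a board from camera count and power budget.

    streams        : camera streams handled by one node
    power_budget_w : sustained watts available to the node
    needs_cuda     : True if the workload depends on the CUDA/PyTorch stack
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    if power_budget_w <= 0:
        raise ValueError("power_budget_w must be positive")
    # Camera-dense sites or CUDA-dependent models go to the Jetson.
    if needs_cuda or streams >= 4:
        return "Jetson Orin Nano Super"
    # Single stream on a battery or solar budget goes to Coral.
    if streams == 1 and power_budget_w < 3:
        return "Coral Edge TPU"
    # Everything in between is the Pi 5 with the Hailo-8L.
    return "Raspberry Pi 5 + AI Hat (Hailo-8L)"


if __name__ == "__main__":
    print(pick_board(streams=1, power_budget_w=2.5))
    print(pick_board(streams=1, power_budget_w=10))
    print(pick_board(streams=4, power_budget_w=25))
```

Treat the output as a starting point for a benchmark, not a purchase order.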
6. Common Pitfalls
Pitfall 1, treating TOPS as a benchmark
TOPS is a peak number measured under ideal sparsity and batch conditions. Real model throughput is typically 20-40% of peak. Always benchmark your specific model, your specific input size, your specific batch. I’ve seen 67 TOPS underperform 13 TOPS because the bigger chip was bandwidth-starved on small models.
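A small timing harness makes that benchmarking habit cheap. The sketch below is plain Python and works with any zero-argument callable, so the same function can wrap a TensorRT, Edge TPU, or HailoRT inference call. The warmup and iteration counts are arbitrary defaults I picked, not recommendations from any vendor.

```python
import statistics
import time


def benchmark(fn, warmup: int = 10, iters: int = 200) -> dict:
    """Time a zero-argument callable and report FPS plus latency percentiles."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    # Warmup runs absorb one-off costs such as lazy allocation and clock ramp-up
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    latencies_ms.sort()
    total_s = sum(latencies_ms) / 1000.0
    p99_index = min(iters - 1, int(0.99 * iters))
    return {
        "fps": iters / total_s if total_s > 0 else float("inf"),
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with your own inference call.
    stats = benchmark(lambda: sum(range(10_000)))
    print({k: round(v, 3) for k, v in stats.items()})
```

Watch the p99 figure as closely as the FPS. A board that averages well but stalls every few seconds will still drop frames on a live camera feed.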
Pitfall 2, ignoring thermal throttling
The Orin Nano Super in MAXN_SUPER will throttle at 90C. The reference heatsink barely keeps up at room temperature. For sites above 35C ambient, add a fan or you’ll watch your FPS halve mid-afternoon. The Pi 5 AI Hat also gets hot, and the official active cooler is non-negotiable.
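You can catch throttling before it costs you frames by polling the kernel’s thermal zones. The sketch below reads the standard Linux sysfs interface, where temperatures are reported in millidegrees Celsius. Zone names differ from board to board, and the 10-degree warning margin is my assumption.

```python
from pathlib import Path


def millidegrees_to_c(raw: str) -> float:
    """Convert a sysfs temp reading such as '45000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_zone_temps(root: str = "/sys/class/thermal") -> dict:
    """Return {zone_type: temperature_c} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            temps[name] = millidegrees_to_c((zone / "temp").read_text())
        except (OSError, ValueError):
            # Some zones are unreadable or report nothing; skip them
            continue
    return temps


def zones_near_limit(temps: dict, limit_c: float = 90.0, margin_c: float = 10.0) -> list:
    """Names of zones within margin_c of the throttle limit, sorted."""
    return sorted(name for name, t in temps.items() if t >= limit_c - margin_c)


if __name__ == "__main__":
    temps = read_zone_temps()
    for name, t in temps.items():
        print(f"{name}: {t:.1f} C")
    hot = zones_near_limit(temps)
    if hot:
        print("WARNING, close to throttling:", ", ".join(hot))
```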
Pitfall 3, USB Coral on USB 2.0 ports
The USB Coral negotiates USB 3.0 by default. If your host only has USB 2.0, it falls back silently and runs at about 30% speed. Check with lsusb -t. If the Coral’s entry shows 480M rather than 5000M, you’re on USB 2.0.
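The same check can run from a health script by reading the negotiated link speed out of sysfs. A minimal sketch, with two assumptions to verify on your own unit: the vendor IDs I match on (1a6e before the runtime loads firmware, 18d1 afterwards, which is Google’s general vendor ID and will also match other Google devices), and the sysfs root path.

```python
from pathlib import Path

# Assumed vendor IDs for the Coral USB Accelerator; confirm with lsusb.
CORAL_VENDOR_IDS = {"1a6e", "18d1"}


def describe_speed(speed_mbps: int) -> str:
    """Map a sysfs `speed` value (Mbit/s) to a USB generation label."""
    if speed_mbps >= 5000:
        return "USB 3.x (SuperSpeed)"
    if speed_mbps >= 480:
        return "USB 2.0 (High Speed)"
    return "USB 1.x"


def coral_link_speeds(root: str = "/sys/bus/usb/devices") -> list:
    """Return [(device_name, speed_mbps)] for devices with a Coral vendor ID."""
    found = []
    for dev in sorted(Path(root).glob("*")):
        try:
            vendor = (dev / "idVendor").read_text().strip().lower()
            speed = int((dev / "speed").read_text().strip())
        except (OSError, ValueError):
            # Interface entries and hubs without these files are skipped
            continue
        if vendor in CORAL_VENDOR_IDS:
            found.append((dev.name, speed))
    return found


if __name__ == "__main__":
    for name, speed in coral_link_speeds():
        print(f"{name}: {speed} Mbit/s, {describe_speed(speed)}")
```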
Pitfall 4, JetPack version skew with PyTorch wheels
The PyTorch wheels NVIDIA ships are pinned to specific JetPack versions. A wheel built for JetPack 5.x will not load on JetPack 6.1, and the error message is misleading. Always pull from the JetPack 6.1 index at developer.download.nvidia.com/compute/redist/jp/v61/.
7. Troubleshooting
hailortcli scan returns no devices
Three likely causes. First, PCIe Gen 3 isn’t enabled in config.txt; the Hat boots but can’t enumerate. Second, the ribbon cable is reversed (yes, this happens). Third, you’re on Raspberry Pi OS Bullseye, which lacks the right kernel modules. Upgrade to Bookworm.
Orin Nano boots but nvpmodel -m 2 fails with “Mode 2 not supported”
You’re on JetPack 5.x. MAXN_SUPER is only available on JetPack 6.1+. Either flash 6.1 or accept the lower power modes.
Coral inference returns garbage outputs
Almost always a model that wasn’t compiled for the Edge TPU. Run edgetpu_compiler --show_operations yourmodel.tflite and confirm the output line “Number of operations that will run on Edge TPU” is greater than zero. If it’s zero, the model has unsupported ops and is running on CPU with wrong tensor layouts.
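In a build pipeline I’d rather have that check fail the job than rely on someone reading the log. The sketch below parses the compiler’s summary lines. I’m assuming the log also contains a matching "will run on CPU" line alongside the Edge TPU one, and the sample text in the usage example is made up for illustration.

```python
import re

_OPS_LINE = re.compile(
    r"Number of operations that will run on (Edge TPU|CPU):\s*(\d+)"
)


def parse_op_counts(compiler_output: str) -> dict:
    """Extract Edge TPU and CPU operation counts from edgetpu_compiler output."""
    counts = {"Edge TPU": 0, "CPU": 0}
    for target, number in _OPS_LINE.findall(compiler_output):
        counts[target] = int(number)
    return counts


def edge_tpu_fraction(counts: dict) -> float:
    """Share of operations mapped to the Edge TPU, from 0.0 to 1.0."""
    total = counts["Edge TPU"] + counts["CPU"]
    return counts["Edge TPU"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = (
        "Number of operations that will run on Edge TPU: 65\n"
        "Number of operations that will run on CPU: 3\n"
    )
    counts = parse_op_counts(sample)
    print(counts, f"{edge_tpu_fraction(counts):.0%} on Edge TPU")
```

A zero Edge TPU count should be a hard failure. A low fraction is worth a warning, since every op left on the CPU adds a round trip between the host and the accelerator.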
8. Wrapping Up
The edge AI hardware market in April 2025 is no longer a graveyard of half-supported boards. The Orin Nano Super is a real upgrade, the Hailo-8L on the Pi 5 is the value pick of the year, and Coral is still the right answer for ultra-low-power. Pick by power budget and camera count, benchmark your actual model, and don’t trust the TOPS number.
Next post in this series tackles real-time telemetry processing in Go 1.24, the layer that consumes everything these boards emit. The hardware is only half the story.
For the official toolchains, NVIDIA’s JetPack documentation is the canonical reference. The Hailo developer docs are at hailo.ai/developer-zone.