Embedded Rust in 2024, Embassy and the no_std Stack

March 13, 2024 · 7 min read · by Muhammad Amal programming

TL;DR — Embedded Rust in 2024 finally feels like a first-class platform thanks to embassy 0.5 and stable async on no_std. You can target Cortex-M chips and the ESP32 from the same async model you use on the server. The catch is toolchain discipline, peripheral singletons, and a healthy respect for stack sizes.

A few years ago, writing async Rust on a microcontroller meant gluing together half-finished crates and praying the linker script lined up. That era is over. With embassy-executor 0.5 as a real async executor, and HAL crates that target both Cortex-M and riscv32 parts (the ESP32-C3 among them), the same kind of async fn you write for a Tokio service runs on a chip with 64 KB of RAM.

I’ve shipped two small IoT boards in the last six months using this stack — a Cortex-M0+ sensor node and an ESP32-C3 gateway. Both run on a single firmware crate with #![no_std], and both share async drivers. This post is what I wish I’d had before the first attempt.

If you’re new to Rust on the server side, my Rust production stack notes cover the cargo and tracing patterns the embedded story borrows from.

Why Embassy Won

Embassy is an async runtime designed for no_std. It doesn’t allocate, it doesn’t depend on std::thread, and the project ships HAL crates for STM32, nRF, and RP2040, with ESP32 covered by the separate esp-hal project. The win is that you write your firmware as a set of async tasks, and the executor multiplexes them on a single core, with interrupts waking the tasks that are waiting on them.

The alternative is RTIC, which is excellent but uses a different mental model (priority-based preemption with shared resources). For most projects I touch, embassy’s cooperative async fits cleanly into the rest of the Rust ecosystem.

The other thing embassy gets right: select!, timers, and channels work the way they do in Tokio. If you’ve written async Rust before, your first embassy task feels boring in the best way.

# Cargo.toml — Cortex-M target, March 2024 versions
[package]
name = "firmware"
version = "0.1.0"
edition = "2021"

[dependencies]
embassy-executor    = { version = "0.5", features = ["arch-cortex-m", "executor-thread", "integrated-timers"] }
embassy-time        = { version = "0.3", features = ["tick-hz-32_768"] }
embassy-stm32       = { version = "0.1", features = ["stm32g071rb", "time-driver-any", "unstable-pac"] }
embassy-sync        = "0.5"
embassy-futures     = "0.1"
cortex-m-rt         = "0.7"
defmt               = "0.3"
defmt-rtt           = "0.4"
panic-probe         = { version = "0.3", features = ["print-defmt"] }

[profile.release]
codegen-units = 1
debug         = 2
lto           = "fat"
opt-level     = "s"

opt-level = "s" matters. On a flash-constrained chip the difference between opt-level = 3 and opt-level = "s" is often 8–12% of flash. I’ve never regretted favoring size on embedded targets.

A First Blinky That Isn’t Embarrassing

The hello-world for embedded is blinking an LED. Here’s an embassy version with two concurrent tasks — one blinks, one reads a button and adjusts the blink rate via a channel. This is the pattern you’ll use for real firmware.

#![no_std]
#![no_main]

use defmt_rtt as _;
use panic_probe as _;

use embassy_executor::Spawner;
use embassy_stm32::exti::ExtiInput;
use embassy_stm32::gpio::{Input, Level, Output, Pull, Speed};
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
use embassy_sync::channel::Channel;
use embassy_time::{Duration, Timer};

static RATE: Channel<ThreadModeRawMutex, Duration, 4> = Channel::new();

#[embassy_executor::task]
async fn blink(mut led: Output<'static>) {
    let mut period = Duration::from_millis(500);
    loop {
        // Non-blocking try_receive: pick up new rate if any
        if let Ok(new) = RATE.try_receive() {
            period = new;
        }
        led.toggle();
        Timer::after(period).await;
    }
}

#[embassy_executor::task]
async fn button(mut input: ExtiInput<'static>) {
    let mut fast = false;
    loop {
        input.wait_for_falling_edge().await;
        fast = !fast;
        let d = if fast { Duration::from_millis(100) } else { Duration::from_millis(500) };
        RATE.send(d).await;
    }
}

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());
    let led = Output::new(p.PA5, Level::Low, Speed::Low);
    let btn = ExtiInput::new(Input::new(p.PC13, Pull::Up), p.EXTI13);

    spawner.spawn(blink(led)).unwrap();
    spawner.spawn(button(btn)).unwrap();
}

Two things worth calling out. First, peripherals (p.PA5, p.EXTI13) are singletons — you can only take each one once. The HAL gives them to you in init(), and the type system prevents two tasks from grabbing the same pin. Second, the channel is a static with capacity 4. There is no heap; everything is sized at compile time.
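
The singleton guarantee is ordinary Rust ownership, and it can be sketched on the host without any hardware. This is a minimal model, not embassy-stm32's implementation: Peripherals, Pin, Led, and take() are hypothetical names chosen for illustration. A flag ensures the struct is handed out once, and moving a pin into a driver means a second use of that pin fails to compile.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a HAL pin type: zero-sized, neither Clone nor Copy.
pub struct Pin<const N: u8>(());

/// Hypothetical stand-in for the struct a HAL's init() hands out.
pub struct Peripherals {
    pub pa5: Pin<5>,
    pub pc13: Pin<13>,
}

static TAKEN: AtomicBool = AtomicBool::new(false);

impl Peripherals {
    /// Returns the peripherals on the first call and None on every later call.
    pub fn take() -> Option<Peripherals> {
        if TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(Peripherals { pa5: Pin(()), pc13: Pin(()) })
        }
    }
}

/// A driver that owns its pin: constructing it consumes the pin value.
pub struct Led<const N: u8> {
    _pin: Pin<N>,
    pub on: bool,
}

impl<const N: u8> Led<N> {
    pub fn new(pin: Pin<N>) -> Self {
        Led { _pin: pin, on: false }
    }

    pub fn toggle(&mut self) {
        self.on = !self.on;
    }
}
```

Calling Led::new(p.pa5) twice is rejected by the borrow checker as a use of a moved value, which is the same mechanism that stops two tasks from sharing a pin in the blinky above.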

ESP32 Is the Same Story, Mostly

ESP32 support landed in embassy via esp-hal and esp-hal-embassy. The async model is identical; the differences are vendor flash tooling and the WiFi/Bluetooth radio drivers, which are still C blobs wrapped in unsafe Rust.

# Cargo.toml — ESP32-C3 target
[dependencies]
esp-hal           = { version = "0.16", features = ["esp32c3", "async"] }
esp-hal-embassy   = { version = "0.1", features = ["esp32c3", "time-timg0"] }
embassy-executor  = { version = "0.5", features = ["task-arena-size-20480"] }
embassy-time      = { version = "0.3", features = ["generic-queue-8"] }
embedded-hal-async = "1.0"
esp-backtrace     = { version = "0.11", features = ["esp32c3", "panic", "exception", "println"] }
esp-println       = { version = "0.9", features = ["esp32c3"] }

The task-arena-size-20480 feature is not C3-specific. When embassy-executor is built on stable Rust (without its nightly feature), it allocates storage for every task from a fixed-size static arena, and this feature sets that arena to 20480 bytes instead of the 4096-byte default. Pick the size carefully; too small and spawn panics, too large and you’ve burned RAM.
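
To get a feel for why an undersized arena fails at spawn time, here is a host-side model of a fixed arena. It is a sketch under stated assumptions: ArenaBudget is a hypothetical type that only tracks offsets, and embassy-executor's real allocator and task sizes will differ. The point is that each task's storage is carved out once, in order, and the first request that does not fit has nowhere else to go.

```rust
/// Hypothetical model of a fixed task arena: tracks offsets only, no real memory.
pub struct ArenaBudget {
    capacity: usize,
    used: usize,
}

impl ArenaBudget {
    pub const fn new(capacity: usize) -> Self {
        Self { capacity, used: 0 }
    }

    /// Reserve `size` bytes aligned to `align` (which must be a power of two).
    /// Returns the start offset, or None if the arena cannot fit the request.
    pub fn reserve(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round the current offset up to the requested alignment.
        let start = (self.used + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        self.used = end;
        Some(start)
    }

    /// Bytes left after the last successful reservation.
    pub fn remaining(&self) -> usize {
        self.capacity - self.used
    }
}
```

With a 4096-byte budget, two 1500-byte tasks fit and a third does not; in real firmware that third spawn is where the panic would show up.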

WiFi on ESP32 still pulls in esp-wifi, which links against the proprietary IDF blobs. It works, but expect a 200KB+ flash hit and weird thread-mode constraints. For network-connected sensors I usually accept it; for battery-powered nodes I use LoRa or BLE peripheral mode in pure Rust.

Drivers, HALs, and the `embedded-hal` Contract

The thing that makes embedded Rust portable is embedded-hal and its async sibling embedded-hal-async. Driver crates target these traits, not specific chips. So an lis3dh accelerometer driver works on STM32, nRF, and ESP32 without modification.

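use embassy_time::{Duration, Timer};
use embedded_hal_async::i2c::I2c;
use lis3dh_async::{Lis3dh, SlaveAddr};

// Embassy tasks cannot be generic, so the portable logic lives in a plain
// async fn. A small #[embassy_executor::task] wrapper that names the
// concrete I2C type for your chip calls this and awaits it.
async fn run_accel<I: I2c>(i2c: I) -> ! {
    // Probe the sensor at its default I2C address; panic if it is missing.
    let mut sensor = Lis3dh::new_i2c(i2c, SlaveAddr::Default).await.unwrap();
    loop {
        let accel = sensor.accel_norm().await.unwrap();
        defmt::info!("x={=f32} y={=f32} z={=f32}", accel.x, accel.y, accel.z);
        Timer::after(Duration::from_millis(50)).await;
    }
}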

The driver doesn’t know or care which chip it’s on. If you write your own driver, target embedded-hal-async traits and your code outlives your hardware choice. See the embedded-hal docs for the current trait surface.
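
If you have not written a trait-based driver before, the shape is small enough to sketch on the host. Everything here is an assumption made for illustration: RegisterBus is a hypothetical, synchronous stand-in for the embedded-hal traits (the async versions add .await but keep the same structure), and WhoAmI and MockBus are made-up names. The register address 0x0F and the 0x33 reply are the LIS3DH's WHO_AM_I register and ID value.

```rust
/// Hypothetical stand-in for a bus trait such as embedded-hal's I2c.
pub trait RegisterBus {
    type Error;
    fn read_reg(&mut self, addr: u8, reg: u8) -> Result<u8, Self::Error>;
}

/// Hypothetical sensor driver: generic over any bus, knows nothing about the chip.
pub struct WhoAmI<B> {
    bus: B,
    addr: u8,
}

impl<B: RegisterBus> WhoAmI<B> {
    pub fn new(bus: B, addr: u8) -> Self {
        Self { bus, addr }
    }

    /// Reads the identification register (0x0F on the LIS3DH).
    pub fn device_id(&mut self) -> Result<u8, B::Error> {
        self.bus.read_reg(self.addr, 0x0F)
    }
}

/// Host-side mock bus backed by a plain register array.
pub struct MockBus {
    pub regs: [u8; 256],
}

impl RegisterBus for MockBus {
    type Error = ();

    fn read_reg(&mut self, _addr: u8, reg: u8) -> Result<u8, ()> {
        Ok(self.regs[reg as usize])
    }
}
```

The same WhoAmI type would work unchanged against a real bus implementation, and the mock lets you test register logic on your laptop before any board is on the desk.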

Logging Without `println!`

You can’t println! on an MCU; there’s no stdout. The community converged on defmt, a logging framework that keeps format strings on the host side (in the ELF file) so the firmware only transmits compact binary tokens over RTT (Real-Time Transfer, SEGGER’s protocol for moving data through the debug probe via a buffer in target RAM).

defmt::info!("connected, rssi={=i8} dBm, channel={=u8}", rssi, channel);

The host-side probe-rs tool decodes those tokens back into formatted strings. The result is logs that cost ~4 bytes per call instead of 40, and you can leave them enabled in release builds. For most projects, probe-rs run --chip STM32G071RBTx target/... is the entire developer loop. No openocd, no gdb invocation gymnastics.
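
The size saving comes from sending an index instead of the string. The sketch below models that idea on the host; it is not defmt's actual wire format, and FORMATS, encode, and decode are hypothetical names. The firmware side emits a 2-byte string index plus the raw argument bytes, and the host side looks the template up and fills in the values.

```rust
/// Hypothetical host-side table: index -> format string. defmt keeps the
/// equivalent in the ELF file rather than in flash; its real encoding differs.
pub const FORMATS: &[&str] = &["connected, rssi={} dBm, channel={}"];

/// Firmware side: emit a 2-byte string index plus the raw argument bytes.
pub fn encode(format_id: u16, rssi: i8, channel: u8) -> [u8; 4] {
    let id = format_id.to_le_bytes();
    [id[0], id[1], rssi as u8, channel]
}

/// Host side: look the template up and substitute the decoded arguments.
/// Returns None if the frame refers to an unknown format index.
pub fn decode(frame: [u8; 4]) -> Option<String> {
    let id = u16::from_le_bytes([frame[0], frame[1]]) as usize;
    let template = FORMATS.get(id)?;
    let rssi = frame[2] as i8;
    let channel = frame[3];
    Some(
        template
            .replacen("{}", &rssi.to_string(), 1)
            .replacen("{}", &channel.to_string(), 1),
    )
}
```

Four bytes cross the wire for a message that is more than thirty characters once formatted, and none of the formatting code has to live on the chip.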

Common Pitfalls

A few I’ve hit more than once.

Stack overflows are silent. A typical MCU has no MMU, so an overflow just corrupts RAM. Set MEMORY sizes in memory.x realistically and, in debug builds, paint the stack region below _stack_start with a known pattern so you can read back high-water marks (recent cortex-m-rt releases have a paint-stack feature for this). Async futures can be surprisingly large if you .await deep call chains.
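
Stack painting itself is only a few lines. Here is a host-side sketch over a plain slice standing in for the stack region; PAINT, paint, and high_water_words are hypothetical names, and on real hardware the fill happens at boot before main runs.

```rust
/// Paint pattern: an arbitrary value unlikely to appear in real stack data.
pub const PAINT: u32 = 0xAAAA_AAAA;

/// Fill the whole region with the pattern (done once at boot on real hardware).
pub fn paint(stack: &mut [u32]) {
    stack.fill(PAINT);
}

/// The stack grows downward, so index 0 is the far end of the region.
/// Everything from the first overwritten word up to the top has been used;
/// that count of words is the high-water mark.
pub fn high_water_words(stack: &[u32]) -> usize {
    let untouched = stack.iter().take_while(|&&w| w == PAINT).count();
    stack.len() - untouched
}
```

A word that happens to equal the pattern can make the estimate slightly low, so treat the result as a lower bound and leave headroom.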

embassy-time driver mismatch. Each HAL provides a time driver; you pick exactly one via Cargo features. If two crates pull in conflicting drivers, the link fails with cryptic symbol clashes. Use cargo tree -d to spot duplicates.

Float on M0. Cortex-M0/M0+ has no FPU. f32 works, but every operation goes through slow software emulation. If you’re doing DSP, pick a chip with an FPU (a Cortex-M4F, or a Cortex-M33 with the FPU option) or stay in fixed-point with the fixed crate.
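
For readers who have not used fixed-point before, this is roughly what it looks like underneath. The sketch is hand-rolled Q16.16 arithmetic, not the fixed crate's API; Q16 and its methods are hypothetical names.

```rust
/// Q16.16 fixed-point: 16 integer bits and 16 fractional bits in an i32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Q16(pub i32);

impl Q16 {
    pub const FRAC_BITS: u32 = 16;

    /// Convert a whole number into Q16.16.
    pub const fn from_int(n: i16) -> Self {
        Q16((n as i32) << Self::FRAC_BITS)
    }

    /// Multiply through a 64-bit intermediate so the raw product cannot
    /// overflow; the final cast truncates if the result is out of range.
    pub fn mul(self, rhs: Q16) -> Q16 {
        Q16(((self.0 as i64 * rhs.0 as i64) >> Self::FRAC_BITS) as i32)
    }

    /// Host-only helper for inspecting values (uses floating point).
    pub fn to_f32(self) -> f32 {
        self.0 as f32 / 65536.0
    }
}
```

The multiply is integer-only, so it never calls into the soft-float routines. In real firmware the fixed crate gives you the same idea with checked, saturating, and wrapping variants already written.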

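static mut is a footgun. Reach for static-with-interior-mutability primitives from embassy-sync (Mutex, Signal, Channel) instead. Raw-pointer access through core::ptr::addr_of_mut! (or the newer &raw const / &raw mut syntax) avoids creating references to the static, but it doesn’t make unsynchronized access sound.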

Power management is your job. Embassy’s thread-mode executor puts the core to sleep (via WFE on Cortex-M) when no task is ready, but peripherals don’t auto-disable. If your battery node draws 8 mA at idle, look for a peripheral you forgot to turn off; a USART left enabled is a common culprit.

Wrapping Up

Embedded Rust in 2024 is no longer a curiosity. Embassy gives you the async model from the server side, embedded-hal-async gives you portable drivers, and probe-rs plus defmt gives you a developer loop that doesn’t make you want to quit. The hard parts are now hardware-shaped, not Rust-shaped, which is the right place for them to be.

If you’re starting fresh, pick a chip with an existing embassy HAL, target embedded-hal-async for any driver you write, and budget time for the toolchain on day one. The rest follows.
