Embedding wasmtime in a Rust Host, A Hands-On Guide
February 10, 2025 · 10 min read · by Muhammad Amal programming

TL;DR — Embed wasmtime 28 in Rust with Engine, Linker, Store. Set StoreLimits for memory caps, add fuel or epoch interruption for runaway code, prefer async_support(true) for cooperative multitasking. Production hosts need all four.

The wasmtime crate makes the easy thing easy: load a .wasm file, call a function, done. The hard part is everything you do once that wasm function is allowed to do interesting things. Allocate memory, run forever, block on I/O, talk to the host. That’s where you spend the engineering time, and that’s what this article is about.

I’m going to assume you’ve worked through the cargo-component tutorial or otherwise have a .wasm component on disk. We’ll embed wasmtime 28 properly, with the resource limits and safety machinery you actually need in production. By the end you’ll have a host that you’d be comfortable handing untrusted code.

Let’s set the scope. We’re not going to cover JIT vs AOT compilation modes deeply (use AOT in production via Engine::precompile_component). We’re not going to cover the Python or C APIs. Just the Rust embedding, end to end.

Project Setup

Start with a fresh binary crate and pin the deps. wasmtime moves fast and minor versions occasionally rename things.

cargo new --bin host-runner
cd host-runner

Cargo.toml:

[package]
name = "host-runner"
version = "0.1.0"
edition = "2021"

[dependencies]
wasmtime = { version = "28.0", features = ["component-model", "async", "pooling-allocator"] }
wasmtime-wasi = "28.0"
anyhow = "1"
tokio = { version = "1.42", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

The relevant feature flags are component-model (for WASI 0.2 components), async (for cooperative scheduling), and pooling-allocator (for multi-tenant cold start).

A Production-Shaped Host

The host architecture I default to looks like this:

+-----------------------------------------------------------+
|                       Tokio Runtime                       |
|                                                           |
|   +----------------+   +-------------------------------+  |
|   | Engine         |   |  per-request task             |  |
|   | (one global)   |   |  +-------------------------+  |  |
|   | + Config       |   |  | Store<HostState>        |  |  |
|   | + epoch tick   |   |  |  - limits: 32 MB        |  |  |
|   +----------------+   |  |  - fuel: 1e9 units      |  |  |
|                        |  |  - epoch deadline: 5s   |  |  |
|   +----------------+   |  +-----------+-------------+  |  |
|   | Linker         +-->| instantiate  |                |  |
|   | (one global)   |   |              v                |  |
|   +----------------+   |  +-------------------------+  |  |
|                        |  | Component instance      |  |  |
|                        |  +-------------------------+  |  |
|                        +-------------------------------+  |
+-----------------------------------------------------------+

One Engine, one Linker, many short-lived Stores, each with its own limits. That’s the whole pattern.

Step 1, Configure the Engine

use std::time::Duration;
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

pub fn build_engine() -> anyhow::Result<Engine> {
    let mut config = Config::new();
    config
        .wasm_component_model(true)
        .async_support(true)
        .consume_fuel(true)
        .epoch_interruption(true);

    let mut pool = PoolingAllocationConfig::default();
    pool.total_memories(64);
    pool.total_tables(64);
    pool.total_core_instances(64);
    pool.max_memory_size(64 * 1024 * 1024); // 64 MB hard cap
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));

    let engine = Engine::new(&config)?;
    Ok(engine)
}

A few decisions encoded here. Component model on. Async on, because we want cooperative scheduling. Fuel on, so we can meter execution. Epoch interruption on, so we can preempt by wall-clock deadline. Pooling allocator on, with hard caps. Every line is something you should be able to justify.

Step 2, Epoch Ticker

Epoch interruption is wasmtime’s mechanism for “stop the guest at the next safe point.” You arm it by calling Engine::increment_epoch periodically from the host. A tokio task does this fine.

use tokio::time::interval;
use std::sync::Arc;

pub fn spawn_epoch_ticker(engine: Arc<Engine>, tick: Duration) {
    tokio::spawn(async move {
        let mut t = interval(tick);
        loop {
            t.tick().await;
            engine.increment_epoch();
        }
    });
}

Tick every 100 ms in production. Per-store, you call Store::set_epoch_deadline(N) to say “interrupt me N ticks from now.” With a 100 ms tick and a deadline of 50, the guest gets at most 5 seconds of wall-clock.
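The tick arithmetic is easy to get off by one when the budget is not a clean multiple of the tick. Here is a minimal sketch of the conversion using only the standard library. deadline_ticks is a hypothetical helper name, not part of wasmtime; its result is the number you would pass to Store::set_epoch_deadline.

```rust
use std::time::Duration;

/// Hypothetical helper: how many epoch ticks cover `budget` when the
/// ticker fires every `tick`. Rounds up and never returns 0.
pub fn deadline_ticks(budget: Duration, tick: Duration) -> u64 {
    assert!(!tick.is_zero(), "tick interval must be non-zero");
    let budget_ns = budget.as_nanos();
    let tick_ns = tick.as_nanos();
    // Ceiling division, so a partial tick still counts as a whole one.
    let ticks = (budget_ns + tick_ns - 1) / tick_ns;
    ticks.max(1) as u64
}
```

With the numbers above, deadline_ticks(Duration::from_secs(5), Duration::from_millis(100)) gives 50. Because the first tick can land anywhere inside a tick interval, the guest’s real budget ends up between roughly 4.9 and 5 seconds.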

Step 3, Per-Store Resource Limits

use wasmtime::{Store, StoreLimits, StoreLimitsBuilder};
use wasmtime::component::ResourceTable;
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder, WasiView};

pub struct HostState {
    pub table: ResourceTable,
    pub wasi: WasiCtx,
    pub limits: StoreLimits,
}

impl WasiView for HostState {
    fn table(&mut self) -> &mut ResourceTable { &mut self.table }
    fn ctx(&mut self) -> &mut WasiCtx { &mut self.wasi }
}

pub fn build_store(engine: &Engine) -> Store<HostState> {
    let limits = StoreLimitsBuilder::new()
        .memory_size(32 * 1024 * 1024) // 32 MB per instance
        .table_elements(10_000)
        .instances(16)
        .build();

    let state = HostState {
        table: ResourceTable::new(),
        wasi: WasiCtxBuilder::new().inherit_stdio().build(),
        limits,
    };

    let mut store = Store::new(engine, state);
    store.limiter(|s| &mut s.limits);
    store.set_fuel(1_000_000_000).unwrap(); // ~1 billion ops
    store.set_epoch_deadline(50); // 50 ticks
    store
}

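memory_size caps how large any one linear memory in the store can grow. The pooling allocator’s max_memory_size is the hard ceiling for every memory slot in the pool; per-store memory_size is a second, usually tighter ceiling on top of it. The smaller one wins, so with the values above a guest can grow to 32 MB, not 64.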
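To make the “smaller one wins” rule concrete, here is a small sketch. Both function names are hypothetical; the only outside fact it relies on is that WebAssembly linear memory grows in 64 KiB pages.

```rust
/// WebAssembly linear memory grows in 64 KiB pages.
pub const WASM_PAGE_SIZE: usize = 64 * 1024;

/// Hypothetical helper: the cap a guest actually experiences is the
/// smaller of the pool slot size and the per-store limit.
pub fn effective_memory_cap(pool_max_memory: usize, store_memory_limit: usize) -> usize {
    pool_max_memory.min(store_memory_limit)
}

/// How many whole wasm pages fit under a byte cap.
pub fn max_pages(cap_bytes: usize) -> usize {
    cap_bytes / WASM_PAGE_SIZE
}
```

For the 64 MB pool slot and 32 MB store limit used in this article, the effective cap is 32 MB, which is 512 pages.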

set_fuel is the execution metering knob. wasmtime charges fuel roughly per executed wasm instruction (most cost one unit; a few structural ones such as block and nop are free). When it hits zero, the guest traps. 1 billion units is roughly a second of compute on modern hardware, but profile your own workload.
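“Profile your own workload” can be as simple as reading Store::get_fuel before and after a representative call, timing it, and scaling the result. A rough sketch of the scaling step, assuming you already have those two measurements; fuel_budget and its headroom_pct parameter are hypothetical names of mine.

```rust
use std::time::Duration;

/// Hypothetical helper: scale a measured fuel rate to a time budget.
/// `consumed` fuel units were burned in `elapsed`; return the fuel that
/// corresponds to `target`, padded by `headroom_pct` percent.
pub fn fuel_budget(consumed: u64, elapsed: Duration, target: Duration, headroom_pct: u32) -> u64 {
    assert!(!elapsed.is_zero(), "need a non-zero measurement window");
    // Work in u128 so the intermediate product cannot overflow for sane inputs.
    let scaled = (consumed as u128).saturating_mul(target.as_nanos()) / elapsed.as_nanos();
    let padded = scaled.saturating_mul(100 + headroom_pct as u128) / 100;
    padded.min(u64::MAX as u128) as u64
}
```

If a sample call burned 500 million units in 500 ms, a one-second budget with 20% headroom comes out at 1.2 billion units. Treat the result as a starting point: fuel tracks instructions, not time, so the ratio shifts with the instruction mix.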

set_epoch_deadline is the wall-clock backstop. Fuel counts executed instructions, not elapsed time, so a guest can stay inside its fuel budget and still blow through its time budget. The epoch deadline catches that case.

Step 4, Linker and Instantiation

use wasmtime::component::{Component, Linker};

pub async fn run_component(
    engine: &Engine,
    component_path: &str,
) -> anyhow::Result<()> {
    let component = Component::from_file(engine, component_path)?;

    let mut linker = Linker::<HostState>::new(engine);
    wasmtime_wasi::add_to_linker_async(&mut linker)?;

    let mut store = build_store(engine);
    let instance = linker.instantiate_async(&mut store, &component).await?;

    let func = instance
        .get_typed_func::<(String,), (String,)>(&mut store, "handle")?;
    let (result,) = func.call_async(&mut store, ("ping".to_string(),)).await?;
    println!("guest returned: {}", result);

    Ok(())
}

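add_to_linker_async registers all WASI 0.2 host implementations against the linker. instantiate_async instantiates the component on the async executor, and call_async runs the exported function the same way. With the store configured as in Step 3, running out of fuel or passing the epoch deadline traps the guest. If you want long-running guests to yield to the executor instead, configure Store::fuel_async_yield_interval or Store::epoch_deadline_async_yield_and_update on the store. The Linker and Component are built inside run_component only to keep the example short; a real server builds them once at startup and reuses them for every request.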

Step 5, Wire Up main

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter("info,wasmtime=warn")
        .init();

    let engine = Arc::new(build_engine()?);
    spawn_epoch_ticker(Arc::clone(&engine), Duration::from_millis(100));

    let path = std::env::args().nth(1).expect("path to .wasm required");
    run_component(&engine, &path).await?;
    Ok(())
}

Run it:

cargo run --release -- ../text-tools/target/wasm32-wasip2/release/text_tools.wasm

Handling Untrusted Guests

Production embedding is about what you do when the guest misbehaves. Here are the four scenarios you have to handle.

Out of Memory

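By default, StoreLimits does not trap when the guest tries to grow memory past the cap: memory.grow fails and returns -1, and the guest decides what to do about it. A Rust guest usually aborts on a failed allocation, which reaches the host as Trap::UnreachableCodeReached. If you would rather stop the guest at the failed growth itself, set trap_on_grow_failure(true) on the StoreLimitsBuilder. Either way, turn the failure into a typed error on the host side.

use wasmtime::Trap;

match func.call_async(&mut store, args).await {
    Ok(v) => Ok(v),
    Err(e) => {
        // An aborting guest (for example after a failed allocation) ends in `unreachable`.
        if let Some(Trap::UnreachableCodeReached) = e.downcast_ref::<Trap>() {
            anyhow::bail!("guest aborted, possibly after exceeding its memory budget");
        }
        Err(e)
    }
}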

Runaway Compute

Fuel exhaustion traps as Trap::OutOfFuel. Epoch interruption traps as Trap::Interrupt. Both are recoverable from the host’s perspective. The store is poisoned, but the engine is fine.

Misbehaving I/O

WASI 0.2 I/O is capability-based. A guest can only touch the directories you preopen and the network access you explicitly allow. To grant a single read-only directory:

use wasmtime_wasi::DirPerms;
use wasmtime_wasi::FilePerms;

let wasi = WasiCtxBuilder::new()
    .inherit_stdio()
    .preopened_dir("/srv/data", "/data", DirPerms::READ, FilePerms::READ)?
    .build();

Now the guest sees /data as read-only. Writes under /data fail with a permission error, and nothing outside /data is reachable at all.
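If the capability model is new to you, it can help to see the containment rule written out. The sketch below is a mental model only, not how wasmtime-wasi implements it (the real thing works on directory handles and also has to deal with symlinks). resolve_in_preopen is a hypothetical function of mine.

```rust
use std::path::{Component, Path, PathBuf};

/// Conceptual model only: map a guest path onto the host directory backing
/// a preopen, or return None if the path falls outside the grant.
pub fn resolve_in_preopen(guest_root: &str, host_root: &Path, requested: &str) -> Option<PathBuf> {
    // The request must be the preopen itself or something beneath it.
    let rest = requested.strip_prefix(guest_root)?;
    if !(rest.is_empty() || rest.starts_with('/')) {
        return None; // e.g. "/database" must not match a "/data" preopen
    }
    let mut resolved = host_root.to_path_buf();
    let mut depth = 0usize;
    for part in Path::new(rest.trim_start_matches('/')).components() {
        match part {
            Component::Normal(name) => {
                resolved.push(name);
                depth += 1;
            }
            Component::CurDir => {}
            // ".." may walk back up, but never above the preopen root.
            Component::ParentDir if depth > 0 => {
                resolved.pop();
                depth -= 1;
            }
            _ => return None,
        }
    }
    Some(resolved)
}
```

With the preopen from the snippet above, /data/a/b.txt maps to /srv/data/a/b.txt, while /data/../etc/passwd and /etc/passwd both come back as None.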

Long-Running Hosts

The pooling allocator pre-allocates slots and reuses them. After many instantiations you should see flat memory, not growth. If you see growth, you’re likely holding Stores longer than the request lifecycle. Drop them.

Host-Provided Capabilities

WASI 0.2 is a baseline; real production hosts almost always add their own custom host functions to expose application-specific capabilities. The pattern is: define a WIT world that imports your custom interface, implement it on the host, register it on the linker, build guests against that world.

A toy example. Suppose you want guests to be able to emit structured metrics. Define an interface:

package amal:obs@0.1.0;

interface metrics {
    counter: func(name: string, value: u64);
    gauge: func(name: string, value: f64);
}

world hosted {
    import metrics;
    export run: func();
}

On the host, implement the interface by writing the trait impls that bindgen generates:

impl amal::obs::metrics::Host for HostState {
    fn counter(&mut self, name: String, value: u64) -> wasmtime::Result<()> {
        tracing::info!(counter = %name, value, "guest counter");
        Ok(())
    }
    fn gauge(&mut self, name: String, value: f64) -> wasmtime::Result<()> {
        tracing::info!(gauge = %name, value, "guest gauge");
        Ok(())
    }
}

Register on the linker:

amal::obs::metrics::add_to_linker(&mut linker, |s: &mut HostState| s)?;

Now guests can call amal:obs/metrics.counter("requests", 1) and it lands in the host’s tracing pipeline. The capability is granted because the host explicitly added it; if you don’t add it, guests get an instantiation error. This is the pattern for everything: rate limiting, database access, custom logging, encrypted secret retrieval. Define a WIT interface, implement on the host, register on the linker, audit at the WIT layer.

AOT Compilation for Cold Start

JIT compilation is fine for development. Production wants AOT.

let bytes = std::fs::read("text_tools.wasm")?;
let precompiled = engine.precompile_component(&bytes)?;
std::fs::write("text_tools.cwasm", &precompiled)?;

Load the .cwasm artifact directly:

let component = unsafe {
    Component::deserialize_file(&engine, "text_tools.cwasm")?
};

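The unsafe is real. deserialize_file trusts that the bytes came from wasmtime’s own serializer and have not been modified; a corrupted or attacker-controlled .cwasm can lead to undefined behavior, so only load artifacts you produced and stored yourself. Artifacts from a different wasmtime version or an incompatible Config are rejected with an error at load time rather than run, so always re-precompile when you upgrade.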
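One way to make “re-precompile when you upgrade” automatic is to bake a compatibility key into the artifact’s file name, so a new engine simply misses the cache. A minimal sketch, assuming you have some hashable key that changes with the engine; cwasm_cache_path is a hypothetical helper. I believe Engine::precompile_compatibility_hash() returns a value suited to this, but check that against the 28.x docs before relying on it. A plain version string works too.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive a cache file name from the wasm bytes and a
/// compatibility key, so an engine upgrade never reuses a stale artifact.
pub fn cwasm_cache_path<K: Hash>(cache_dir: &Path, wasm_bytes: &[u8], compat_key: &K) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    compat_key.hash(&mut hasher);
    wasm_bytes.hash(&mut hasher);
    // 16 hex digits of the combined hash, plus the conventional extension.
    cache_dir.join(format!("{:016x}.cwasm", hasher.finish()))
}
```

DefaultHasher is not guaranteed to be stable across Rust releases. For a cache that is fine: the worst case is a miss and one extra precompile.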

Cold start on a precompiled component is typically under a millisecond. JIT cold start on a 200 KB component is 20-50 ms. The difference matters for serverless workloads.

Common Pitfalls

Mixing sync and async linkers. add_to_linker_sync and add_to_linker_async populate different host implementations. Pick one per linker. Mixing them produces “host function returned future when sync expected” errors at runtime.

Forgetting store.limiter(...). StoreLimitsBuilder::build() returns a value, but limits only apply if you also call store.limiter(|s| &mut s.limits). Easy to miss, results in caps being silently ignored.

Sharing Store across tasks. Every call into a guest takes &mut Store, so a store can serve only one call at a time. Each request needs its own store. Don’t wrap one in a Mutex and call from multiple tasks. Just don’t.

Holding Engine per-request. Engine is expensive to build (compiles host-side state, loads codegen backends). Build once at startup, share through Arc<Engine>. Stores are cheap. Engines are not.

Troubleshooting

fuel consumption is not enabled. You called set_fuel on a store from an engine whose Config::consume_fuel(true) wasn’t set. Fix on the engine config.

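epoch deadline reached immediately on call. A store on which you never called set_epoch_deadline has no headroom and is interrupted at the first check. The deadline is also not re-armed between calls: set_epoch_deadline(N) means N ticks beyond the engine’s epoch at the moment you call it, so on a reused store call it again before every guest call. If you would rather yield to the executor than trap, use store.epoch_deadline_async_yield_and_update(N), which extends the deadline by N ticks each time it is reached.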

pooling allocator: instance limit exceeded at scale. The pool is sized for total_core_instances. Either raise the limit, or shorten your store lifetimes so slots get returned faster. wasmtime’s pooling docs at wasmtime.dev explain the tradeoff.

Observability and Tracing

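A production host needs to know what its guests are doing. Wasmtime and wasmtime-wasi emit diagnostics through the log and tracing crates, and the subscriber installed in main collects them. That subscriber uses a fixed filter string, so to see more, change the wasmtime=warn directive there or build the filter from an environment variable; WASMTIME_LOG is read by the wasmtime CLI, not by your embedding. For per-guest correlation, the pattern I use is to inject a request ID into the host state, log it on every host function entry, and propagate it into any spans you create.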

For metrics, the pattern is straightforward: expose a Prometheus endpoint on the host that reports per-store metrics (instances created, fuel consumed, traps, mean execution time). In 28.x wasmtime does not keep these counters for you, so count yourself: Store::get_fuel before and after a call gives you fuel consumed, and wrapping your instantiate_async and call_async calls with timing and counter increments gives you the rest. Five lines per call site, lasts forever.
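Here is roughly what “count yourself” looks like with nothing but the standard library. GuestMetrics and its field names are hypothetical; how you export the numbers (Prometheus or otherwise) is up to you.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical host-side counters; export them however you like.
#[derive(Default)]
pub struct GuestMetrics {
    pub calls: AtomicU64,
    pub failures: AtomicU64,
    pub total_micros: AtomicU64,
}

impl GuestMetrics {
    /// Call right before instantiate_async / call_async.
    pub fn start(&self) -> Instant {
        Instant::now()
    }

    /// Call with the outcome once the guest call has returned.
    pub fn finish(&self, started: Instant, ok: bool) {
        self.calls.fetch_add(1, Ordering::Relaxed);
        if !ok {
            self.failures.fetch_add(1, Ordering::Relaxed);
        }
        let micros = started.elapsed().as_micros() as u64;
        self.total_micros.fetch_add(micros, Ordering::Relaxed);
    }

    /// Mean call duration in microseconds, or 0 before the first call.
    pub fn mean_micros(&self) -> u64 {
        let calls = self.calls.load(Ordering::Relaxed);
        if calls == 0 {
            0
        } else {
            self.total_micros.load(Ordering::Relaxed) / calls
        }
    }
}
```

At a call site this is three lines: let t = metrics.start(); then the awaited guest call; then metrics.finish(t, result.is_ok()). Splitting start and finish rather than taking a closure keeps it usable around an .await.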

For tracing, propagating a parent span across the host boundary is unsolved in the standard. The workaround is to pass a trace context (W3C traceparent) as a string argument to your top-level guest function. The guest passes it back to any host functions it calls. Not elegant, but it gives you end-to-end traces.
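Since the trace context crosses the boundary as a plain string, it is worth validating it on the host before trusting it. A minimal sketch of parsing and re-parenting a version-00 traceparent value, standard library only; the TraceParent type and its method names are mine, not from any tracing crate.

```rust
/// A parsed W3C `traceparent` value (version 00 only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub flags: u8,
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True if `s` is a valid id: the right length and not all zeros.
fn is_valid_id(s: &str, len: usize) -> bool {
    is_lower_hex(s, len) && s.bytes().any(|b| b != b'0')
}

impl TraceParent {
    /// Parse `00-<trace-id>-<parent-id>-<flags>`, rejecting anything else.
    pub fn parse(value: &str) -> Option<TraceParent> {
        let parts: Vec<&str> = value.split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        if !is_valid_id(parts[1], 32) || !is_valid_id(parts[2], 16) || !is_lower_hex(parts[3], 2) {
            return None;
        }
        Some(TraceParent {
            trace_id: parts[1].to_string(),
            parent_id: parts[2].to_string(),
            flags: u8::from_str_radix(parts[3], 16).ok()?,
        })
    }

    /// Same trace, new parent span: what the host hands to the guest.
    pub fn with_parent(&self, span_id: &str) -> Option<TraceParent> {
        if !is_valid_id(span_id, 16) {
            return None;
        }
        Some(TraceParent { parent_id: span_id.to_string(), ..self.clone() })
    }

    /// Render back to the header string form.
    pub fn to_header(&self) -> String {
        format!("00-{}-{}-{:02x}", self.trace_id, self.parent_id, self.flags)
    }
}
```

The host parses the incoming value, swaps in the span ID of the span it opened for the guest call, and passes to_header() as the string argument. When the guest hands the string back through a host function, parse it again rather than assuming it is intact.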

Sample the right things. Per-call timing of every guest invocation is too much data. Sample at 1% in production for non-error traces, 100% for error traces, plus a heartbeat per-host that reports aggregate counters every 10 seconds. That cadence is enough to spot anomalies without flooding your backend.
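The sampling rule is small enough to write down. A sketch of a counter-based sampler that keeps every error trace and one in N successful ones; TraceSampler is a hypothetical type, and a counter is just the simplest deterministic way to get 1%. Hash-based or random sampling would work too.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical sampler: keep every error trace, and one in `every`
/// successful traces.
pub struct TraceSampler {
    every: u64,
    seen_ok: AtomicU64,
}

impl TraceSampler {
    pub fn new(every: u64) -> TraceSampler {
        assert!(every > 0, "sampling interval must be positive");
        TraceSampler { every, seen_ok: AtomicU64::new(0) }
    }

    /// Returns true if this invocation's trace should be exported.
    pub fn should_sample(&self, is_error: bool) -> bool {
        if is_error {
            return true; // 100% of error traces
        }
        // fetch_add returns the previous count, so the 1st, (every+1)th, ...
        // successful call is the one that gets sampled.
        self.seen_ok.fetch_add(1, Ordering::Relaxed) % self.every == 0
    }
}
```

TraceSampler::new(100) gives the 1% rate suggested above. Share one sampler per host (behind an Arc) so the rate holds across all request tasks.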

Wrapping Up

You now have a host that loads components, runs them under hard memory caps, meters their compute, preempts them on wall-clock deadlines, and uses the pooling allocator for cold-start density. This is the embedding shape I run in production. The next article tackles Spin, which gives you most of this scaffolding out of the box if you want HTTP services without writing the host yourself.