The 2025 Technical Retrospective, Agents, Wasm, Edge AI, and MCP
TL;DR — 2025 was the year agents stopped being demos and started being infrastructure, WebAssembly outside the browser finally got serious, edge AI moved from research papers to product features, and the Model Context Protocol became the connective tissue. Plus a few predictions I got embarrassingly wrong.
I write a year-end retrospective every December. It’s selfish: forcing myself to look back at what I said in January and grade my own predictions is the single most effective way I’ve found to calibrate my technical judgment for the next year. It’s also the post that takes me longest to write, because honesty about being wrong is harder than writing about being right.
This year had more surprises than most. Twelve months ago I would have told you 2025 was the year agents would still mostly be impressive demos. I was wrong. I would have told you Wasm outside the browser would be a niche curiosity for another year. Also wrong. I would have told you the next big AI protocol wouldn’t ship until 2026. Wrong three times in a row.
This post is my honest accounting. What I got right, what I got wrong, what I learned, and what I think actually mattered. I’ll cover four major threads: agents going production, server-side Wasm, edge AI, and the Model Context Protocol. Then I’ll grade my own predictions from a year ago.
Thread 1, agents stopped being demos
In January 2025, the production agent landscape was thin. A few companies had ReAct-style agents in narrow workflows. Most “AI agents” were single-turn LLM calls with a tool definition stapled on. The promise of multi-step, multi-tool, durable agent execution was real but unproven at scale.
By December 2025, the picture is different. Three things happened.
First, the orchestration layer matured. LangGraph hit 0.4 and shipped real durability primitives. CrewAI added native state management. Anthropic’s Claude SDK shipped agent harnesses with built-in checkpointing. Microsoft’s AutoGen 0.6 had production-grade conversation management. The orchestration layer became unsexy plumbing, which is exactly what you want.
Second, the underlying models got better at multi-step reasoning. Claude Opus 4 and 4.5 in particular showed dramatic improvements in long-horizon task completion. GPT-5 (released in mid-2025) and Gemini 2.5 had similar jumps. The answer to “can the model recover from its own mistakes” went from “rarely” to “often.”
Third, and most importantly, observability for agent workflows got serious. LangSmith, Arize, Weights & Biases, and several smaller players shipped tools that let you actually debug a thirty-step agent run. Before this, the failure mode was “it broke somewhere, good luck.” Now you can replay, inspect, and patch. Production agents became debuggable.
The result: by Q4 2025, I know of dozens of companies running agent workflows in production for real money. Not toy demos. Order fulfillment, customer support triage, code review automation, document processing pipelines. The patterns I wrote about in Multi Agent Systems in 2025, Architecture Patterns That Work have been heavily exercised this year. The supervisor pattern in particular has become the default for most teams.
What I got wrong: I thought reliability would gate adoption longer than it did. Teams discovered they could accept much lower per-step reliability than I expected, as long as the workflow had explicit checkpointing and a human in the loop at the right places. The architecture compensated for model fragility better than I predicted.
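A rough way to see why checkpointing compensates for per-step fragility is to do the arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: it assumes steps fail independently, that failures are detected (by validation or a human reviewer), and that a failed step can be retried from its checkpoint. The function name and the 95% and thirty-step figures are hypothetical.

```python
def end_to_end_success(p_step: float, n_steps: int, attempts_per_step: int = 1) -> float:
    """Probability that all n_steps succeed, assuming independent steps.

    With a checkpoint before each step, a failed step can be retried from
    the checkpoint instead of restarting the whole run, so a step only
    fails overall if every one of its attempts fails.
    """
    p_step_with_retries = 1.0 - (1.0 - p_step) ** attempts_per_step
    return p_step_with_retries ** n_steps


# Hypothetical numbers: a thirty-step run at 95% per-step reliability.
no_checkpoints = end_to_end_success(0.95, 30)
with_checkpoints = end_to_end_success(0.95, 30, attempts_per_step=3)
print(f"no retries: {no_checkpoints:.3f}, three attempts per step: {with_checkpoints:.3f}")
```

Under these assumptions the run without retries completes only about one time in five, while three attempts per step pushes completion above 99%. Real failures are rarely independent, so treat this as intuition rather than a forecast.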
Thread 2, server-side Wasm grew up
WebAssembly outside the browser has been “the next big thing” for about five years. In 2025, it finally started shipping in places that mattered.
The big technical milestone was the Component Model. Components reached stability in major runtimes (Wasmtime, WasmEdge) in mid-2025. This was the unlock. Components gave Wasm modules a proper interface type system, dynamic linking, and a way to compose modules across languages. Without components, Wasm was a faster sandbox. With components, it’s something closer to a portable distribution format.
What this enabled in practice:
- Fastly and Cloudflare ramped up component-based edge compute, letting customers run Wasm modules with proper dependency graphs rather than monolithic builds.
- Several database vendors (notably SingleStore and a few others) shipped Wasm-based user-defined functions, which is a much safer execution model than the LD_PRELOAD nightmares we lived with before.
- Spin and Wasmtime-based serverless platforms hit a usability level where I’d recommend them for new greenfield work in 2026, not just experiments.
The pattern I see clearly now: Wasm isn’t replacing containers. It’s filling a different niche. Containers are for full applications. Wasm components are for plugins, UDFs, edge handlers, and embedded workloads inside larger systems. The two coexist, and the line between them is becoming clearer rather than blurrier.
I covered some of this in Server Side WebAssembly February 2025, Practical Survey. What I underestimated in that post was the component model timeline. I thought stable components were a 2026 thing. They shipped in summer 2025.
What I got wrong: I thought the WASI threads proposal would be the bottleneck for serious server-side adoption. It wasn’t. The teams shipping Wasm in production mostly didn’t need shared-memory threading. They needed good IPC, which components delivered.
Thread 3, edge AI got real
The story I would have predicted in January 2025: edge AI continues to be promising but constrained, with most “edge” deployments being just smaller cloud GPUs.
The story that actually unfolded: small language models (under 4B parameters) got dramatically more capable, and the inference runtime ecosystem (llama.cpp, MLX, ONNX Runtime, vLLM-edge) caught up to the hardware. By Q3 2025, you could run genuinely useful models on a phone, a laptop, or a Raspberry Pi 5.
The specific milestones that mattered:
- The sub-4B variants in the Phi-4 and Llama 3.2 families hit quality levels that would have required 70B models in 2023.
- Apple’s MLX framework matured to the point where on-device inference on Apple Silicon is production-viable for many use cases.
- Quantization techniques (especially 4-bit and 2-bit weight quantization with FP16 activations) became reliable enough to ship with confidence.
- Mobile inference on iPhones and high-end Android devices crossed the threshold where real-time conversational AI is feasible without network round trips.
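To make the quantization bullet above concrete, here is a minimal sketch of symmetric 4-bit weight quantization in plain Python. It is an illustration of the general idea only: the function names and sample weights are made up, and production runtimes use more elaborate block-wise formats than this single-scale version.

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization of one group of weights to 4-bit codes.

    Codes are integers in [-7, 7]; one floating-point scale is kept per group.
    """
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate weights from the integer codes and the group scale."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.7, -0.02]  # made-up values for illustration
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.3f}", f"max error={max_error:.3f}")
```

The rounding error per weight is bounded by half the scale, which is why smaller groups (each with its own scale) tend to preserve quality better at the cost of storing more scales.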
The implication for backend engineering: a lot of workloads that I assumed would stay in the cloud are migrating to the edge. Document classification, conversation summarization, image labeling, voice transcription. Not all of them, but enough that the architectural question “where does this inference happen” became a real design choice rather than a foregone conclusion.
I wrote about the early signals of this in SLM Landscape January 2025, Practical Survey. I want to flag that I was too conservative there. I said edge AI was a 2026 story for most production use cases. It was a 2025 story for many of them.
What I got wrong: I underestimated how quickly the privacy and cost arguments would push teams toward edge inference. Once a CFO compared the per-request cost of running summarization on-device with the cost of an API call, the conversation shifted faster than I expected. The cost delta is real and it compounds.
Thread 4, MCP became the wire format
The Model Context Protocol started as an Anthropic project in late 2024. By December 2025, it’s become the closest thing the AI tooling ecosystem has to a standard for tool exposure.
The growth in 2025 was striking. In January, MCP had maybe a dozen public servers and a few hundred users. By June, the ecosystem had several hundred public servers and meaningful adoption inside OpenAI, Google, and Microsoft tooling. By November, MCP had become the default integration layer in most major IDEs, including the JetBrains IDEs, VS Code, and Zed. The protocol itself remained stable, which was the most important thing.
Why it worked when other standards didn’t:
- It’s small. The spec is readable in an hour. You can implement a server in a day.
- It maps cleanly to how LLMs already think about tools. The transport doesn’t impose a new mental model.
- It was adopted by the model vendors themselves, not just tooling startups. That gave it gravitational pull.
- It deliberately punted on hard problems (auth, discovery, long-running tasks) in v1 and added them incrementally. This let it ship.
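The "small" point is easier to appreciate with the message shapes in front of you. MCP messages are JSON-RPC 2.0, and tools are exposed through the `tools/list` and `tools/call` methods. The sketch below is a toy in-process dispatcher, not a real server: it skips the transport and the initialization handshake, the `word_count` tool and the `handle` function are hypothetical, and returning -32602 for an unknown tool name is an assumption of this sketch.

```python
import json

# Hypothetical tool registry: name -> (description, input schema, handler).
TOOLS = {
    "word_count": (
        "Count the words in a piece of text.",
        {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
        lambda args: str(len(args["text"].split())),
    ),
}


def error(message_id, code: int, text: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": message_id, "error": {"code": code, "message": text}}


def handle(message: dict) -> dict:
    """Answer one JSON-RPC 2.0 request using MCP's tool method names."""
    message_id = message.get("id")
    method, params = message["method"], message.get("params", {})
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _) in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        if params.get("name") not in TOOLS:
            # Assumption of this sketch: unknown tools are reported as invalid params.
            return error(message_id, -32602, "Unknown tool")
        handler = TOOLS[params["name"]][2]
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" code.
        return error(message_id, -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": message_id, "result": result}


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "word_count", "arguments": {"text": "agents became boring infrastructure"}},
}
response = handle(request)
print(json.dumps(response, indent=2))
```

A tool is a name, a description, and a JSON Schema for its input, which is close to how model vendors already described function calling. That is most of why the protocol did not need a new mental model.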
I see MCP becoming infrastructure the same way HTTP became infrastructure. Most engineers won’t think about it directly. They’ll use libraries that abstract it. But the fact that the protocol exists is what makes the libraries possible.
The most interesting development late in the year was the convergence of MCP with the broader agent runtime story. Several frameworks now treat MCP as the default tool transport, which means agent workflows are naturally portable across runtimes. This is the kind of standardization that compounds.
What I got wrong: I was skeptical that MCP would get traction outside the Anthropic ecosystem. I thought competing protocols would fragment the space. They mostly didn’t. The vendor adoption surprised me.
Things that didn’t work out
The retrospective has to include the misses. Three things I expected to be big in 2025 that weren’t.
Decentralized AI training
The narrative around federated and decentralized training was loud in late 2024. The actual progress in 2025 was thin. The economics don’t favor it for large models, and the latency hit for distributed gradient sync at meaningful scale remained brutal. Some interesting research, no real shift in how models actually get trained.
Rust eating Python in ML
I half-expected 2025 to be the year Rust started displacing Python in serious ML tooling. It mostly didn’t. Rust grew in adjacent infrastructure (inference servers, tokenizers, vector databases) but the data scientist sitting in a notebook still writes Python and probably will for years. The migration is real but slower than I thought.
Model serving consolidation
I expected the model serving space (vLLM, TGI, MLC, Triton, etc.) to consolidate down to two or three winners. Instead it remained fragmented, with each framework finding a niche. The consolidation may happen in 2026, but it didn’t happen this year.
Scoring my January 2025 predictions
Honesty audit. Five predictions I made publicly or privately at the start of the year:
- “Agents will mostly remain demos in 2025.” Wrong. Production agents shipped in volume.
- “WASI components are a 2026 milestone.” Wrong. Shipped mid-2025.
- “The next big AI protocol won’t emerge until 2026.” Wrong. MCP emerged in 2024 and matured in 2025.
- “Edge AI stays niche through 2025.” Wrong. Edge inference shipped in mainstream products.
- “Multi-modal models will be the headline.” Mostly right, but they were not as transformative for backend engineers as I expected.
That’s four out of five wrong. The lesson is one I learn every year and somehow don’t internalize: the timelines for technology adoption in compute infrastructure tend to be faster than I predict when the underlying economics flip favorably, and slower when they don’t. I underweight economic forces and overweight technical readiness.
The book that’s helped me most with calibrating these predictions is Superforecasting by Philip Tetlock and Dan Gardner. The discipline of writing predictions down, scoring them, and adjusting is uncomfortable but valuable.
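One common way to do the scoring is the Brier score: the mean squared gap between the probability you stated and what actually happened, where 0 is perfect and always answering 50% earns 0.25. The sketch below applies it to the five predictions in the audit above. The outcomes come from that list, but the probabilities are hypothetical stand-ins, since the original predictions carried no stated confidence, and counting "mostly right" as a hit is also an assumption.

```python
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)


# Outcomes follow the audit above; the probabilities are hypothetical.
predictions_2025 = [
    (0.8, False),  # agents mostly remain demos
    (0.7, False),  # WASI components are a 2026 milestone
    (0.7, False),  # next big AI protocol waits until 2026
    (0.8, False),  # edge AI stays niche
    (0.6, True),   # multi-modal is the headline ("mostly right" counted as a hit)
]

score = brier_score(predictions_2025)
always_fifty_fifty = brier_score([(0.5, outcome) for _, outcome in predictions_2025])
print(f"Brier score: {score:.3f} (always saying 50%: {always_fifty_fifty:.3f})")
```

With these made-up confidences the score is worse than the coin-flip baseline, which is the uncomfortable part: confident wrong predictions cost more than hedged ones. Writing a probability next to each prediction in January is what makes this kind of scoring possible in December.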
Common pitfalls in technical retrospectives
- Grading on a curve. “Mostly right” is doing a lot of work in most retrospectives. Be specific about what you predicted and what actually happened.
- Selection bias on the misses. You’ll remember the predictions you got right and forget the ones you got wrong. Keep a written record at the start of the year.
- Confusing trends with adoption. A trend can be real and adoption can still be slow. Distinguishing between the two is most of the calibration work.
- Over-weighting late-year news. What happened in November will feel more important than what happened in March. Look back at your January-March posts and ask which still feel important.
When This Goes Wrong
Your retrospective just validates last year’s narrative. Diagnosis: you’re not looking hard enough at what you got wrong. Fix: ask three colleagues what they think you got wrong this year. Their answers are probably more honest than yours.
You can’t tell which trends mattered. Diagnosis: too much input, too little reflection. Fix: pick three threads, write deeply about each, accept that you’re leaving things out.
Your predictions for next year feel like restating this year. Diagnosis: anchoring on the present. Fix: ask “what would have to be true for next year to look completely different from this year?” Answer that, then write predictions.
Wrapping Up
2025 was a faster year than I expected. The trends I underestimated (agents, Wasm components, edge AI, MCP) all share a common shape: the underlying economics or technical readiness flipped favorably, and adoption compressed into a few quarters rather than playing out over years.
What I want to take into 2026 is a calibration adjustment. When the economics flip, move faster on adoption. When the protocol or runtime stabilizes, start building rather than waiting for the third version. When the small models close the gap with the large ones, start moving inference to where the data lives rather than the reverse.
My predictions for 2026 will land in a separate post. The short version: less hype around AGI, more hype around AI-native operating systems, and a continued slow grind of agents becoming boring infrastructure. We’ll see how well that ages a year from now.