Lessons From a Year of Rust, Postgres, and AI Agents
December 20, 2024 · 9 min read · by Muhammad Amal · programming

TL;DR: Rust earned its keep on agent infrastructure. Postgres absorbed most of the workloads people thought needed exotic databases. Agents required more boring engineering than the hype suggested. The three together are a surprisingly powerful stack.

Around this time last year, I sat down to plan my 2024 personal learning goals and picked three: go deeper in Rust, get more disciplined about Postgres, and ship at least one serious agentic system into production. By December 2024, I have done all three, in some cases on the same projects, and the intersection turned out to be more interesting than any of the individual threads.

This is the lessons-learned post for the year. Not a tutorial. Not a benchmark. Just the patterns I keep returning to, the things I changed my mind about, and the assumptions I started the year with that turned out to be wrong. Some of this is opinionated. Some of it might age badly. I am writing it down anyway, because December is the month to consolidate the year’s learning before it gets diluted by the next thing.

If you want the bigger-picture review of where the agentic ecosystem ended up, see The 2024 Wrap Up, The Agentic Era for Backend Engineers. This post is narrower and more personal.

What Rust earned

I came into 2024 already convinced that Rust had a place in backend systems. By December, I am more convinced, but for different reasons than I expected. The thing Rust earned this year was not the obvious “we got rid of memory bugs” story. It was the much less glamorous “we built infrastructure that does not surprise us at 3am” story.

Specifically, the workloads where Rust paid off most clearly in 2024 were:

  • Agent tool servers. The thing that runs the actual tools the LLM calls. Tokio, axum, and sqlx made it boringly easy to build a tool server that does not get knocked over by a runaway agent loop. Tail latency was strictly better than the equivalent Python stack we replaced.
  • Embedding pipelines. Batch processing of documents into embeddings, with retries, rate-limiting, and backpressure. Rayon for the embarrassingly parallel parts, async tokio for the I/O. Memory usage was meaningfully lower than the Python pipeline at the same throughput.
  • Real-time event processing. The boring CDC and streaming work. Postgres logical replication into a Rust consumer feeding a vector index. Predictable, observable, and cheap.
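
// The shape that kept showing up.
// AppState, ToolRequest, ToolResponse, and ToolError are application types;
// State and Json are the axum extractors.
async fn run_tool(
    State(state): State<AppState>,
    Json(req): Json<ToolRequest>,
) -> Result<Json<ToolResponse>, ToolError> {
    let timer = state.metrics.timer("tool.duration", &[("tool", &req.name)]);
    // Per-tenant rate limit: a runaway agent loop is rejected here.
    let _guard = state.rate_limiter.acquire(&req.tenant).await?;

    let outcome = match state.tool_registry.get(&req.name) {
        Some(tool) => tool.execute(req.args, &state.ctx).await,
        None => Err(ToolError::NotFound),
    };
    // Record the duration before propagating a tool error, so failed
    // executions are timed too (assuming the timer only records on observe()).
    timer.observe();

    Ok(Json(outcome?))
}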

That is not exotic. It is just the kind of code that, once written, you do not have to think about again. Which turned out to be exactly what an agentic system needs underneath it, because the agent layer is where all the uncertainty lives. You want the layer below to be aggressively boring.
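To make the lookup step in that handler concrete, here is a minimal, synchronous sketch of a tool registry using only the standard library. The `Tool` trait, the `Shout` tool, and the `call` method are hypothetical names for illustration; the real registry is async and takes structured arguments.

```rust
use std::collections::HashMap;

/// Errors a tool call can surface to the caller.
#[derive(Debug, PartialEq)]
enum ToolError {
    NotFound,
    Failed(String),
}

/// Hypothetical tool interface: string in, string out.
trait Tool {
    fn execute(&self, args: &str) -> Result<String, ToolError>;
}

/// Example tool that upper-cases its input and rejects empty input.
struct Shout;

impl Tool for Shout {
    fn execute(&self, args: &str) -> Result<String, ToolError> {
        if args.is_empty() {
            return Err(ToolError::Failed("empty input".to_string()));
        }
        Ok(args.to_uppercase())
    }
}

/// Name -> tool lookup. Unknown names become `ToolError::NotFound`.
#[derive(Default)]
struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn register(&mut self, name: &str, tool: Box<dyn Tool>) {
        self.tools.insert(name.to_string(), tool);
    }

    /// Look the tool up by name, then run it; both failure modes are typed.
    fn call(&self, name: &str, args: &str) -> Result<String, ToolError> {
        self.tools
            .get(name)
            .ok_or(ToolError::NotFound)?
            .execute(args)
    }
}
```

The point of the shape is that an agent asking for a tool that does not exist gets a typed error back, not a panic, which is most of what "boring" means at this layer.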

The Rust ecosystem matured in ways that helped. sqlx 0.8, tokio 1.40-ish, axum 0.7, tracing as the default observability story. The thing I changed my mind about during the year is that the borrow checker is no longer the main onboarding cost for new Rust engineers. The async story is. Pin, Send, Sync, lifetimes inside futures, all of that is where new hires get stuck. Teams that planned for this and built up a small set of patterns scaled better than teams that expected the standard library to be self-explanatory.
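One pattern of the kind a team might write down for new hires, sketched with the standard library only: keep lock guards out of scope before an await point so the future stays `Send`. The `require_send`, `bump`, and `block_on` names are made up for this example, and the tiny executor is a stand-in for a real runtime such as tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Compile-time check: only accepts futures that can move across threads,
/// which is what a multi-threaded runtime requires when spawning a task.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// Increment a shared counter, then hit an await point.
/// The guard lives in an inner block, so it is dropped before `.await`.
/// Holding a std MutexGuard across the await would make this future !Send.
async fn bump(counter: Arc<Mutex<u64>>) -> u64 {
    let value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    };
    std::future::ready(()).await; // stand-in for real I/O
    value
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo-only executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

If the guard is moved out of the inner block so it is still alive at the `.await`, the call to `require_send` stops compiling. That turns a confusing error deep inside a spawn call into a check that sits next to the function that caused it.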

What Postgres absorbed

I went into 2024 expecting we would need a dedicated vector database. By December, we are still running on Postgres with pgvector and HNSW indexes, and the few times we benchmarked alternatives, the result was always “Postgres is fine, the cost of operating a second system is not.” This is, I think, the single biggest practical lesson of my year.

Postgres 17 shipped in September. The improvements that mattered to me were:

  • Better B-tree scans for common index patterns. The OLTP workloads got faster without changing a thing.
  • Streaming logical replication, finally usable at the scale I cared about.
  • Continued maturation of incremental sort and parallel index builds.

pgvector 0.7 and then 0.8 closed the gap with dedicated vector databases for the workloads I had. Sparse-vector support (added in 0.7) for hybrid search, iterative index scans for HNSW (added in 0.8), decent recall at acceptable latency. The thing I learned the hard way is that the parameter tuning matters enormously, and the defaults are conservative. If you are running pgvector and have not tuned hnsw.ef_search, you are leaving recall on the table. Raising it makes each query examine more candidates, so recall goes up and latency goes up with it; measure both.

-- The settings that ended up in our cluster after a year of tuning
-- (SET is per session; persist with ALTER SYSTEM or ALTER DATABASE ... SET)
SET hnsw.ef_search = 100;  -- up from default 40
SET work_mem = '128MB';     -- for big sort operations
SET maintenance_work_mem = '2GB';  -- for index builds
SET random_page_cost = 1.1;  -- SSD-appropriate

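-- The query shape that worked for hybrid search
-- $1 is the query embedding, $2 is a tsquery
WITH semantic AS (
  -- <=> is cosine distance (lower is closer), so convert it to a similarity
  SELECT id, 1 - (embedding <=> $1) AS sim
  FROM documents
  ORDER BY embedding <=> $1
  LIMIT 100
),
keyword AS (
  SELECT id, ts_rank(tsv, $2) AS rank
  FROM documents
  WHERE tsv @@ $2
  ORDER BY rank DESC
  LIMIT 100
)
-- Inner joins keep only documents that appear in both candidate sets
SELECT d.*, (s.sim * 0.6) + (k.rank * 0.4) AS score
FROM documents d
JOIN semantic s USING (id)
JOIN keyword k USING (id)
ORDER BY score DESC  -- higher combined score is better
LIMIT 20;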

That is roughly the shape of the hybrid search query that ran most of our production retrieval by late 2024. Boring, fast enough, observable, joinable to the rest of the application data without having to ferry IDs across systems. The Crunchy Data team’s Postgres + AI playbook is a good external reference for this kind of pattern.
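The same 0.6/0.4 weighting is sometimes easier to unit-test in application code than in SQL. A minimal sketch, assuming both inputs are (id, score) pairs where a higher score means more relevant, and keeping only ids present in both lists as an inner join would. The `fuse` function is a hypothetical helper, not part of any library.

```rust
use std::collections::HashMap;

/// Fuse two candidate lists with fixed weights (0.6 semantic, 0.4 keyword).
/// Both inputs are (id, score) pairs where HIGHER means more relevant.
/// Like an inner join, only ids present in both lists are kept.
fn fuse(semantic: &[(u64, f64)], keyword: &[(u64, f64)], limit: usize) -> Vec<(u64, f64)> {
    let keyword_by_id: HashMap<u64, f64> = keyword.iter().copied().collect();

    let mut fused: Vec<(u64, f64)> = semantic
        .iter()
        .filter_map(|&(id, sim)| {
            keyword_by_id
                .get(&id)
                .map(|&rank| (id, sim * 0.6 + rank * 0.4))
        })
        .collect();

    // Best score first; total_cmp gives a total order over f64.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused.truncate(limit);
    fused
}
```

The two scores are on different scales, so the weights are only meaningful for one corpus and one ranking function. Treat them as tuning parameters, not constants.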

The thing I changed my mind about during the year is that Postgres is no longer just a transactional database with extensions bolted on. It is a credible backbone for AI workloads at the small-to-medium scale, and the small-to-medium scale covers more production systems than people admit. Reach for a dedicated vector DB when you have actually measured Postgres falling over, not before.
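"Measured" can start small. One number worth tracking when tuning hnsw.ef_search is recall@k: how many of the exact top-k results the index-backed query also returned. A sketch, assuming you already have both id lists for the same query (the exact list from a sequential scan, the approximate one from the HNSW index). `recall_at_k` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// recall@k: the fraction of the exact top-k ids that also appear in the
/// approximate top-k. Returns 1.0 when there is nothing to find.
fn recall_at_k(exact: &[u64], approximate: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0;
    }
    let found: HashSet<u64> = approximate.iter().take(k).copied().collect();
    let hits = truth.intersection(&found).count();
    hits as f64 / truth.len() as f64
}
```

Averaged over a few hundred representative queries at each candidate setting, this gives a recall-versus-latency curve that is far easier to argue about than a vendor benchmark.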

For more on how to make these architecture choices defensible to non-engineers, see Translating Business Impact into Architecture Decisions.

What agents demanded

The agents I shipped in 2024 demanded more boring engineering than the framework demos suggested. By December, my private architecture diagram for “production-ready agent system” has more boxes for state stores, retry queues, eval harnesses, and human-in-the-loop UIs than it does for the agent itself. The agent is the small box in the middle. Everything around it is the thing that makes the small box useful.

+--------------------------+
|    Request Ingestion     |
+------------+-------------+
             |
+------------v-------------+      +-----------------------+
|   Idempotency Store      |<---->|  Run Registry (PG)    |
+------------+-------------+      +-----------------------+
             |
+------------v-------------+      +-----------------------+
|     Agent Loop (LG)      |<---->|  Checkpoint Store     |
+------------+-------------+      +-----------------------+
             |
+------------v-------------+      +-----------------------+
|     Tool Servers (Rust)  |<---->|  Tool Audit Log       |
+------------+-------------+      +-----------------------+
             |
+------------v-------------+
|   Human Review UI / API  |
+--------------------------+

Each of those boxes is unglamorous. None of them are in the framework docs. All of them turned out to be load-bearing once we put real users in front of the system. The agent itself, once the surrounding boxes were right, was the simplest part of the architecture.
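As an illustration of how small each box can be, here is an in-memory sketch of the idempotency store's contract: run the work at most once per request key, and return the stored result on repeats. `IdempotencyStore` and `run_once` are hypothetical names. A production version would presumably sit on a Postgres table with a unique key; this sketch does not model that.

```rust
use std::collections::HashMap;

/// In-memory stand-in for an idempotency store keyed by a client-supplied
/// request key.
#[derive(Default)]
struct IdempotencyStore {
    results: HashMap<String, String>,
}

impl IdempotencyStore {
    /// Run `work` at most once per key. Repeat calls with the same key
    /// return the stored result and do not invoke `work` again.
    fn run_once<F>(&mut self, key: &str, work: F) -> String
    where
        F: FnOnce() -> String,
    {
        if let Some(existing) = self.results.get(key) {
            return existing.clone();
        }
        let result = work();
        self.results.insert(key.to_string(), result.clone());
        result
    }
}
```

A client retry after a timeout then replays the stored answer and does not trigger a second agent run, which matters once runs cost real money.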

The other lesson, which I am still in the middle of learning, is that eval is harder than the production code. Building a representative eval set, keeping it fresh as the system evolves, gating deploys on regression, scoring across multiple dimensions (correctness, cost, latency, safety): all of this is its own discipline. The teams that built this muscle in 2024 are entering 2025 with a real advantage. We are halfway there. Q1 is the catch-up.
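Gating deploys on regression can begin as a single comparison function. A sketch, assuming each eval run is reduced to one summary per dimension; the `EvalSummary` fields, the `regressions` function, and the single shared tolerance are illustrative choices, not a description of any particular harness.

```rust
/// Aggregate scores from one eval run. Field names are illustrative.
#[derive(Debug, Clone, Copy)]
struct EvalSummary {
    correctness: f64,       // fraction of cases passed, higher is better
    cost_usd: f64,          // average cost per case, lower is better
    p95_latency_ms: f64,    // lower is better
    safety_violations: u32, // must not increase
}

/// Return the dimensions where the candidate regressed beyond tolerance.
/// An empty list means the deploy gate passes.
fn regressions(
    baseline: &EvalSummary,
    candidate: &EvalSummary,
    tolerance: f64,
) -> Vec<&'static str> {
    let mut failed = Vec::new();
    // Correctness may drop by at most `tolerance` (absolute).
    if candidate.correctness < baseline.correctness - tolerance {
        failed.push("correctness");
    }
    // Cost and latency may grow by at most `tolerance` (relative).
    if candidate.cost_usd > baseline.cost_usd * (1.0 + tolerance) {
        failed.push("cost");
    }
    if candidate.p95_latency_ms > baseline.p95_latency_ms * (1.0 + tolerance) {
        failed.push("latency");
    }
    // Safety has no tolerance: any increase fails the gate.
    if candidate.safety_violations > baseline.safety_violations {
        failed.push("safety");
    }
    failed
}
```

Returning the failed dimensions by name, instead of a single pass/fail flag, makes the CI output say why a deploy was blocked.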

The intersection

The three together, Rust + Postgres + agents, turned out to be a coherent stack in ways I did not expect at the start of the year. The shape that emerged:

  • Rust as the system layer (tool servers, event processors, glue).
  • Postgres as the durable layer (state, embeddings, audit logs, business data).
  • Agent framework (LangGraph in our case, but the choice mattered less than the architecture) as the orchestration layer.

What ties them together is that all three reward boring engineering. Rust punishes cleverness with compile errors. Postgres punishes cleverness with surprising query plans. Agents punish cleverness with hallucinations and runaway costs. The thing that keeps the system shippable is the same in all three layers, which is the discipline of small, observable, recoverable units of work, composed cleanly, with state in places you can inspect.
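For the runaway-cost failure mode specifically, the smallest recoverable unit is a loop with a hard budget. A sketch, assuming one agent step reports the tokens it used and whether it finished; `Budget`, `Stop`, and `run_with_budget` are hypothetical names, and the closure stands in for a model call plus tool execution.

```rust
/// Hard limits for one agent run. Values are set by the caller.
struct Budget {
    max_steps: u32,
    max_tokens: u64,
}

/// Why the loop ended.
#[derive(Debug, PartialEq)]
enum Stop {
    Finished,
    StepLimit,
    TokenLimit,
}

/// Drive agent steps until the agent finishes or a limit is hit.
/// `step` receives the step index and returns (tokens_used, done).
/// Returns the stop reason, the number of steps run, and total tokens.
fn run_with_budget<F>(budget: &Budget, mut step: F) -> (Stop, u32, u64)
where
    F: FnMut(u32) -> (u64, bool),
{
    let mut tokens: u64 = 0;
    for n in 0..budget.max_steps {
        let (used, done) = step(n);
        tokens += used;
        if done {
            return (Stop::Finished, n + 1, tokens);
        }
        // Checked after the step, so the total can overshoot by one step.
        if tokens >= budget.max_tokens {
            return (Stop::TokenLimit, n + 1, tokens);
        }
    }
    (Stop::StepLimit, budget.max_steps, tokens)
}
```

The stop reason is data, so it can go straight into the run registry and show up in dashboards without being buried in a log line.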

It is not the only stack that works. Python + Postgres + LangGraph is more common and ships just as well for most workloads. Go + Postgres + custom agent code is what a few smart teams I respect chose. The point is not that this specific combination is best. The point is that the property of “boring at the layer below the uncertainty” is what makes the layer above tolerable.

Common Pitfalls

Reaching for exotic infrastructure too early. Vector databases, graph databases, dedicated state stores. The Postgres-first answer is right more often than people admit. Run the benchmark before you adopt the new system.

Writing Rust for the wrong workloads. Rust is great for the system layer. It is a mediocre choice for fast iteration on business logic. Mixing both in the same service slows the team down. Split the surfaces.

Treating the agent framework as the whole solution. It is the orchestration layer. The hard work is everywhere else. Plan for the boxes that surround the agent, not just the agent itself.

Underinvesting in eval. I did this in Q2 and paid for it in Q3. Build eval as the first thing, not the last. The cost is real. The cost of not having it is much larger.

Ignoring observability across layers. Tracing has to cross Rust, Python, the agent loop, and Postgres. The teams that aligned on OpenTelemetry early had a much easier time. The teams that did not are still picking up the pieces.

Optimizing what you can measure, not what matters. Easy to measure: latency, token count, error rate. Harder to measure: actual user outcomes, hallucination rate, cost-adjusted value. The harder ones are the ones that pay.

Wrapping Up

The thing about end-of-year retrospectives is that the lessons are always less interesting than they sound when you condense them into a sentence. The honest summary of my 2024 is that I got better at the boring stuff and stopped chasing the exciting stuff, and the result was that I shipped more useful systems than I did in any of the previous three years.

Rust earned its place. Postgres absorbed more than I expected. Agents demanded more rigor than the demos suggested. The three together became a stack I trust. Some of that will hold up in 2025. Some of it will not. The discipline of writing it down is what makes the difference between learning the lessons and just rotating through the same surprises year after year.

If you take one thing into 2025 from this post, take this. Spend a quiet hour between now and January 1st writing your own version of this list. The three things you got better at. The three assumptions you changed your mind about. The three habits you want to carry forward. The act of writing it down compounds. Not writing it down does not. See you in the new year.