Production Agents with LangGraph, State Machines Over Chains

May 8, 2024 · 6 min read · by Muhammad Amal · programming

TL;DR — Chains hide state. State machines force you to declare it. For agents running longer than 30 seconds, declared state is what makes debugging tractable.

I built my first production agent on a LangChain AgentExecutor in late 2023. It worked beautifully in the notebook, fine in staging, and started leaking memory and looping silently in production within a week. The problem wasn’t the LLM. The problem was that I had no way to ask “what state was this agent in at minute 4 of the run?” because the state was scattered across closures, callbacks, and a queue of intermediate steps I couldn’t easily serialize.

LangGraph fixes this by forcing you to be explicit. You define a TypedDict for state, you write nodes that return updates to it, and you draw edges between nodes. That’s the whole model. It maps cleanly onto how I actually think about agents, which is as finite state machines with one weird state called “ask the LLM what to do next.”

This post is the build I’d ship today for a moderately complex agent. Tool calling, retries, a human approval gate, and durable checkpoints so a crash doesn’t lose work. We’ll start with the state, then the nodes, then the edges, then the production concerns.

Define the state first

Everything else falls out of getting this right. The state is what survives across nodes, what gets checkpointed, and what you’ll stare at in the debugger.

from typing import TypedDict, Annotated, Literal
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    iteration: int
    pending_tool_call: dict | None
    awaiting_human: bool
    last_error: str | None
    budget_tokens: int

A few choices worth flagging. messages uses the add_messages reducer so node returns append rather than overwrite. iteration is your loop guard. pending_tool_call lets you pause before executing a tool. awaiting_human is the flag a human-in-the-loop edge reads. last_error is the field you’ll thank yourself for adding the first time a tool throws. budget_tokens is the remaining token allowance, a second stop condition alongside iteration; the routing code in this post does not check it yet.

If you find yourself stuffing dicts of dicts into state, stop. Promote the inner fields. Flat state is debuggable state.
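
To make the reducer behavior concrete, here is a minimal pure-Python sketch of how a node's partial return could be merged into state. The names merge_update, append_list, and REDUCERS are hypothetical and exist only for illustration; this is not LangGraph's internal code, and the real add_messages also reconciles messages by ID, which this sketch ignores.

```python
from typing import Any, Callable

def append_list(current: list, update: list) -> list:
    """Stand-in for an append-style reducer such as add_messages."""
    return current + update

# Hypothetical table: keys with a reducer are combined, all others are overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": append_list}

def merge_update(state: dict, update: dict) -> dict:
    """Return a new state with a node's partial update applied."""
    merged = dict(state)  # shallow copy so the caller's dict is never mutated
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(state.get(key, []), value) if reducer else value
    return merged

state = {"messages": ["user: hi"], "iteration": 0, "last_error": None}
new_state = merge_update(state, {"messages": ["ai: hello"], "iteration": 1})
print(new_state)
```

Keys the node did not return (last_error here) carry over untouched, which is why nodes can stay small and only mention what they change.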

Nodes are pure functions over state

Each node takes the full state and returns a partial state. The graph merges it for you. I prefer keeping nodes small enough that you can read one without scrolling.

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_inventory(sku: str) -> str:
    """Look up stock level for a SKU."""
    return f"SKU {sku}: 42 units"

@tool
def refund_order(order_id: str, amount_cents: int) -> str:
    """Refund an order. Requires human approval."""
    return f"Refunded order {order_id} for {amount_cents} cents"

tools = [search_inventory, refund_order]
HIGH_RISK = {"refund_order"}

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0).bind_tools(tools)

def agent_node(state: AgentState) -> dict:
    response = llm.invoke(state["messages"])
    tool_calls = getattr(response, "tool_calls", None) or []
    # Gate on any high-risk call in the batch, not only the first one.
    pending = next((c for c in tool_calls if c["name"] in HIGH_RISK), None)
    return {
        "messages": [response],
        "iteration": state["iteration"] + 1,
        "pending_tool_call": pending,
        "awaiting_human": bool(pending),
    }

from langchain_core.messages import ToolMessage

TOOLS_BY_NAME = {t.name: t for t in tools}

def tool_node(state: AgentState) -> dict:
    last = state["messages"][-1]
    outputs = []
    last_error = None
    for call in last.tool_calls:
        try:
            # Lookup happens inside the try so an unknown tool name is reported, not raised.
            result = TOOLS_BY_NAME[call["name"]].invoke(call["args"])
            outputs.append(ToolMessage(content=str(result), tool_call_id=call["id"]))
        except Exception as exc:
            # Keep looping: every tool call needs a matching ToolMessage.
            outputs.append(ToolMessage(content=f"ERROR: {exc}", tool_call_id=call["id"]))
            last_error = str(exc)
    return {"messages": outputs, "last_error": last_error}

Notice tool_node returns errors to the model as ToolMessages rather than raising. The LLM is often the best retry mechanism you have, but only if it sees the failure. Hiding errors from the model is the single most common bug I see in agent code.

Edges declare control flow

This is where state machines beat chains. The transitions are visible.

from langgraph.graph import StateGraph, END

def route_after_agent(state: AgentState) -> Literal["tools", "human", "__end__"]:
    if state["iteration"] >= 10:
        return END
    if state["awaiting_human"]:
        return "human"
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return END

def human_gate(state: AgentState) -> dict:
    return {"awaiting_human": False, "pending_tool_call": None}

builder = StateGraph(AgentState)
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
builder.add_node("human", human_gate)
builder.set_entry_point("agent")
builder.add_conditional_edges("agent", route_after_agent)
builder.add_edge("tools", "agent")
builder.add_edge("human", "tools")

The conditional edge function is where production logic lives. Iteration cap before anything else. Human approval gate before high-risk tools. Tool dispatch as the default for tool calls. Termination on final answer. You can read this code and know what the agent will do; you can’t say that about most chain-based code.
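
Because the router is a plain function over state, it can be tested without a model or a compiled graph. The sketch below repeats the routing logic with a local END constant standing in for langgraph.graph.END, and uses a hypothetical fake_state helper that builds only the fields the router reads.

```python
from types import SimpleNamespace

END = "__end__"  # stand-in for langgraph.graph.END so the sketch runs without LangGraph

def route_after_agent(state: dict) -> str:
    if state["iteration"] >= 10:
        return END
    if state["awaiting_human"]:
        return "human"
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return END

def fake_state(iteration=1, awaiting_human=False, tool_calls=None) -> dict:
    """Hypothetical helper: the minimum state the router needs."""
    message = SimpleNamespace(tool_calls=tool_calls or [])
    return {"iteration": iteration, "awaiting_human": awaiting_human, "messages": [message]}

call = {"name": "search_inventory", "args": {"sku": "A1"}, "id": "c1"}
cases = [
    (fake_state(iteration=10, tool_calls=[call]), END),            # cap wins over everything
    (fake_state(awaiting_human=True, tool_calls=[call]), "human"),  # approval before tools
    (fake_state(tool_calls=[call]), "tools"),                      # ordinary tool dispatch
    (fake_state(), END),                                           # final answer, no tool calls
]
results = [route_after_agent(state) for state, _ in cases]
for (_, expected), actual in zip(cases, results):
    assert actual == expected
```

A table of cases like this is cheap to extend whenever you add a branch, and it documents the precedence order better than a comment does.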

Checkpoints make crashes survivable

This is the feature that turned LangGraph from “nice abstraction” to “production default” for me. Add a checkpointer and the graph persists state after every node. You can resume by replaying from the last checkpoint with the same thread ID.

from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string("agent_state.db")
graph = builder.compile(checkpointer=memory, interrupt_before=["human"])

config = {"configurable": {"thread_id": "ticket-2241"}}
state = graph.invoke(
    {"messages": [("user", "Refund order 9981 for $42.")],
     "iteration": 0, "pending_tool_call": None,
     "awaiting_human": False, "last_error": None, "budget_tokens": 50000},
    config=config,
)

The interrupt_before=["human"] argument is the production pattern. The graph runs until it hits the human node, then pauses. Your API returns a “pending approval” response. When a human clicks approve, you resume with the same thread_id and the graph picks up exactly where it stopped.

For Postgres-backed deployments, the langgraph-checkpoint-postgres package gives you the same thing with multi-process safety. Use it from day one if you’re behind any kind of worker pool.
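
Conceptually, a checkpointer is a keyed store of serialized state snapshots. The toy sketch below uses only the standard library to show the save-then-resume idea; the table layout and the save_checkpoint and load_latest names are hypothetical, not LangGraph's storage schema or API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a real deployment would use a file or Postgres
conn.execute(
    "CREATE TABLE checkpoints ("
    "thread_id TEXT, step INTEGER, state TEXT, PRIMARY KEY (thread_id, step))"
)

def save_checkpoint(thread_id: str, step: int, state: dict) -> None:
    """Persist a JSON snapshot of state after a node finishes."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )

def load_latest(thread_id: str):
    """Return (step, state) for the newest snapshot, or None if the thread is unknown."""
    row = conn.execute(
        "SELECT step, state FROM checkpoints "
        "WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else None

save_checkpoint("ticket-2241", 0, {"iteration": 0, "awaiting_human": False})
save_checkpoint("ticket-2241", 1, {"iteration": 1, "awaiting_human": True})
# ... the process crashes or pauses for approval here ...
step, resumed = load_latest("ticket-2241")
print(step, resumed)
```

With LangGraph itself you don't write any of this. At the time of writing, resuming after an interrupt is typically a second graph.invoke call with None as the input and the same config, so the graph loads the stored state for that thread_id instead of starting over.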

Streaming and observability

Once your graph is running, you’ll want to watch it. LangGraph exposes a .stream() method that yields state diffs per node, which is gold for both UI streaming and logging.

for chunk in graph.stream(input_state, config=config, stream_mode="updates"):
    for node, update in chunk.items():
        print(f"[{node}] {list(update.keys())}")

In production I wire this into structured logs with the thread ID, node name, and any state keys that changed. Combined with the checkpointer, that gives you the audit trail you need when a customer asks “why did the agent do that?”
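
A minimal sketch of that logging step, assuming each streamed chunk is a dict of node name to partial state as in the loop above. The log_updates helper and the record field names are hypothetical; adapt them to whatever your log pipeline expects.

```python
import json
import time

def log_updates(thread_id: str, chunk: dict) -> list[str]:
    """Turn one stream chunk ({node_name: partial_state}) into JSON log lines."""
    lines = []
    for node, update in chunk.items():
        record = {
            "ts": time.time(),
            "thread_id": thread_id,
            "node": node,
            # Log key names only, so message contents and PII stay out of the logs.
            "changed_keys": sorted(update or {}),
        }
        lines.append(json.dumps(record))
    return lines

sample_chunk = {"agent": {"messages": ["..."], "iteration": 3, "awaiting_human": False}}
log_lines = log_updates("ticket-2241", sample_chunk)
for line in log_lines:
    print(line)
```

Logging which keys changed rather than their values keeps the log small and makes it safe to ship to a shared aggregator; the checkpointer holds the full values if you need them.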

For a comparison with the chain-based approach I left behind, see /blog/agentic-ai-landscape-may-2024/.

Common Pitfalls

Things I’ve broken so you don’t have to.

Iteration caps as the only stop condition. Add a token budget too. A model can do a lot of damage in 10 iterations if each one calls a 4k-token tool.
Mutating state in place. Always return a new dict from nodes. LangGraph relies on returns to apply reducers correctly.
Forgetting the reducer for messages. Without Annotated[list, add_messages], every return overwrites and you lose history.
Treating interrupt_before as the only safety mechanism. Validate tool arguments in the tool itself too. Defense in depth.
Storing secrets in state. State gets checkpointed to disk. Keep API keys and PII out of it or encrypt the checkpointer.
Using SQLite checkpointer with multiple workers. SQLite is fine for a single process. Postgres for anything else.
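
The token budget from the first pitfall can be sketched as two small functions over the same state fields. This is an illustration, not code from the build above: charge_budget and should_stop are hypothetical names, and tokens_used is assumed to come from whatever usage metadata your model client reports.

```python
def charge_budget(state: dict, tokens_used: int) -> dict:
    """Return a partial update with the remaining budget, never below zero."""
    return {"budget_tokens": max(0, state["budget_tokens"] - tokens_used)}

def should_stop(state: dict, max_iterations: int = 10) -> bool:
    """Stop when either the iteration cap or the token budget is exhausted."""
    return state["iteration"] >= max_iterations or state["budget_tokens"] <= 0

state = {"iteration": 2, "budget_tokens": 5000}

state = {**state, **charge_budget(state, 4200)}  # one expensive tool call
stop_early = should_stop(state)                  # budget remains, keep going

state = {**state, **charge_budget(state, 4200)}  # a second one exhausts the budget
stop_now = should_stop(state)
print(state["budget_tokens"], stop_early, stop_now)
```

In the graph, the charge would happen in the node that calls the model and the check would sit next to the iteration cap at the top of the router.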

Wrapping Up

The shift from chains to state machines is the most important architectural change in agent code over the last year. Chains optimize for a beautiful happy path. State machines optimize for the moment something breaks and you have to figure out where. Production is the second one most of the time.

LangGraph isn’t the only way to build state-machine agents. You could write the same thing in pure Python with a dict and a while loop, and for simple flows that’s fine. What LangGraph buys you is the checkpointer, the streaming primitives, and the visual graph for documenting what you built. Those are the parts I no longer want to write by hand.

Next on my list, and probably the next post, is making the tools the agent calls actually robust. Schemas, validation, error messages the model can act on. That’s where most agents fail in practice, and where most posts wave their hands.
