AutoGen 0.4 Deep Dive, What Changed and How to Use It
TL;DR: AutoGen 0.4 is a ground-up rewrite around an async actor model. The old
ConversableAgent API is gone, replaced by a layered autogen-core and autogen-agentchat. The migration is worth it if you’re scaling, painful if you’re not.
If you used AutoGen 0.2 in 2024 and put it down, you’re in for a surprise. Microsoft Research shipped 0.4 between December 2024 and January 2025 as a complete rewrite, not an evolution. The mental model changed, the package layout changed, the runtime changed. If you upgrade by find-and-replace on imports, nothing will work.
I’ve migrated two AutoGen 0.2 projects to 0.4 in the last two months. One was straightforward because we were already running async and using groupchat sparingly. The other took a week because we had a custom GroupChatManager subclass doing things 0.4 expresses entirely differently. This post is the deep dive I wish I’d had at the start.
I’ll cover the new architecture, walk through a working 0.4 example built from two AssistantAgent instances, show how Teams replaces GroupChat, and lay out the migration moves. Code is tested against autogen-agentchat==0.4.5 and autogen-ext==0.4.5 with Python 3.12.
What actually changed
The headline change is the actor model. AutoGen 0.4 treats every agent as an actor with a mailbox, addressed by an AgentId. Messages are typed Python objects, not dicts. The runtime handles delivery, lifecycle, and concurrency. You no longer call agent.send(msg, recipient) directly; you publish or send through the runtime.
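To make the mailbox idea concrete, here is a toy sketch in plain asyncio. It is not AutoGen code: EchoActor, ToyRuntime, and Greeting are hypothetical names I made up, and the real runtime does far more (typed routing, lifecycle, subscriptions). It only shows the shape: typed messages, one mailbox per actor, and callers that address actors by ID through a runtime.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Greeting:
    """A typed message: a plain dataclass instead of a dict."""
    text: str


class EchoActor:
    """Toy actor: owns a mailbox and processes one message at a time."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.seen: list[str] = []

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel: shut down
                break
            self.seen.append(message.text.upper())


class ToyRuntime:
    """Toy runtime: callers address actors by ID, never by object reference."""

    def __init__(self) -> None:
        self._actors: dict[str, EchoActor] = {}
        self._tasks: list[asyncio.Task] = []

    def register(self, actor_id: str, actor: EchoActor) -> None:
        self._actors[actor_id] = actor
        self._tasks.append(asyncio.create_task(actor.run()))

    async def send(self, actor_id: str, message: Greeting) -> None:
        await self._actors[actor_id].mailbox.put(message)

    async def stop(self) -> None:
        # Deliver a shutdown sentinel to every mailbox, then wait for the actors.
        for actor in self._actors.values():
            await actor.mailbox.put(None)
        await asyncio.gather(*self._tasks)


async def demo_actor() -> list[str]:
    runtime = ToyRuntime()
    actor = EchoActor()
    runtime.register("echo", actor)
    await runtime.send("echo", Greeting("hello"))
    await runtime.send("echo", Greeting("world"))
    await runtime.stop()
    return actor.seen


if __name__ == "__main__":
    print(asyncio.run(demo_actor()))
```

The caller never touches the actor’s methods directly. That indirection is what lets a runtime move actors across threads or machines without changing call sites.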
The second change is layering. The framework is now three packages.
+--------------------------------+
| autogen-agentchat              |  high-level, AgentChat API
+--------------------------------+
| autogen-ext                    |  models, tools, runtimes
+--------------------------------+
| autogen-core                   |  actor model, message routing
+--------------------------------+
You can drop down to autogen-core when you need the actor primitives. For most apps, autogen-agentchat is what you’ll write against, and it sits on top of core.
The third change is async-first. Every agent method is a coroutine. There’s no sync facade. If your existing code mixes threads and AutoGen, you’re rewriting that boundary.
1. Install and a minimal working example
python3.12 -m venv .venv && source .venv/bin/activate
pip install \
"autogen-agentchat==0.4.5" \
"autogen-ext[openai]==0.4.5" \
"openai>=1.59"
The simplest 0.4 program, two agents alternating until one says TERMINATE.
# minimal.py
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination, MaxMessageTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o")
    planner = AssistantAgent(
        name="planner",
        model_client=model,
        system_message=(
            "You break down problems. Output 3-step plans. "
            "When you see 'DRAFT_READY', respond TERMINATE."
        ),
    )
    writer = AssistantAgent(
        name="writer",
        model_client=model,
        system_message=(
            "You write technical answers. End each message with DRAFT_READY."
        ),
    )
    termination = TextMentionTermination("TERMINATE") | MaxMessageTermination(8)
    team = RoundRobinGroupChat([planner, writer], termination_condition=termination)
    await Console(team.run_stream(task="Explain Postgres streaming I/O in 200 words."))
    await model.close()

asyncio.run(main())
Run it.
python minimal.py
Console is a convenience streamer for terminal output. In production you’d consume team.run_stream() as an async iterator and route messages to your transport of choice.
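Here is a rough sketch of what “consume the stream and route it” can look like. I’m using stand-ins so it runs without an API key: fake_run_stream and FakeMessage are hypothetical substitutes for team.run_stream() and its message objects, and the in-memory queue stands in for whatever transport you use (WebSocket, SSE, a message bus). As far as I know the real stream also ends with a final result object, which this sketch ignores.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message objects a team stream yields."""
    source: str
    content: str


async def fake_run_stream() -> AsyncIterator[FakeMessage]:
    """Hypothetical stand-in for team.run_stream(task=...)."""
    for source, content in [("planner", "1. outline"), ("writer", "draft DRAFT_READY")]:
        await asyncio.sleep(0)  # yield control, as a real network-bound stream would
        yield FakeMessage(source, content)


async def forward(stream: AsyncIterator[FakeMessage], outbox: asyncio.Queue) -> int:
    """Push each streamed message onto a transport (here, an in-memory queue)."""
    count = 0
    async for message in stream:
        await outbox.put({"from": message.source, "text": message.content})
        count += 1
    return count


async def demo_stream() -> list[dict]:
    outbox: asyncio.Queue = asyncio.Queue()
    await forward(fake_run_stream(), outbox)
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


if __name__ == "__main__":
    print(asyncio.run(demo_stream()))
```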
A few things to notice. model_client is explicit, you build it and pass it in. No global config. system_message is on the agent constructor. RoundRobinGroupChat is one of several Team implementations. TerminationCondition is composable with | and & operators, which is the kind of small API choice that makes daily work pleasant.
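If you’re curious how | and & composition can work, here is a minimal sketch using operator overloading. This is my own toy, not AutoGen’s implementation: Condition, text_mention, and max_messages are hypothetical names, and the real termination conditions are stateful and async. The point is only that each operator returns a new condition wrapping the two operands.

```python
from dataclasses import dataclass
from typing import Callable

# A check receives the messages so far and says whether to stop.
Check = Callable[[list[str]], bool]


@dataclass(frozen=True)
class Condition:
    check: Check

    def __call__(self, messages: list[str]) -> bool:
        return self.check(messages)

    def __or__(self, other: "Condition") -> "Condition":
        # Stop when either side wants to stop.
        return Condition(lambda m: self(m) or other(m))

    def __and__(self, other: "Condition") -> "Condition":
        # Stop only when both sides want to stop.
        return Condition(lambda m: self(m) and other(m))


def text_mention(text: str) -> Condition:
    return Condition(lambda m: any(text in msg for msg in m))


def max_messages(limit: int) -> Condition:
    return Condition(lambda m: len(m) >= limit)


stop = text_mention("TERMINATE") | max_messages(8)
both = text_mention("TERMINATE") & max_messages(3)
```

With this shape, stop fires on whichever limit is hit first, while both needs the keyword and at least three messages.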
2. Teams, the GroupChat replacement
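AutoGen 0.4 ships three general-purpose Team types. (There is also MagenticOneGroupChat, a specialized orchestrator I won’t cover here.)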
from autogen_agentchat.teams import (
    RoundRobinGroupChat,  # cycles through agents
    SelectorGroupChat,    # LLM picks next speaker
    Swarm,                # agents hand off via tools
)
RoundRobinGroupChat is what it sounds like. Predictable, no model decisions about routing, cheap. Use it when the order is fixed.
SelectorGroupChat uses an LLM to pick the next speaker each turn. It’s the closest to 0.2’s GroupChatManager. The LLM gets the conversation and a list of agent descriptions and picks one.
from autogen_agentchat.teams import SelectorGroupChat
team = SelectorGroupChat(
    participants=[researcher, writer, editor],
    model_client=model,
    termination_condition=termination,
    selector_prompt=(
        "Read the conversation and pick the next speaker from {participants}. "
        "Pick researcher when facts are needed, writer for drafting, editor for review."
    ),
    allow_repeated_speaker=False,
)
Swarm is the tool-based handoff pattern, conceptually similar to LangGraph swarms. Each agent can call a handoff tool, which produces a HandoffMessage, to pass control. Useful when routing knowledge lives with the specialists, not with a manager.
from autogen_agentchat.teams import Swarm
from autogen_agentchat.agents import AssistantAgent
billing = AssistantAgent(
    name="billing",
    model_client=model,
    handoffs=["tech"],
    system_message="Handle billing. Hand off tech issues to 'tech'.",
)
tech = AssistantAgent(
    name="tech",
    model_client=model,
    handoffs=["billing"],
    system_message="Handle tech issues. Hand off billing to 'billing'.",
)
swarm = Swarm([billing, tech], termination_condition=termination)
The handoffs list compiles into tool definitions automatically. The agent decides when to call them.
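To make the handoff flow concrete, here is a toy sketch in plain Python. It doesn’t use AutoGen at all: the two agents are hypothetical functions that return a reply plus an optional handoff target, and run_swarm is my own stand-in for the team loop. In the real thing an LLM decides when to hand off; here a keyword check plays that role.

```python
from typing import Callable, Optional

# Each toy agent returns (reply, handoff_target); a target of None means it keeps the turn.
ToyAgent = Callable[[str], tuple[str, Optional[str]]]


def billing(task: str) -> tuple[str, Optional[str]]:
    if "error" in task:
        return ("That sounds technical.", "tech")
    return ("Your invoice is paid.", None)


def tech(task: str) -> tuple[str, Optional[str]]:
    if "invoice" in task:
        return ("That is a billing question.", "billing")
    return ("Try restarting the service.", None)


def run_swarm(
    agents: dict[str, ToyAgent], start: str, task: str, max_turns: int = 4
) -> list[str]:
    """Follow handoffs until an agent answers without handing off."""
    trace: list[str] = []
    current = start
    for _ in range(max_turns):  # cap turns so two agents cannot ping-pong forever
        reply, target = agents[current](task)
        trace.append(f"{current}: {reply}")
        if target is None:
            break
        current = target
    return trace


if __name__ == "__main__":
    print(run_swarm({"billing": billing, "tech": tech}, "billing", "I get an error on login"))
```

The max_turns cap matters in practice too: two agents that each think the other owns the problem will hand off back and forth until a termination condition stops them.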
3. Tools and the model client
Tools in 0.4 are Python callables wrapped via FunctionTool or passed directly to the agent.
from autogen_core.tools import FunctionTool
from autogen_agentchat.agents import AssistantAgent

async def lookup_invoice(invoice_id: str) -> dict:
    """Fetch invoice details by ID."""
    # real impl would hit your billing API
    return {"id": invoice_id, "amount_cents": 1999, "status": "paid"}

invoice_tool = FunctionTool(
    lookup_invoice,
    description="Look up an invoice by ID. Returns amount and status.",
)
billing = AssistantAgent(
    name="billing",
    model_client=model,
    tools=[invoice_tool],
    system_message="Help with billing questions. Use lookup_invoice when needed.",
)
The docstring becomes part of the tool description the LLM sees, but pass description explicitly if you want better control. Type hints become the parameter schema, so be precise with them.
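To see why precise type hints matter, here is a rough sketch of how a parameter schema can be derived from a signature. This is not how FunctionTool is implemented; sketch_schema is a hypothetical helper I wrote with the standard library, the include_lines parameter is made up for the demo, and the type mapping only covers a few primitives. It does show the mechanism: an imprecise hint produces an imprecise schema.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Minimal mapping from Python types to JSON Schema type names (assumption: primitives only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a JSON-Schema-like description of func from its hints and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Anything not in the mapping falls back to a vague "object".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


async def lookup_invoice(invoice_id: str, include_lines: bool = False) -> dict:
    """Fetch invoice details by ID."""
    return {"id": invoice_id}


if __name__ == "__main__":
    print(sketch_schema(lookup_invoice))
```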
Model clients
OpenAIChatCompletionClient is the obvious one. autogen-ext also ships clients for Anthropic, Azure OpenAI, Ollama, and a generic OpenAI-compatible endpoint.
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
from autogen_ext.models.openai import OpenAIChatCompletionClient
claude = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
)
gpt = OpenAIChatCompletionClient(
    model="gpt-4o",
    parallel_tool_calls=False,
)
I disable parallel_tool_calls for agents that need deterministic step-by-step reasoning. Parallel calls confuse traces and don’t speed things up for sequential workflows.
4. The core layer, when you need it
autogen-core is the actor runtime. You drop down to it when you need agents that communicate by typed message rather than by chat-style turns. This is the right layer for event-driven systems.
import asyncio
from dataclasses import dataclass
from autogen_core import (
AgentId, MessageContext, RoutedAgent,
SingleThreadedAgentRuntime, message_handler,
)

@dataclass
class ResearchRequest:
    topic: str

@dataclass
class ResearchResult:
    topic: str
    facts: list[str]

class ResearcherAgent(RoutedAgent):
    def __init__(self):
        super().__init__("researcher agent")

    @message_handler
    async def handle(self, message: ResearchRequest, ctx: MessageContext) -> ResearchResult:
        # call your model here
        facts = [f"Fact about {message.topic}"]
        return ResearchResult(topic=message.topic, facts=facts)

async def main():
    runtime = SingleThreadedAgentRuntime()
    await ResearcherAgent.register(runtime, "researcher", lambda: ResearcherAgent())
    runtime.start()
    result = await runtime.send_message(
        ResearchRequest(topic="postgres"),
        AgentId("researcher", "default"),
    )
    print(result)
    await runtime.stop()

asyncio.run(main())
Typed messages, decorated handlers, an explicit runtime. This is what makes 0.4 scale. The SingleThreadedAgentRuntime is for single-process apps. For distributed setups there’s a gRPC-based worker runtime (GrpcWorkerAgentRuntime in autogen-ext, called WorkerAgentRuntime in earlier previews) that talks to a host server, which is how you’d run agents across machines.
5. Observability and tracing
AutoGen 0.4 ships OpenTelemetry support out of the box. Every agent message, tool call, and model invocation emits spans. Wire it to any OTel-compatible backend, Phoenix and LangSmith both work.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
))
trace.set_tracer_provider(provider)
# now construct your runtime, agents, and team as normal
# spans flow to Phoenix or any OTel collector
The span semantics follow the OpenInference conventions, which means Phoenix and LangSmith both render them correctly without custom mapping. I covered the full observability story in observability for multi-agent systems.
For local debugging, the Console UI is good enough most of the time. For production, OTel into a real backend is the answer.
6. Migration moves from 0.2
The path I’ve walked twice now.
Imports
# 0.2
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
# 0.4
from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.teams import RoundRobinGroupChat, SelectorGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
Config dict goes away
# 0.2
agent = AssistantAgent(name="x", llm_config={"model": "gpt-4o", "api_key": ...})
# 0.4
model = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(name="x", model_client=model)
Tool registration changes
# 0.2
agent.register_for_execution()(my_func)
agent.register_for_llm(name="my_func", description="...")(my_func)
# 0.4
tool = FunctionTool(my_func, description="...")
agent = AssistantAgent(name="x", model_client=model, tools=[tool])
Termination is now declarative
# 0.2, you'd set max_consecutive_auto_reply on each agent
agent = AssistantAgent(name="x", max_consecutive_auto_reply=10)
# 0.4
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
termination = MaxMessageTermination(20) | TextMentionTermination("TERMINATE")
team = RoundRobinGroupChat([a, b], termination_condition=termination)
Async everywhere
If your 0.2 code was sync, every call site for AutoGen needs to become async. That’s the biggest practical lift. Wrap your AutoGen entry points in asyncio.run() and propagate async upward through your call graph.
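The boundary usually ends up looking something like the sketch below. run_agents is a hypothetical placeholder for your real async entry point (where you would await the team), and the two handlers show the two kinds of call site you’ll meet. One caveat: asyncio.run() raises if an event loop is already running, so it only belongs at the outermost sync edge.

```python
import asyncio


async def run_agents(task: str) -> str:
    """Hypothetical async entry point; a real one would await the team here."""
    await asyncio.sleep(0)
    return f"done: {task}"


def handle_request(task: str) -> str:
    """Sync call site (a CLI command, a script): owns the event loop for one call."""
    return asyncio.run(run_agents(task))


async def handle_request_async(task: str) -> str:
    """Async call site (for example an async web handler): just await."""
    return await run_agents(task)


if __name__ == "__main__":
    print(handle_request("summarize the incident"))
```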
State persistence
The 0.2 API of pickling agent state for resumption is replaced by an explicit save_state and load_state on Teams and Agents. The shape is JSON-compatible.
state = await team.save_state() # dict
# persist to your store of choice, Postgres jsonb is what I use
await team.load_state(state)
For checkpointed durability across Python processes, you can pair this with the same Postgres patterns I used in the LangGraph tutorial: store the state blob keyed by thread ID, then reload it before resuming.
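A minimal sketch of the keyed-blob pattern follows. I’m using SQLite from the standard library so it runs anywhere; that is a swap for the Postgres jsonb setup I actually use. The table name, column names, and the save/load helpers are all hypothetical, and the state dict is a placeholder for whatever save_state() returns.

```python
import json
import sqlite3
from typing import Any, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the state table. SQLite here is a stand-in for Postgres."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_state ("
        "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, thread_id: str, state: dict[str, Any]) -> None:
    """Upsert the JSON-serialized state for one thread."""
    conn.execute(
        "INSERT OR REPLACE INTO team_state (thread_id, state) VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    conn.commit()


def load(conn: sqlite3.Connection, thread_id: str) -> Optional[dict[str, Any]]:
    """Return the stored state for a thread, or None if nothing was saved."""
    row = conn.execute(
        "SELECT state FROM team_state WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = open_store()
    save(store, "thread-1", {"turn": 3})  # placeholder for the dict from save_state()
    print(load(store, "thread-1"))
```

On resume you would call load first and only pass the result to load_state when it isn’t None.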
Common Pitfalls
I’ve eaten every one of these.
- Sharing a model client across processes. OpenAIChatCompletionClient holds an httpx.AsyncClient. It’s not pickleable. If you’re using multiprocessing, construct one per worker. Same for the Anthropic client.
- Forgetting to close the model client. await model.close() releases the underlying HTTP connection pool. Leaks accumulate fast under load tests if you skip this.
- Putting blocking I/O in an agent handler. The runtime is single-threaded by default. A synchronous requests.get() blocks the whole runtime. Use httpx.AsyncClient or wrap with asyncio.to_thread.
- Custom subclasses of the old GroupChatManager. There’s no direct equivalent. You’re either using a Team type or you’re dropping to autogen-core and writing a routing agent yourself. Don’t try to port the subclass class-for-class; redesign at the Team level.
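The blocking I/O pitfall is the easiest one to demonstrate. In this sketch, blocking_fetch is a hypothetical stand-in for a synchronous call like requests.get(), and the heartbeat task stands in for every other agent sharing the event loop. Offloading with asyncio.to_thread lets the heartbeat keep ticking while the blocking call runs.

```python
import asyncio
import time


def blocking_fetch(delay: float) -> str:
    """Stand-in for a synchronous call such as requests.get()."""
    time.sleep(delay)
    return "payload"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Stand-in for other work on the loop; records a tick every 10 ms."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def handler_offloaded(delay: float = 0.2) -> tuple[str, int]:
    """Run the blocking call in a worker thread so the loop stays responsive."""
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    result = await asyncio.to_thread(blocking_fetch, delay)
    stop.set()
    await beat
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(handler_offloaded()))
```

If you call blocking_fetch directly inside the handler instead, the heartbeat never gets a turn until the call returns, which is exactly the stall you see in a real runtime.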
Troubleshooting
Three real failures.
Team runs but agents don’t talk to each other. Usually a termination condition that fires too early, or a SelectorGroupChat that keeps picking the same agent. Make selector_prompt more explicit about when each agent should speak, and check allow_repeated_speaker.
Tool calls return but the LLM ignores them. The tool returned a non-JSON-serializable object, and 0.4 silently dropped it from the message. Tool returns should be JSON-safe primitives, dicts, or Pydantic models. No file handles, no SDK objects.
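A cheap guard I’d suggest is to check the return value yourself before it leaves the tool. ensure_json_safe below is a hypothetical helper, not part of AutoGen. It only covers plain JSON types, so Pydantic models would need to be dumped to dicts first.

```python
import json
from typing import Any


def ensure_json_safe(value: Any) -> Any:
    """Return value unchanged if json.dumps accepts it; raise a clear error otherwise."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as exc:
        raise TypeError(f"tool result is not JSON-serializable: {exc}") from exc
    return value


if __name__ == "__main__":
    print(ensure_json_safe({"id": "inv_1", "amount_cents": 1999}))
```

Failing loudly inside the tool turns a confusing “the model ignored my tool” symptom into a stack trace that points at the offending return value.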
Memory bloats when running long. RoundRobinGroupChat keeps the full message history by default. For long-running conversations, give each agent a BufferedChatCompletionContext (via the agent’s model_context argument) that trims what the model sees to the last N messages, or implement a summarizing context window.
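The trimming idea itself is simple. Here is a toy sketch built on collections.deque. LastNContext is a hypothetical class of mine, not the library’s buffered context, and it stores plain strings rather than real message objects.

```python
from collections import deque


class LastNContext:
    """Toy context window that keeps only the most recent buffer_size messages."""

    def __init__(self, buffer_size: int) -> None:
        # deque with maxlen silently drops the oldest item when full.
        self._messages: deque[str] = deque(maxlen=buffer_size)

    def add(self, message: str) -> None:
        self._messages.append(message)

    def get(self) -> list[str]:
        return list(self._messages)


if __name__ == "__main__":
    ctx = LastNContext(buffer_size=3)
    for i in range(5):
        ctx.add(f"m{i}")
    print(ctx.get())
```

The trade-off is that anything older than the window is gone for good, so instructions given early in a long run can fall out of view. That is the case for a summarizing context instead.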
Wrapping Up
AutoGen 0.4 is the version Microsoft wanted to ship in 2024 and couldn’t until they rewrote it. The actor model under autogen-core, the clean Teams API, and the async-first runtime add up to something you can actually run in production at scale. The migration cost from 0.2 is real, but the destination is a meaningfully better framework.
I’d pick AutoGen 0.4 over LangGraph when I want typed message passing as the primitive rather than graph nodes, and over CrewAI when I need fine-grained control over speaker selection. For most teams, the three frameworks cover overlapping ground and the choice comes down to mental model fit.
If you’re choosing between AutoGen 0.4, LangGraph, and CrewAI, the multi-agent architecture patterns post walks through the topology decisions, which usually drive the framework choice more than feature checklists do.
The AutoGen 0.4 migration guide is the authoritative reference for the 0.2 to 0.4 jump. Read it before you start, not after. The sample apps under python/samples in the AutoGen repo are also better learning material than most of the docs, particularly the agentchat_chess and core_distributed_group_chat examples for understanding the layered architecture in practice.