Orchestrating Multi-Agent Workflows with CrewAI


February 6, 2026 · 22 min read · by Muhammad Amal · programming

TL;DR — A crew is a small team of AI agents, each playing one role, that pass work to each other in a defined order. Get three things right, and the team behaves: scope every agent narrowly, pick the right process for the job, and force the final task to return a structured object you can actually trust.

If you have ever organised a group project the night before it was due, you already understand the hardest part of multi-agent AI. The software is the easy part. The interesting question is the same one your teacher asks before the bell rings, which is, “who does what, in what order, and with what information?” CrewAI is a Python framework that turns that question into code. Once you see it that way, the framework stops feeling like magic and starts feeling like a checklist you fill in.

This article is a long, careful walk through CrewAI. We will start with a single agent saying hello, and we will end with a four-agent homework helper that searches the web, drafts an explanation, fact-checks itself against its own sources, and produces a clean structured report you can hand to any downstream program. Along the way I will explain every concept twice, once with a real-world analogy and once with the actual Python, so nothing has to be taken on faith. The framework is small enough that, by the end, every line of code in the final example should feel earned.

What Orchestration Actually Means

Picture a small orchestra. The violinist knows how to play the violin. The drummer knows how to drum. The trumpet player knows how to trumpet. None of them needs to be taught their instrument. What they need is a conductor who points to the right person at the right moment so the music comes out as a single piece, instead of five solos crashing into each other.

Orchestration in AI is the same idea, only the musicians are language models. A large language model, often abbreviated LLM, is genuinely good at certain kinds of work, such as reading, summarising, planning, and calling tools. It is far less good when you ask it to do all of those things at once in one giant question. The answer turns into mush, because the model is trying to please too many goals at the same time. Orchestration is the move where you break the work into smaller jobs, give each job to a focused worker, and let a coordinator decide who runs first. CrewAI is your coordinator’s baton.

The framework has been around since 2023, and by version 0.100 it has settled into a pleasantly small surface area. Four ideas do almost all of the work. Agents are the workers, and each has a role, a goal, and a backstory. Tasks are the units of work, and each has a description and a definition of “done.” Tools are the capabilities an agent can use, anything from web search to a calculator to a private API. Crews glue the agents and tasks together under a process, which decides who runs first. Memorise those four words. The rest of the tutorial is just learning how to combine them.

Setting Up Your Workshop

Before we write any AI code, we need a clean Python environment. If you have never used Python before, you can think of a virtual environment as a separate sandbox, so the libraries we install do not pollute the rest of your computer. You will need Python 3.12 or newer and an Anthropic API key, which is what lets your code talk to Claude. You can swap in OpenAI later, because CrewAI does not care which provider you use, but Anthropic is what I will assume here.

mkdir crewai-tutorial && cd crewai-tutorial
python3.12 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install "crewai==0.100.1" "crewai-tools==0.33.0" "pydantic==2.10.6" "httpx==0.27.2"
export ANTHROPIC_API_KEY="sk-ant-..."

A few notes for the curious. The crewai-tools package is separate, and it contains pre-built tools such as web search and file readers. The pydantic library lets us describe the shape of data, which we will use later to force agents to return clean JSON instead of mushy prose. The httpx package is a modern HTTP client we will use to fetch web pages.

Once the install finishes, create a file called llm.py. This is where we choose the brain.

# llm.py
from crewai import LLM

llm = LLM(
    model="anthropic/claude-sonnet-4-5-20250929",
    temperature=0.2,
)

The temperature setting is the creativity dial. A value of zero means “as deterministic as I can manage,” and a value of one means “go wild.” For most agent work you want it low, somewhere between 0.1 and 0.3, so that the agent tends to make the same choice when given the same input. Always set the model explicitly, by the way, and never rely on the framework’s default, because that default can change between releases and a quiet upgrade can change your output overnight.

Hello, Crew

The smallest possible CrewAI program is one agent and one task. We will build it first, so the moving parts become visible before we add anything to them.

# hello_crew.py
from crewai import Agent, Task, Crew, Process
from llm import llm

greeter = Agent(
    role="Friendly Greeter",
    goal="Welcome a new student to the world of multi-agent AI in two sentences",
    backstory=(
        "You are a warm, encouraging mentor who has helped many students "
        "write their first AI program. You never lecture; you greet."
    ),
    llm=llm,
    allow_delegation=False,
    verbose=True,
)

greet_task = Task(
    description="Write a friendly two-sentence welcome for {name}.",
    expected_output="Exactly two sentences, no more, no less.",
    agent=greeter,
)

crew = Crew(
    agents=[greeter],
    tasks=[greet_task],
    process=Process.sequential,
    verbose=True,
)

if __name__ == "__main__":
    result = crew.kickoff(inputs={"name": "Asha"})
    print("\n--- result ---")
    print(result.raw)

Run it with python hello_crew.py, and you should see the agent think out loud and then print a short welcome. Two things are worth pausing on.

The first is that the role, goal, and backstory are not decoration. CrewAI assembles those three fields into the system prompt that the LLM actually sees. If you write “Helper” for the role and “help the user” for the goal, you will get help-flavoured mush. If instead you write specific words, such as “Friendly Greeter,” “two sentences,” and “never lecture,” the model has something to grab onto. Treat these fields the way you would treat a careful job description for a contractor who will take every word literally.

The second is that {name} in the task description is a template variable. When you call kickoff(inputs={"name": "Asha"}), CrewAI substitutes the value in before the LLM ever sees the prompt. This is how you feed runtime data into a crew without rewriting code every time the input changes. Congratulations: you have just orchestrated one agent. Now we expand.
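
To make the substitution step concrete, here is a rough stand-in that uses nothing but Python's built-in str.format. The helper name render_description is my own invention for this illustration and is not part of CrewAI, whose internal interpolation may differ in detail. The idea is the same, though: placeholders in, finished prompt out.

```python
# Stand-in for the placeholder substitution described above.
# render_description is a hypothetical helper, not a CrewAI function.


def render_description(template: str, inputs: dict[str, str]) -> str:
    """Fill {placeholders} in a task description from an inputs dict."""
    try:
        return template.format(**inputs)
    except KeyError as exc:
        # A placeholder with no matching input is a wiring bug, so name it.
        raise ValueError(f"missing input for placeholder {exc}") from exc


description = "Write a friendly two-sentence welcome for {name}."
rendered = render_description(description, {"name": "Asha"})
print(rendered)
```

Failing loudly on a missing input is a deliberate choice in this sketch: a prompt that still contains a literal {name} is much harder to notice than an error.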

Two Agents That Talk to Each Other

A single agent is just an LLM call wearing a costume. The interesting work starts when two agents collaborate. Let us build a small research-and-explain pair, where one agent looks up facts and the other turns them into a short explanation.

The new piece you need to understand is context. By default, tasks do not see each other’s output. If you want the second task to read the first task’s answer, you have to say so out loud, with context=[first_task]. Forgetting this line is the single most common reason a new CrewAI user says, “my second agent ignored the research.”

# research_pair.py
from crewai import Agent, Task, Crew, Process
from llm import llm

researcher = Agent(
    role="Curious Researcher",
    goal="Find three trustworthy facts about the assigned topic",
    backstory=(
        "You are a careful student who never makes up a fact. "
        "If you are not sure, you say so."
    ),
    llm=llm,
    allow_delegation=False,
    verbose=True,
)

explainer = Agent(
    role="Plain-English Explainer",
    goal="Explain the researcher's findings to a curious 11th grader",
    backstory=(
        "You are a tutor who loves analogies. You never use a five-syllable "
        "word when a two-syllable word will do."
    ),
    llm=llm,
    allow_delegation=False,
    verbose=True,
)

research_task = Task(
    description="Find three trustworthy facts about {topic}.",
    expected_output="A bulleted list of three facts.",
    agent=researcher,
)

explain_task = Task(
    description=(
        "Using only the researcher's facts, write a short, friendly "
        "explanation of {topic} for a high school student."
    ),
    expected_output="One paragraph, four to six sentences.",
    agent=explainer,
    context=[research_task],
)

crew = Crew(
    agents=[researcher, explainer],
    tasks=[research_task, explain_task],
    process=Process.sequential,
    verbose=True,
)

if __name__ == "__main__":
    result = crew.kickoff(inputs={"topic": "how rainbows form"})
    print("\n--- result ---")
    print(result.raw)

Notice that two production-shaped habits have crept in already. The first is that the explainer’s task description literally says, in plain English, “using only the researcher’s facts.” That phrase reduces the chance that the explainer invents something the researcher never said. LLMs are statistical text machines, not reasoning oracles, and a sentence like that nudges the statistical choice towards the right one. The second is that the order is explicit: process=Process.sequential tells CrewAI to run the tasks in list order, researcher first, explainer second. We will look at the other process in a moment.

Giving Agents Tools

A real researcher does not just guess; she opens a browser. CrewAI agents become genuinely useful when you give them tools. A tool is, at heart, a Python function with a clear name, a clear description, and clear inputs. The LLM reads those, decides when to call the tool, fills in the inputs, and feeds the result back into its own reasoning. The framework handles the plumbing.

Here is a minimal pair of tools, one that fetches a web page and returns its text, and one stub search tool you would replace with a real provider in production. The single most important rule, and the one that has saved me the most debugging time over the years, is that tools should return strings, including error strings, and they should never raise exceptions. An unhandled exception in a tool aborts the entire crew. An “ERROR: timed out” string, on the other hand, lets the agent read the failure and adapt.

# tools.py
import httpx
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class FetchUrlInput(BaseModel):
    url: str = Field(description="Absolute http(s) URL to fetch")


class FetchUrlTool(BaseTool):
    name: str = "fetch_url"
    description: str = "Fetch the text content of a web page. Returns plain text."
    args_schema: type[BaseModel] = FetchUrlInput

    def _run(self, url: str) -> str:
        if not url.startswith(("http://", "https://")):
            return "ERROR: url must be absolute and start with http(s)://"
        try:
            resp = httpx.get(url, timeout=15.0, follow_redirects=True)
            resp.raise_for_status()
        except httpx.TimeoutException:
            return f"ERROR: request to {url} timed out after 15 seconds"
        except httpx.HTTPStatusError as exc:
            return f"ERROR: {url} returned status {exc.response.status_code}"
        except httpx.HTTPError as exc:
            return f"ERROR: network failure fetching {url}: {exc}"
        return resp.text[:12000]


class WebSearchInput(BaseModel):
    query: str = Field(description="What you want to search for")


class WebSearchTool(BaseTool):
    name: str = "web_search"
    description: str = (
        "Search the web. Returns a numbered list of title and URL pairs. "
        "Use this before fetch_url when you do not already have a URL."
    )
    args_schema: type[BaseModel] = WebSearchInput

    def _run(self, query: str) -> str:
        # In production, call Tavily, Brave, or SerpAPI here and format
        # the results into a numbered list. The stub below lets the
        # rest of the tutorial run without signing up for a search API.
        return (
            "1. NASA, Why are rainbows curved, https://science.nasa.gov/rainbow\n"
            "2. NOAA, Rainbow formation, https://www.noaa.gov/rainbow\n"
            "3. Royal Society, Optics of rainbows, https://royalsociety.org/rainbow"
        )


fetch_url_tool = FetchUrlTool()
web_search_tool = WebSearchTool()

The stub web search returns hard-coded results so you can run the tutorial without signing up for a search API. When you are ready, swap the body of _run for a real call to Tavily, Brave Search, or SerpAPI. The signature does not change, and the agent suddenly has a real internet. Now let us wire the tools into the researcher.

# agents.py
from crewai import Agent
from llm import llm
from tools import fetch_url_tool, web_search_tool

researcher = Agent(
    role="Curious Researcher",
    goal="Find three trustworthy facts about the assigned topic with a source URL for each",
    backstory=(
        "You are a careful student who never makes up a fact. "
        "You always search before fetching, and you cite every claim "
        "with a URL the user can verify."
    ),
    tools=[web_search_tool, fetch_url_tool],
    llm=llm,
    max_iter=8,
    allow_delegation=False,
    verbose=True,
)

The new setting, max_iter=8, caps how many times the agent is allowed to loop between thinking and using a tool before it must produce a final answer. Without it, a flaky tool can make the agent search forever, burning your API budget while you stare at the terminal. Eight is a comfortable default for research; lower it for simpler agents.
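
If the idea of an iteration cap feels abstract, the following sketch shows the shape of it in plain Python. It is not CrewAI's actual loop. The names run_bounded, flaky_step, and never_done are hypothetical, and where the real framework pushes the agent towards a final answer at the cap, this toy version simply returns a marker string.

```python
from typing import Callable, Optional


def run_bounded(step: Callable[[int], Optional[str]], max_iter: int) -> tuple[str, int]:
    """Call step(i) until it returns an answer or the cap is reached."""
    for i in range(1, max_iter + 1):
        answer = step(i)
        if answer is not None:
            return answer, i
    # The cap was hit: stop spending instead of looping forever.
    return "STOPPED: iteration cap reached without a final answer", max_iter


def flaky_step(i: int) -> Optional[str]:
    # Pretends the tool only succeeds on the third attempt.
    return "three facts found" if i >= 3 else None


def never_done(i: int) -> Optional[str]:
    # Pretends the tool never succeeds.
    return None


ok = run_bounded(flaky_step, max_iter=8)
stuck = run_bounded(never_done, max_iter=8)
print(ok)
print(stuck)
```

The second call is the one that matters. Without the cap it would never return, and every extra turn of that loop would be another paid LLM call.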

Forcing Structured Output With Pydantic

Up to this point, our agents return prose. Prose is fine for humans, but it is terrible for software. If you want to take an agent’s output and feed it into another program, such as a website, a database, or a Discord bot, you need predictable structure. CrewAI lets you pin a task’s output to a Pydantic model, and the framework will validate the result for you. If the model returns something that does not fit, CrewAI raises an error you can catch instead of silently shipping garbage.
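
Before wiring a schema into a task, it can help to see what validation looks like in Pydantic alone, with no agents involved. The sketch below uses a cut-down, hypothetical model called MiniBrief and two hand-written JSON strings. It only illustrates the kind of check a schema performs and is not how CrewAI calls Pydantic internally.

```python
from pydantic import BaseModel, Field, ValidationError


class MiniBrief(BaseModel):
    # Hypothetical, cut-down schema used only for this illustration.
    topic: str
    key_findings: list[str] = Field(min_length=3)


good = '{"topic": "rainbows", "key_findings": ["a", "b", "c"]}'
bad = '{"topic": "rainbows", "key_findings": ["only one"]}'

# Well-formed input becomes a typed object you can use directly.
brief = MiniBrief.model_validate_json(good)
print(brief.key_findings)

# Input that breaks the schema is rejected instead of slipping through.
try:
    MiniBrief.model_validate_json(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"rejected with {exc.error_count()} error(s)")
```

The second string is perfectly valid JSON. It fails only because the list is too short, which is exactly the sort of quiet defect that prose output would let through.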

# tasks.py
from crewai import Task
from pydantic import BaseModel, Field
from agents import researcher, explainer


class Source(BaseModel):
    title: str
    url: str


class ResearchBrief(BaseModel):
    topic: str
    summary: str = Field(description="Three to five sentence overview")
    key_findings: list[str] = Field(min_length=3, max_length=7)
    sources: list[Source] = Field(min_length=1)


research_task = Task(
    description=(
        "Research the topic: {topic}. Use web_search to find sources, "
        "then fetch_url to read them. Collect at least three concrete "
        "findings and the URL of every source you used."
    ),
    expected_output="A bulleted list of findings, each with its source URL.",
    agent=researcher,
)

brief_task = Task(
    description=(
        "Using only the researcher's findings, produce a structured brief "
        "on {topic}. Do not invent facts the researcher did not provide. "
        "Carry through every source URL."
    ),
    expected_output="A ResearchBrief object.",
    agent=explainer,
    context=[research_task],
    output_pydantic=ResearchBrief,
)

When this task finishes, you can access the validated object directly on the result.

# run_brief.py
from crewai import Crew, Process
from agents import researcher, explainer
from tasks import research_task, brief_task, ResearchBrief

crew = Crew(
    agents=[researcher, explainer],
    tasks=[research_task, brief_task],
    process=Process.sequential,
    verbose=True,
    max_rpm=20,
)

result = crew.kickoff(inputs={"topic": "how rainbows form"})
brief: ResearchBrief = result.pydantic
print(brief.summary)
for finding in brief.key_findings:
    print(f"- {finding}")
for src in brief.sources:
    print(f"({src.title}) {src.url}")

The result.pydantic attribute holds the validated object, and result.raw holds the plain string. You will reach for .pydantic ninety percent of the time once you have a schema. A subtle but important detail in the crew above is max_rpm=20, which caps how many LLM calls per minute the crew can make. Your API provider has rate limits, and a busy crew can trip them mid-run. Setting this below your provider’s ceiling makes the crew throttle itself instead of failing in the middle of a task.
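
To show what “throttle itself” means, here is a minimal rolling-window limiter in plain Python. It is a sketch of the general technique, not CrewAI's implementation. The class name RpmLimiter and the injectable clock are my own choices, made so the behaviour can be checked without actually waiting a minute.

```python
import time
from collections import deque
from typing import Callable


class RpmLimiter:
    """Allow at most max_rpm calls in any rolling 60-second window."""

    def __init__(self, max_rpm: int, clock: Callable[[], float] = time.monotonic):
        self.max_rpm = max_rpm
        self.clock = clock
        self.calls: deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if free)."""
        now = self.clock()
        # Forget calls that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) < self.max_rpm:
            return 0.0
        # Wait until the oldest remembered call leaves the window.
        return 60.0 - (now - self.calls[0])

    def record(self) -> None:
        """Remember that a call was made right now."""
        self.calls.append(self.clock())


# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
limiter = RpmLimiter(max_rpm=2, clock=lambda: fake_now[0])
limiter.record()
limiter.record()
blocked_for = limiter.wait_time()
fake_now[0] = 61.0
free_again = limiter.wait_time()
print(blocked_for, free_again)
```

A caller would sleep for wait_time() seconds before each request and then call record(). The point is that the delay is chosen by your own code, calmly, rather than imposed by the provider as a failed request.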

Sequential Versus Hierarchical, And Why It Matters

CrewAI offers two processes. They feel similar at first, but they could not be more different in practice.

Sequential runs the tasks in the order you listed them, and each task’s output flows into the next via the context parameter. This is the right default. It is cheap, predictable, and easy to debug. If your workflow is “do A, then B, then C,” use sequential.

Hierarchical introduces an extra agent called the manager. You do not write the manager; CrewAI generates it for you. Its job is to look at the tasks, decide which worker should handle each one, and re-delegate if a task needs to be re-run. This is powerful when the workflow is genuinely open-ended, and the right routing depends on intermediate results in a way you cannot predict at design time. It is also more expensive, slower, and noticeably harder to debug, because every task now goes through an extra planning layer.

# crew_hierarchical.py
from crewai import Crew, Process
from llm import llm
from agents import researcher, explainer
from tasks import research_task, brief_task

hier_crew = Crew(
    agents=[researcher, explainer],
    tasks=[research_task, brief_task],
    process=Process.hierarchical,
    manager_llm=llm,
    verbose=True,
    max_rpm=20,
)

My honest advice, as someone who has shipped both styles to production, is to start sequential and stay sequential until you have a concrete reason to leave. Most crews never need hierarchical, and the ones that do tend to announce themselves clearly: the tasks really do depend on each other in ways you cannot predict, and the manager’s planning earns its tokens. The CrewAI processes documentation lays out the trade-offs in more detail if you want the full picture.

Memory, Or Why Your Crew Forgets

By default, every call to kickoff is a clean slate. The crew remembers nothing from the previous run. For most tutorials and most production tasks, that is exactly what you want, because it keeps the system reproducible and easy to test. But sometimes you really do want your crew to remember. A tutoring crew that remembers what the student has already learned is meaningfully more useful than one that re-explains the same thing every session. CrewAI ships three memory layers, all hidden behind a single switch.

crew = Crew(
    agents=[researcher, explainer],
    tasks=[research_task, brief_task],
    process=Process.sequential,
    memory=True,
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    verbose=True,
)

Setting memory=True enables three layers at once. Short-term memory keeps the last few turns of the current run in mind, similar to the lines you remember from the conversation you are currently having. Long-term memory survives across runs and is stored on disk, so previous runs can inform future ones. Entity memory tracks specific named things, such as a student called Asha or a topic called rainbows, so the crew can pick up where it left off. The embedder setting tells CrewAI how to turn text into numbers it can compare, and the default uses OpenAI. Memory is one of those features that looks free in the documentation and quietly costs you embedding API calls in practice, so turn it on only when you have a clear reason to.
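
“Numbers it can compare” usually means vectors compared by cosine similarity. The sketch below shows that comparison on toy three-number vectors that I made up for illustration. Real embeddings have hundreds or thousands of dimensions and come from the embedding model, and the storage CrewAI uses behind the scenes is not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for this example only.
memory = {
    "Asha learned how rainbows form": [0.9, 0.1, 0.0],
    "Asha struggled with fractions": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # stands in for "what does Asha know about rainbows?"

# Recall the stored memory whose vector points most nearly the same way.
best = max(memory, key=lambda text: cosine_similarity(memory[text], query))
print(best)
```

Every stored memory and every lookup needs a vector, and every vector is an embedding call. That is where the quiet cost comes from.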

A Real Project, The Homework Helper Crew

Now we bring everything together. We will build a four-agent crew that, given a homework topic, researches it, drafts an explanation, fact-checks the draft against the original sources, and finally polishes the text into a clean structured answer. This research, write, verify, polish loop is a realistic shape for many production multi-agent systems, and it is small enough to walk through end to end.

The Four Roles

# crew_homework.py
from crewai import Agent, Task, Crew, Process
from pydantic import BaseModel, Field
from llm import llm
from tools import web_search_tool, fetch_url_tool


class Source(BaseModel):
    title: str
    url: str


class HomeworkAnswer(BaseModel):
    topic: str
    grade_level: str
    explanation: str = Field(description="Six to ten sentence answer")
    key_terms: list[str] = Field(min_length=2, max_length=6)
    sources: list[Source] = Field(min_length=1)
    fact_check_notes: str


researcher = Agent(
    role="Homework Researcher",
    goal="Gather three to five trustworthy facts about the topic with sources",
    backstory=(
        "You are a methodical researcher. You search first, fetch second, "
        "and quote URLs exactly as they appear. You never paraphrase a URL."
    ),
    tools=[web_search_tool, fetch_url_tool],
    llm=llm,
    max_iter=8,
    allow_delegation=False,
    verbose=True,
)

writer = Agent(
    role="Homework Writer",
    goal="Turn the researcher's facts into a friendly explanation",
    backstory=(
        "You write for {grade_level} students. You avoid jargon, and "
        "you use one short analogy per paragraph."
    ),
    llm=llm,
    max_iter=4,
    allow_delegation=False,
    verbose=True,
)

fact_checker = Agent(
    role="Fact Checker",
    goal="Compare the writer's draft against the researcher's findings",
    backstory=(
        "You are skeptical. You flag any sentence in the draft that "
        "does not trace to a source the researcher provided."
    ),
    llm=llm,
    max_iter=3,
    allow_delegation=False,
    verbose=True,
)

editor = Agent(
    role="Friendly Editor",
    goal="Polish the draft and assemble the final structured answer",
    backstory=(
        "You smooth rough edges without changing meaning. You preserve "
        "every source URL the researcher provided."
    ),
    llm=llm,
    max_iter=3,
    allow_delegation=False,
    verbose=True,
)

Four agents, four roles, four backstories. Read the role-goal-backstory triples out loud, because the LLM essentially does the same thing internally. Each one is a tiny job description with enough specificity that the model has something concrete to grab onto.

The Four Tasks

research_task = Task(
    description=(
        "Research the topic: {topic}. Use web_search to find sources, and "
        "fetch_url to read the most promising ones. Produce a bulleted "
        "list of three to five concrete findings, each with its source URL."
    ),
    expected_output="Bulleted findings with URLs.",
    agent=researcher,
)

draft_task = Task(
    description=(
        "Write a homework answer for a {grade_level} student about {topic}, "
        "using only the researcher's findings. Aim for six to ten sentences. "
        "Include one short analogy. Carry through every source URL."
    ),
    expected_output="A draft explanation followed by the source URLs.",
    agent=writer,
    context=[research_task],
)

fact_check_task = Task(
    description=(
        "Read the draft and list every claim that does not trace back to "
        "the researcher's findings. If everything checks out, say so."
    ),
    expected_output="A bulleted list of unsupported claims, or 'all clear'.",
    agent=fact_checker,
    context=[research_task, draft_task],
)

polish_task = Task(
    description=(
        "Combine the draft and the fact-check notes into a final "
        "HomeworkAnswer for a {grade_level} student about {topic}. "
        "Remove or revise any claim the fact-checker flagged."
    ),
    expected_output="A HomeworkAnswer object.",
    agent=editor,
    context=[research_task, draft_task, fact_check_task],
    output_pydantic=HomeworkAnswer,
)

Pay attention to the context chain, because it is doing a lot of quiet work. The draft sees the research. The fact-check sees both the research and the draft. The polish sees all three. That is how information flows through the crew, explicitly, by name, on wires you drew yourself. There is no magic global memory; if a task needs an earlier output, you have to list it.
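
The wiring described above can be drawn in a few lines of plain Python. This is a stand-in with no CrewAI involved: each “task” is just a function, and run_pipeline and the wiring dictionary are names I have made up for the illustration. It mirrors the rule that a task receives only the outputs it was explicitly given.

```python
# Plain-Python stand-in for the context chain; no CrewAI involved.
# Each "task" is a function that receives the outputs it was wired to.


def run_pipeline(tasks, wiring):
    """Run tasks in list order, passing each only its declared context."""
    outputs = {}
    for name, fn in tasks:
        # Look up the named upstream outputs; unlisted ones stay invisible.
        context = [outputs[dep] for dep in wiring.get(name, [])]
        outputs[name] = fn(context)
    return outputs


tasks = [
    ("research", lambda ctx: "facts"),
    ("draft", lambda ctx: f"draft from {ctx}"),
    ("fact_check", lambda ctx: f"checked {len(ctx)} inputs"),
    ("polish", lambda ctx: f"final from {len(ctx)} inputs"),
]
wiring = {
    "draft": ["research"],
    "fact_check": ["research", "draft"],
    "polish": ["research", "draft", "fact_check"],
}

outputs = run_pipeline(tasks, wiring)
for name, value in outputs.items():
    print(f"{name}: {value}")
```

Delete one entry from wiring and the matching step runs with an empty context, which is the toy version of an agent that appears to ignore the research.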

The Crew

homework_crew = Crew(
    agents=[researcher, writer, fact_checker, editor],
    tasks=[research_task, draft_task, fact_check_task, polish_task],
    process=Process.sequential,
    verbose=True,
    max_rpm=20,
)


if __name__ == "__main__":
    result = homework_crew.kickoff(inputs={
        "topic": "how rainbows form",
        "grade_level": "11th grade",
    })
    answer: HomeworkAnswer = result.pydantic
    print(f"\n--- {answer.topic} ({answer.grade_level}) ---")
    print(answer.explanation)
    print("\nKey terms:", ", ".join(answer.key_terms))
    for src in answer.sources:
        print(f"({src.title}) {src.url}")
    print(f"\nFact-check: {answer.fact_check_notes}")

    usage = result.token_usage
    print(f"\ntokens used: {usage.total_tokens}")

Run it, and the terminal will scroll a long way as four agents take their turns. When it finishes, you have a structured, fact-checked homework answer. More importantly, you have a template for any “research, write, verify, polish” workflow, which covers a surprising amount of real-world AI work.

Watching What Your Crew Does

When something goes wrong, and it will, because that is normal, the first place to look is the verbose log. Setting verbose=True on the crew and the agents prints every thought, every tool call, and every result. Read it like a transcript, because the mistakes are usually obvious in hindsight, especially once you know what to look for.

For more permanent instrumentation, two habits help a lot. The first is logging the token usage at the end of every run, which we already do with result.token_usage. Multi-agent crews multiply token cost, because every agent, every tool loop, and every manager decision is another LLM call. Knowing the per-run cost from day one is the easiest way to avoid an unpleasant surprise on your monthly bill.
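
Turning token counts into money is simple arithmetic, sketched below. The function name estimate_cost_usd is hypothetical, and the prices in the example are placeholders I picked to make the sum easy to follow, so check your provider's current price sheet before trusting any figure. The sketch also assumes you can get prompt and completion counts separately, since providers usually price them differently.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
) -> float:
    """Estimate the cost of one run from token counts and per-million prices."""
    total = prompt_tokens * usd_per_mtok_in + completion_tokens * usd_per_mtok_out
    return total / 1_000_000


# Placeholder numbers for illustration only, not real prices or real usage.
cost = estimate_cost_usd(
    prompt_tokens=40_000,
    completion_tokens=6_000,
    usd_per_mtok_in=3.0,
    usd_per_mtok_out=15.0,
)
print(f"estimated cost: ${cost:.2f}")
```

Logged next to each run, a number like this makes a sudden jump in cost visible the day it happens instead of at the end of the month.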

The second habit is attaching task callbacks so you can log or persist the output of each step the moment it finishes. This is invaluable when a crew takes a long time to run and you want to know which step is slow, or when you want to keep a transcript of every intermediate artifact for review later.

def on_task_complete(output) -> None:
    # The finished task's description and raw text sit directly on the output object.
    print(f"[done] {output.description[:40]}... -> {len(str(output.raw))} chars")

draft_task.callback = on_task_complete

Attach a callback to any task, and CrewAI invokes it after the task finishes. You can write the output to a file, ship it to a database, or, most commonly, just log enough to understand what happened. None of this is glamorous, and all of it pays for itself the first time something breaks in production.
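
As one way to persist each step, the sketch below appends a line of JSON per finished task to a log file. The factory name make_step_logger and the file name are hypothetical. The fake output object is a stand-in so the example runs without a crew, and it assumes the real output exposes description and raw attributes, read defensively with getattr.

```python
import json
import tempfile
import time
from pathlib import Path
from types import SimpleNamespace


def make_step_logger(path: Path):
    """Build a callback that appends one JSON line per finished task."""

    def log_step(output) -> None:
        record = {
            "ts": time.time(),
            # getattr keeps the logger from crashing if a field is absent.
            "description": str(getattr(output, "description", ""))[:80],
            "chars": len(str(getattr(output, "raw", ""))),
        }
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    return log_step


# Demo with a stand-in object instead of a real task output.
log_path = Path(tempfile.gettempdir()) / "crew_steps_demo.jsonl"
log_path.unlink(missing_ok=True)
fake_output = SimpleNamespace(
    description="Write a homework answer",
    raw="Rainbows form when...",
)
logger = make_step_logger(log_path)
logger(fake_output)

last = json.loads(log_path.read_text(encoding="utf-8").splitlines()[-1])
print(last["description"], last["chars"])
```

One JSON object per line is a deliberately boring format. You can tail it while the crew runs and load it into almost anything afterwards.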

Common Mistakes And How To Fix Them

I have made every one of these mistakes myself, and so has every engineer I know who builds with CrewAI for any length of time.

The first is thin agent definitions. A role of “Assistant” and a goal of “help” produces generic output, because that is exactly what those words ask for. The role, goal, and backstory triple is your prompt, so invest in it.

The second is forgetting context between tasks. Tasks do not see each other’s output unless you say so. If the next task seems to ignore the previous one, the wire is missing, and adding context=[previous_task] will almost always fix it.

The third is tools that raise exceptions. Any uncaught exception in a tool’s _run aborts the entire crew. Wrap your tool body in try/except, return error strings, and let the agent adapt to failures the way a human would.

The fourth is unbounded max_iter. A flaky tool combined with a high iteration cap equals a tokens-on-fire situation, because the agent will keep retrying. Set max_iter to the smallest value that lets the agent finish honest work, and not a step higher.

The fifth is choosing hierarchical too early. The manager agent roughly doubles the coordination cost, because every task now goes through an extra planning step. If your workflow is a known pipeline, and most workflows are, sequential is correct and cheaper.

The sixth is no max_rpm. Without a rate cap, a busy crew trips your provider’s 429 rate-limit errors mid-run and fails tasks unpredictably. Set it below your provider’s ceiling, and the crew throttles itself politely.

The seventh, and easiest to avoid, is skipping output_pydantic on the final task. Without a schema, the last task returns prose, and your downstream code has to parse it. With a schema, you get a validated object you can build software on. The few extra lines pay for themselves the first time something downstream depends on the result.
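
The third mistake on this list can be guarded against mechanically. The sketch below is a small decorator that turns any exception into an error string. The name returns_error_strings is my own, and the divide function exists only to trigger a failure. Catching every exception is a blunt instrument, so treat it as a safety net underneath the specific except clauses you already wrote, not as a replacement for them.

```python
import functools
from typing import Callable


def returns_error_strings(fn: Callable[..., str]) -> Callable[..., str]:
    """Wrap a tool function so failures come back as strings, never raises."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs) -> str:
        try:
            return str(fn(*args, **kwargs))
        except Exception as exc:
            # Report what failed in a form an agent can read and react to.
            return f"ERROR: {fn.__name__} failed: {type(exc).__name__}: {exc}"

    return wrapper


@returns_error_strings
def divide(a: float, b: float) -> str:
    # Toy tool body that raises when b is zero.
    return str(a / b)


ok = divide(6, 3)
failed = divide(1, 0)
print(ok)
print(failed)
```

The same wrapper could sit around the body of a tool's _run method, so that a forgotten edge case produces a readable failure instead of a dead crew.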

Where To Go From Here

If you have followed this article end to end, you have built something genuinely non-trivial. You have agents with tools, structured output, a four-step pipeline, and instrumentation. The next moves all trade a little extra complexity for a lot of extra capability.

Swap the stub WebSearchTool for a real provider such as Tavily or Brave, and the crew can actually browse the live web instead of the rehearsed list we used. Add a verification task that visits each source URL and confirms the claim is still there, which is the single highest-leverage upgrade for any research crew. Turn on memory=True and watch the crew remember students across sessions. Wire in a LangSmith or Phoenix tracing integration so you can see every step in a visual timeline instead of scrolling logs.
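
When you do swap in a real search provider, the only contract the researcher relies on is the numbered list of title and URL pairs. Here is a sketch of a formatter for that step. The function name format_results is hypothetical, and so is the input shape, a list of dictionaries with title and url keys. Each provider returns its own structure, so expect to adapt the field names.

```python
def format_results(results: list[dict[str, str]], limit: int = 5) -> str:
    """Turn search hits into the numbered 'title, url' list the agent expects."""
    if not results:
        # Keep to the error-string convention rather than returning nothing.
        return "ERROR: search returned no results"
    lines = []
    for i, item in enumerate(results[:limit], start=1):
        title = item.get("title", "untitled").strip()
        url = item.get("url", "").strip()
        lines.append(f"{i}. {title}, {url}")
    return "\n".join(lines)


# Made-up hits using example.com, standing in for a provider response.
hits = [
    {"title": "Why rainbows are curved", "url": "https://example.com/curved"},
    {"title": "Rainbow formation", "url": "https://example.com/formation"},
]
formatted = format_results(hits)
print(formatted)
```

Capping the list with limit keeps the tool's output short, which matters because every character of it is fed back into the agent's next prompt.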

You will eventually outgrow CrewAI’s role-oriented model, most likely on the day you need to express a workflow with branching, conditionals, and explicit loops. When that happens, graduate to LangGraph for fine-grained graph control. The mental model you built here transfers cleanly, because agents are still agents, tasks are still tasks, and the wiring between them is just drawn differently.

Multi-agent orchestration is not magic. It is a small number of decisions, made deliberately, and repeated until the system behaves the way you want it to. You now know which decisions matter and which knobs control them. The rest is practice, and the only way to get that practice is to run the homework crew, change one thing, and see what happens. Then change one more thing. After about a dozen of those small experiments, the framework stops feeling like a library and starts feeling like a tool you reach for without thinking.
