Role-Based Agent Teams with CrewAI: A Production Walkthrough

March 10, 2025 · 10 min read · by Muhammad Amal programming

TL;DR: CrewAI 0.86 is opinionated, and that’s a feature. Use it when your problem decomposes into stable role descriptions and the work mostly flows linearly. Skip it when you need fine-grained graph control; that’s LangGraph’s job.

CrewAI is the framework I reach for when the team I’m working with has more product engineers than ML engineers. The API reads like a description of an actual team: you write a Researcher agent with a backstory, a Writer agent with a goal, and a couple of Tasks that flow between them. A junior dev can read a CrewAI script and understand what it does on the first pass. That’s worth a lot.

But I’ve also rebuilt two CrewAI prototypes in LangGraph because the team outgrew the abstractions. CrewAI 0.86 closed a lot of that gap. The hierarchical process is stable, the memory backends are pluggable, custom tools no longer require a decorator dance, and the new flow API gives you graph-like control for the cases that need it. This post is the walkthrough I’d give a team picking it up in 2025.

I’ll build a content research crew, three agents working on a real task, with custom tools, persistent memory, and observability wired in. Code targets Python 3.12 and crewai==0.86.0.

1. Install and project layout

CrewAI ships a CLI that scaffolds a project. I use it for new work because it standardizes the layout across teams.

python3.12 -m venv .venv && source .venv/bin/activate
pip install "crewai==0.86.0" "crewai-tools==0.17.0" "openai>=1.59" "tavily-python==0.5.0"
crewai create crew research_crew
cd research_crew

The scaffold gives you src/research_crew/crew.py plus YAML config files for agents and tasks. The YAML-first approach is one of the things CrewAI gets right: your prompts live as configuration, not as inline strings.

src/research_crew/
  config/
    agents.yaml
    tasks.yaml
  crew.py
  main.py
  tools/
    custom_tool.py

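Set your env vars in .env. The SerperDevTool used in section 3 reads SERPER_API_KEY, so add that as well if you keep that tool.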

echo "OPENAI_API_KEY=sk-..." >> .env
echo "TAVILY_API_KEY=tvly-..." >> .env
echo "OPENAI_MODEL_NAME=gpt-4o" >> .env
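
A missing key usually surfaces as a confusing failure halfway through a run, so I like a fail-fast check before kickoff. This is a minimal sketch, assuming the variables are already loaded into the process environment; the helper names are mine, not part of CrewAI.

```python
import os

# Names taken from the .env lines above; adjust to the tools you actually use.
REQUIRED_VARS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "OPENAI_MODEL_NAME")


def missing_env(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(required=REQUIRED_VARS, env=None):
    """Raise early with a readable message instead of failing mid-run."""
    missing = missing_env(required, env)
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
```

Call require_env() at the top of your entry point, before the crew is constructed.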

2. Defining the crew in YAML

This is where CrewAI’s opinions shine. Each agent has a role, goal, and backstory. Each task has a description, expected_output, and an agent assignment. The framework assembles these into the prompts it sends at runtime, filling placeholders such as {topic} from the inputs you pass to kickoff.

# config/agents.yaml
researcher:
  role: >
    Senior Research Analyst for {topic}
  goal: >
    Find current, accurate information on {topic} and produce a structured
    fact sheet with sources and dates.
  backstory: >
    You're a senior analyst who's been burned by hallucinated stats. You
    cross-check every claim and you cite sources. You refuse to make up
    numbers you can't verify.

writer:
  role: >
    Technical writer
  goal: >
    Turn the research fact sheet into a {word_count}-word article that
    reads like it was written by a senior engineer.
  backstory: >
    You write like a senior engineer, contractions, no fluff, no hype
    language. You assume the reader is technical.

editor:
  role: >
    Editor and fact-checker
  goal: >
    Verify every claim in the draft against the fact sheet, fix style
    issues, and return the polished article.
  backstory: >
    You're a meticulous editor. You catch claims that drift from sources
    and you cut filler. You don't add new claims.

# config/tasks.yaml
research_task:
  description: >
    Research the topic {topic}. Produce a fact sheet with at least 8 facts,
    each cited with source URL and publication date. Do not include facts
    older than 18 months unless they're foundational.
  expected_output: >
    A markdown fact sheet with 8+ items in the format:
    - [fact]. Source, [URL] (YYYY-MM-DD).
  agent: researcher

writing_task:
  description: >
    Using the research fact sheet, write a {word_count}-word article on
    {topic}. The audience is senior engineers.
  expected_output: >
    A complete markdown article, no front matter, no title heading.
  agent: writer
  context: [research_task]

editing_task:
  description: >
    Verify each claim in the draft against the fact sheet. Fix style.
    Return the final polished article.
  expected_output: >
    Final article in markdown.
  agent: editor
  context: [research_task, writing_task]

The context field is critical. It tells CrewAI which prior task outputs to include as context for this task. Being explicit matters most for the editor, which needs both the fact sheet and the draft rather than only the output of the step before it. The framework handles the wiring.
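
Two config mistakes are cheap to catch before you spend tokens: a placeholder with no matching input, and a context entry that points at a task that doesn't exist or is defined later. Here's a rough pre-flight sketch over plain dicts shaped like tasks.yaml. The function names are hypothetical and this is not how CrewAI validates config internally.

```python
from string import Formatter


def placeholders(text: str) -> set[str]:
    """Names inside {braces} in a prompt template."""
    return {name for _, name, _, _ in Formatter().parse(text) if name}


def context_errors(tasks: dict[str, dict]) -> list[str]:
    """Flag context entries that are unknown or defined after the task using them."""
    order = list(tasks)  # dicts keep insertion order, like the YAML file
    errors = []
    for i, name in enumerate(order):
        for dep in tasks[name].get("context", []):
            if dep not in tasks:
                errors.append(f"{name}: unknown context task {dep!r}")
            elif order.index(dep) >= i:
                errors.append(f"{name}: context task {dep!r} does not run earlier")
    return errors


# A trimmed copy of the tasks.yaml structure above, as plain dicts.
tasks = {
    "research_task": {"description": "Research the topic {topic}."},
    "writing_task": {
        "description": "Write a {word_count}-word article on {topic}.",
        "context": ["research_task"],
    },
    "editing_task": {
        "description": "Verify each claim in the draft.",
        "context": ["research_task", "writing_task"],
    },
}

# Every name collected here must appear as a key in the kickoff inputs.
needed = set().union(*(placeholders(t["description"]) for t in tasks.values()))
```

Compare needed against the keys of your inputs dict and run context_errors(tasks) in a unit test so a renamed task fails CI instead of a production run.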

3. Wiring the crew with custom tools

Now the Python side. The @CrewBase decorator binds the YAML to a class.

# src/research_crew/crew.py
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import SerperDevTool, WebsiteSearchTool
from .tools.fact_checker import FactCheckerTool

@CrewBase
class ResearchCrew:
    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerperDevTool(), WebsiteSearchTool()],
            verbose=True,
            max_iter=10,
            allow_delegation=False,
        )

    @agent
    def writer(self) -> Agent:
        return Agent(
            config=self.agents_config["writer"],
            verbose=True,
            max_iter=3,
            allow_delegation=False,
        )

    @agent
    def editor(self) -> Agent:
        return Agent(
            config=self.agents_config["editor"],
            tools=[FactCheckerTool()],
            verbose=True,
            max_iter=5,
            allow_delegation=False,
        )

    @task
    def research_task(self) -> Task:
        return Task(config=self.tasks_config["research_task"])

    @task
    def writing_task(self) -> Task:
        return Task(config=self.tasks_config["writing_task"])

    @task
    def editing_task(self) -> Task:
        return Task(config=self.tasks_config["editing_task"])

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            memory=True,
            verbose=True,
        )

Setting allow_delegation=False explicitly is a habit you should pick up. When delegation is on, agents can hand work to each other, and each handoff is a tool call the LLM decides to make. The default has changed between releases, so don’t rely on it. Delegation is chaotic in practice. Turn it off unless you specifically need it.

Writing a custom tool

CrewAI 0.86 ships a clean BaseTool interface.

# src/research_crew/tools/fact_checker.py
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type

class FactCheckInput(BaseModel):
    claim: str = Field(..., description="The claim to fact-check.")
    sources: list[str] = Field(..., description="URLs cited for the claim.")

class FactCheckerTool(BaseTool):
    name: str = "fact_checker"
    description: str = (
        "Check whether a factual claim is supported by the given source URLs. "
        "Returns 'supported', 'contradicted', or 'unverifiable' with a reason."
    )
    args_schema: Type[BaseModel] = FactCheckInput

    def _run(self, claim: str, sources: list[str]) -> str:
        # In a real tool you'd fetch the URLs and do retrieval.
        # For demo, we return a stub.
        if not sources:
            return "unverifiable, no sources provided"
        return f"supported, found in {sources[0]}"

Two things matter here. The args_schema is what the LLM sees, so write the field descriptions like documentation, not like code comments. And the _run method should be fast and deterministic, because slow tools tank your agent’s latency.
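
To make the stub a little less of a stub, here's a sketch of a fast, deterministic check that _run could delegate to once the source pages have been fetched. It's a crude word-overlap heuristic I'm using as a stand-in, not real fact verification, and check_claim, its threshold, and the example URL are all hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_claim(claim: str, source_texts: dict[str, str], threshold: float = 0.6) -> str:
    """Return 'supported, ...' when one source covers enough of the claim's words."""
    claim_tokens = _tokens(claim)
    if not claim_tokens or not source_texts:
        return "unverifiable, no sources provided"
    best_url, best_score = None, 0.0
    for url, text in source_texts.items():
        # Fraction of the claim's words that also appear in this source.
        score = len(claim_tokens & _tokens(text)) / len(claim_tokens)
        if score > best_score:
            best_url, best_score = url, score
    if best_score >= threshold:
        return f"supported, found in {best_url}"
    return "unverifiable, no source covers the claim"


verdict = check_claim(
    "Postgres 17 adds streaming I/O",
    {"https://example.com/a": "Release notes: Postgres 17 adds a streaming I/O interface."},
)
```

The same inputs always give the same verdict, which keeps the editor agent's behavior reproducible across runs. It can't detect a contradicted claim; that needs real retrieval plus a model call.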

4. Running the crew

The main entry point.

# src/research_crew/main.py
from research_crew.crew import ResearchCrew

def run():
    inputs = {"topic": "Postgres 17 streaming I/O", "word_count": "800"}
    crew = ResearchCrew().crew()
    result = crew.kickoff(inputs=inputs)
    print("=" * 60)
    print(result.raw)
    print("=" * 60)
    print("token usage:", result.token_usage)

if __name__ == "__main__":
    run()

Run it.

python -m research_crew.main

You’ll see verbose output of every agent step, including tool calls and their results. The result.raw is the final task’s output, and result.tasks_output has each individual task result if you need them.
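
When I'm debugging a run I usually want one line per task rather than the full verbose dump. A small sketch, assuming each entry in tasks_output exposes .agent (the role string) and .raw; the SimpleNamespace objects below are stand-ins so the snippet runs without calling an LLM.

```python
from types import SimpleNamespace


def summarize_tasks(result, width: int = 60) -> list[str]:
    """One line per task: the agent's role and the start of its raw output."""
    lines = []
    for out in result.tasks_output:
        text = out.raw.strip()
        first_line = text.splitlines()[0] if text else ""
        lines.append(f"{out.agent}: {first_line[:width]}")
    return lines


# Stand-in object for illustration; a real run returns this from crew.kickoff().
fake_result = SimpleNamespace(
    raw="Final article",
    tasks_output=[
        SimpleNamespace(agent="Senior Research Analyst", raw="- fact one\n- fact two"),
        SimpleNamespace(agent="Technical writer", raw="Draft article"),
    ],
)

summary = summarize_tasks(fake_result)
```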

5. Hierarchical process and the manager agent

Sequential is what you want most of the time. Hierarchical has a manager agent that decides which task runs next, and which agent gets it. It’s CrewAI’s version of the supervisor pattern.

from crewai import Crew, LLM, Process

# Replaces the crew() method inside the ResearchCrew class from section 3.
# crewai.LLM avoids an extra langchain_openai dependency that the install step doesn't pull in.
@crew
def crew(self) -> Crew:
    return Crew(
        agents=[self.researcher(), self.writer(), self.editor()],
        tasks=[self.research_task(), self.writing_task(), self.editing_task()],
        process=Process.hierarchical,
        manager_llm=LLM(model="gpt-4o", temperature=0),
        verbose=True,
        memory=True,
    )

The manager_llm is what the framework uses for the manager agent’s reasoning. I set it to gpt-4o at temperature 0 because the manager’s job is routing, not generation, and you want its decisions to be as repeatable as possible. The worker agents can use cheaper models.

In practice I use hierarchical when I have more than four tasks and the dependencies between them aren’t strictly linear. For three-task pipelines, sequential is fine and saves you the manager’s token cost on every run.

6. Memory and the persistence story

CrewAI memory has three layers: short-term (conversation), long-term (cross-run), and entity (named things the crew has discussed). With memory=True, all three are on with local defaults: long-term memory goes to a SQLite file, while short-term and entity memory go through the RAG storage layer.

For production, point them at a real store.

from crewai.memory import LongTermMemory, ShortTermMemory, EntityMemory
from crewai.memory.storage.rag_storage import RAGStorage
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage

@crew
def crew(self) -> Crew:
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.sequential,
        memory=True,
        long_term_memory=LongTermMemory(
            storage=LTMSQLiteStorage(db_path="/var/lib/crewai/ltm.db")
        ),
        short_term_memory=ShortTermMemory(
            storage=RAGStorage(
                embedder_config={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
                type="short_term",
                path="/var/lib/crewai/stm"
            )
        ),
    )

The RAGStorage defaults to ChromaDB locally. For real deployments, override it with a Postgres pgvector backend; the interface is small enough that you can write one in an afternoon.

Observability hook

CrewAI 0.86 supports event callbacks. I wire mine into structured logging for Datadog or Phoenix.

from crewai.events import crewai_event_bus, TaskCompletedEvent, AgentExecutionCompletedEvent
import logging

log = logging.getLogger("crewai")

@crewai_event_bus.on(TaskCompletedEvent)
def on_task_complete(source, event: TaskCompletedEvent):
    log.info("task.complete", extra={
        "task": event.task.description[:80],
        "agent": event.task.agent.role,
        "duration_s": getattr(event, "duration", None),
    })

@crewai_event_bus.on(AgentExecutionCompletedEvent)
def on_agent_step(source, event):
    log.info("agent.step", extra={"agent": event.agent.role})

I cover wiring Phoenix and LangSmith in detail in my post “Observability for Multi Agent Systems, LangSmith and Phoenix in 2025”.
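
One gotcha with the handlers above: the standard library's default formatter drops anything passed through extra=, so the task and agent fields never reach your log pipeline unless a formatter emits them. Here's a minimal JSON formatter sketch using only the standard library. The class and function names are mine, and the field layout is an assumption rather than a Datadog or Phoenix requirement.

```python
import json
import logging

# Attribute names every LogRecord has; anything else came in through `extra=`.
_STANDARD = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, including fields passed via extra=."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name, "event": record.getMessage()}
        payload.update({k: v for k, v in record.__dict__.items() if k not in _STANDARD})
        return json.dumps(payload, default=str)


def configure_logging(stream=None) -> logging.Logger:
    """Attach the JSON formatter to the 'crewai' logger used by the handlers."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("crewai")
    log.handlers = [handler]
    log.setLevel(logging.INFO)
    log.propagate = False  # avoid duplicate lines from the root logger
    return log
```

Call configure_logging() once at startup, before the crew runs.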

7. Flows for graph-style control

CrewAI 0.86 includes Flow, a declarative state machine sitting alongside Crews. Use it when you want graph-style branching without giving up the role-based agent abstractions.

from crewai.flow.flow import Flow, listen, start, router
from pydantic import BaseModel

from research_crew.crew import ResearchCrew

class ContentState(BaseModel):
    topic: str = ""
    word_count: str = "800"
    draft: str = ""
    approved: bool = False
    revisions: int = 0

class ContentFlow(Flow[ContentState]):
    # Don't name a step "kickoff": it would shadow Flow.kickoff(), which runs the flow.
    @start()
    def research(self):
        # The YAML templates reference both {topic} and {word_count}.
        return ResearchCrew().crew().kickoff(
            inputs={"topic": self.state.topic, "word_count": self.state.word_count}
        )

    @listen(research)
    def review(self, draft_output):
        self.state.draft = str(draft_output)
        # call a reviewer crew or a single LLM here and set self.state.approved

    @router(review)
    def decide(self):
        return "publish" if self.state.approved else "revise"

    @listen("revise")
    def revise(self):
        self.state.revisions += 1
        if self.state.revisions >= 3:
            return  # circuit breaker
        # rewrite self.state.draft here

    @listen("publish")
    def publish(self):
        # ship it
        pass

flow = ContentFlow()
flow.state.topic = "Postgres 17"
flow.kickoff()

Flows handle persistence too. With @persist() on methods, the state is checkpointed between steps, so a flow that crashes midway can resume. It doesn’t have feature parity with LangGraph’s checkpointer, but for CrewAI workloads it covers the durability story without leaving the framework.
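
If you loop revisions, I'd put the revision cap in the routing decision itself so it's testable without running a flow. A sketch of that decision as a pure function; the "give_up" label is hypothetical and would need its own listener in the flow.

```python
def next_step(approved: bool, revisions: int, max_revisions: int = 3) -> str:
    """Pick the label a router method would return for the current state."""
    if approved:
        return "publish"
    if revisions >= max_revisions:
        return "give_up"  # hypothetical label; route it to a human or a dead-letter step
    return "revise"


# Walk the decision through a draft that never gets approved.
labels = [next_step(approved=False, revisions=n) for n in range(4)]
```

Inside the flow, the router method would simply return next_step(self.state.approved, self.state.revisions).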

Common Pitfalls

The ones I’ve actually paid for.

Leaving allow_delegation=True on every agent. You’ll get agents handing tasks to each other in surprising patterns and your token bill triples. Set it to False unless you’ve designed the system around delegation.
Using verbose=True in production. It prints full prompts and outputs, which is great for dev and terrible for a Kubernetes log volume. Switch to verbose=False and rely on event callbacks for structured telemetry.
Forgetting expected_output on tasks. Without it, the LLM doesn’t know what shape the result should take, and downstream tasks get junk context. Be explicit even if it feels redundant.
Mixing async and sync tools. CrewAI 0.86’s tool runtime handles both, but if you have an async tool that wraps a sync API badly, you’ll see deadlocks under load. Keep tools sync unless you genuinely need async I/O.
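
If you do end up with an async tool around a blocking client, the standard-library way to keep the event loop free is to push the blocking call onto a worker thread. A minimal sketch with asyncio.to_thread; blocking_lookup is a hypothetical stand-in that sleeps instead of doing network I/O.

```python
import asyncio
import time


def blocking_lookup(query: str) -> str:
    """Hypothetical sync client call; sleeps to stand in for network I/O."""
    time.sleep(0.05)
    return f"result for {query}"


async def lookup_async(query: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_lookup, query)


async def lookup_many(queries: list[str]) -> list[str]:
    """Run several lookups concurrently and return results in input order."""
    return await asyncio.gather(*(lookup_async(q) for q in queries))


results = asyncio.run(lookup_many(["a", "b", "c"]))
```

Calling blocking_lookup directly inside the coroutine would stall every other task on the loop for the duration of the call, which is the pattern that bites under load.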

Troubleshooting

Three failures with concrete fixes.

Crew runs forever. Almost always max_iter is too high on a researcher agent and it keeps searching. Drop max_iter to somewhere between 5 and 8, and add a tool-use guard in the agent’s backstory like “After 3 searches, write your conclusions even if you have more to investigate.”
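
A prompt-level guard is a request, not a guarantee, so I sometimes back it with a hard budget in code. Here's a sketch of a wrapper that refuses calls past a fixed count and tells the model to wrap up. CallBudget and fake_search are hypothetical names; in a real tool you'd call the wrapper from _run.

```python
from typing import Callable


class CallBudget:
    """Wrap a tool function and refuse calls past a fixed budget."""

    def __init__(self, func: Callable[[str], str], max_calls: int = 3):
        self.func = func
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_calls:
            # Returned as the tool result, so the agent reads it as an observation.
            return "Search budget exhausted. Write your conclusions from what you have."
        self.calls += 1
        return self.func(query)


def fake_search(query: str) -> str:
    """Stand-in for a real search call."""
    return f"results for {query}"


search = CallBudget(fake_search, max_calls=2)
outputs = [search("q1"), search("q2"), search("q3")]
```

Create a fresh CallBudget per run; a counter shared across runs would starve later runs of searches.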

Token usage climbs each iteration. Memory is pulling in too much context. Inspect what’s being injected by setting CREWAI_LOG_LEVEL=DEBUG and check the prompts. The usual fix is to scope memory tighter: use a per-topic memory key rather than a global one.

Tool calls fail with validation errors. Your Pydantic args_schema is too strict or the field descriptions are unclear. LLMs hallucinate arguments when descriptions are vague. Rewrite the descriptions as if you’re writing API docs, and use Optional only when truly optional.

Wrapping Up

CrewAI 0.86 hits a sweet spot for teams that want multi-agent without the cognitive overhead of a graph DSL. The YAML-first config keeps prompts versionable, the sequential process is the right default, and the new event bus gives you a hook for observability without monkey-patching.

I’d use CrewAI for content generation pipelines, research crews, and structured report writing. I’d reach for LangGraph or AutoGen 0.4 when I need fine-grained control over the message flow or when human-in-the-loop is more than a single approval step.

The CrewAI changelog is worth following: the 0.80 to 0.86 line tightened a lot of rough edges and there’s more coming. Pin your version in production, because the API still moves fast enough that minor releases occasionally break tool signatures.
