Model Context Protocol Explained, Building MCP Servers in 2025

March 17, 2025 · 9 min read · by Muhammad Amal programming

TL;DR: MCP is the standard for letting LLMs talk to your tools without per-vendor SDK lock-in. A real MCP server exposes tools, resources, and prompts over stdio or HTTP. Here’s how to build one and wire it into Claude and your agents.

Anthropic released the Model Context Protocol in late November 2024 and the ecosystem moved fast. By March 2025, there are SDKs in Python, TypeScript, Java, Kotlin, and C#, and Claude Desktop, Cursor, Zed, Continue, and a dozen other clients can speak it. If you’re building agent tooling and you’re still writing custom function-calling adapters per framework, you’re behind.

The pitch is straightforward. MCP defines a JSON-RPC protocol between an LLM client and a server that exposes capabilities. The server says “I have these tools, these resources, these prompts.” The client decides what to use. Everyone speaks the same wire format. You write a server once, every MCP-aware client can use it.

I’ve built three MCP servers this quarter, two for internal tooling and one for a client product. This post is the working knowledge that took me a couple of weeks to accumulate, distilled to the parts you need. Code targets Python 3.12 with mcp==1.2.0 and anthropic==0.42.0.

1. The MCP mental model

Three concepts, in order of how often you’ll use them.

Tools are functions the LLM can call. Same idea as OpenAI function calling. Each tool has a name, description, JSON schema for arguments, and a handler. The model decides when to call them.

Resources are URI-addressable data the LLM can read. Think of them as a flat namespace of read-only files, addressed by URIs such as file:///tasks.md or db://invoices/1234. The client, not the model, typically picks which resources to attach to a conversation.

Prompts are pre-canned templates the user can invoke. They show up as slash commands or pickers in MCP clients. Useful for “summarize this thread” style affordances that you want consistent across users.

The transport is JSON-RPC 2.0 over either stdio (for local servers) or HTTP with Server-Sent Events (for remote). Stdio is what Claude Desktop uses: the client launches the server as a subprocess and exchanges messages over its stdin and stdout. HTTP is what you’d use for a hosted MCP server that multiple clients connect to.

+-------------+   JSON-RPC    +-------------+
| MCP client  | <-----------> | MCP server  |
| (Claude,    |   stdio or    | (your code) |
|  Cursor...) |   HTTP+SSE    |             |
+-------------+               +-------------+
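To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 requests a client sends to list and call tools. The tools/list and tools/call method names come from the MCP spec. The helper name make_request, the request ids, and the get_user tool name are illustrative, and a real client would use an SDK rather than hand-building messages.

```python
import json
from typing import Optional

def make_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # The stdio transport frames messages as newline-delimited JSON.
    return json.dumps(message) + "\n"

# Ask the server which tools it exposes.
list_request = make_request(1, "tools/list")

# Call one tool by name with JSON arguments (tool name is a placeholder).
call_request = make_request(
    2,
    "tools/call",
    {"name": "get_user", "arguments": {"username": "torvalds"}},
)
```

The server answers each request with a response carrying the same id, which is how the client pairs results with calls.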

2. A minimal Python MCP server

Install and scaffold.

python3.12 -m venv .venv && source .venv/bin/activate
pip install "mcp==1.2.0" "httpx>=0.27" "anthropic==0.42.0"

The simplest server, one tool that fetches GitHub user info.

# server.py
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-info")

@mcp.tool()
async def get_user(username: str) -> dict:
    """Look up a public GitHub user profile.

    Args:
        username: GitHub username (e.g., 'torvalds').
    Returns:
        A dict with name, bio, public_repos, followers.
    """
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(
            f"https://api.github.com/users/{username}",
            headers={"Accept": "application/vnd.github.v3+json"},
        )
        resp.raise_for_status()
        data = resp.json()
        return {
            "name": data.get("name"),
            "bio": data.get("bio"),
            "public_repos": data["public_repos"],
            "followers": data["followers"],
        }

if __name__ == "__main__":
    mcp.run(transport="stdio")

FastMCP is the high-level Python wrapper that handles JSON-RPC plumbing. The @mcp.tool() decorator inspects the function signature, builds the JSON schema from type hints, and registers the tool. The docstring becomes the description the model sees, so write it like documentation, not like code comments.
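As a rough illustration of what schema inference from a signature involves, the sketch below builds a JSON schema from type hints using only the standard library. It is not FastMCP’s implementation (the SDK relies on Pydantic and handles far more types). The schema_from_signature helper, the type mapping, and the include_repos parameter are all hypothetical.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_signature(func) -> dict:
    """Build a minimal JSON schema for a function's parameters."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(name))
        # Unannotated or unsupported parameters get an empty (untyped) schema.
        properties[name] = {"type": json_type} if json_type else {}
        # Parameters without a default are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}

def get_user(username: str, include_repos: bool = False) -> dict:
    """Look up a public GitHub user profile."""
    return {}

# The three fields a client sees for each tool in a listing.
tool_definition = {
    "name": get_user.__name__,
    "description": inspect.getdoc(get_user),
    "inputSchema": schema_from_signature(get_user),
}
```

Note how an unannotated parameter would end up with an empty schema, which is why missing type hints surface as blank schemas in clients.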

Run it standalone to confirm it parses.

python server.py

It’ll block waiting for stdio input, that’s expected. Kill it with Ctrl-C.

3. Wiring into Claude Desktop

Claude Desktop reads server configs from ~/Library/Application Support/Claude/claude_desktop_config.json on macOS. Add your server.

{
  "mcpServers": {
    "github-info": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/server.py"],
      "env": {}
    }
  }
}

Restart Claude Desktop. Open a new conversation, click the tools indicator, and you should see github-info listed with its get_user tool. Ask “What does the GitHub user torvalds have in their public profile?” and it’ll call your server.

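Cursor accepts the same mcpServers shape in its own config file, with paths adjusted. Other clients, Zed among them, use their own settings format, so check their docs for the exact keys. Stdio means each client launches its own subprocess, so the server doesn’t need to be running ahead of time.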
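If you register servers often, a small script can merge an entry into the config instead of hand-editing JSON. This is a sketch, assuming the mcpServers layout shown above. The add_server and update_config_file helpers are hypothetical, and the paths you pass in are your own.

```python
import json
from pathlib import Path

def add_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of the config with one server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args), "env": {}}
    # Keep every other top-level key untouched.
    return {**config, "mcpServers": servers}

def update_config_file(path: Path, name: str, command: str, args: list) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config, name, command, args), indent=2))
```

Because add_server returns a new dict rather than mutating its input, it is easy to diff the result against the existing file before writing.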

4. Resources, structured data the LLM can read

Tools are for actions. Resources are for read-only context. The conceptual model is “files the client can attach to a conversation.”

@mcp.resource("schema://tables")
async def list_tables() -> str:
    """List database tables available for queries."""
    return "users, invoices, subscriptions, events"

@mcp.resource("schema://tables/{table_name}")
async def describe_table(table_name: str) -> str:
    """Return CREATE TABLE statement for the given table."""
    schemas = {
        "users": "CREATE TABLE users (id BIGINT PK, email TEXT, created_at TIMESTAMPTZ);",
        "invoices": "CREATE TABLE invoices (id BIGINT PK, user_id BIGINT FK, amount_cents INT);",
    }
    return schemas.get(table_name, f"-- no schema known for {table_name}")

The URI templates use {var} syntax, MCP parses them and dispatches to the handler. The client UI typically shows resources as a tree the user can browse and attach.
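To see what template dispatch amounts to, here is a simplified sketch that turns a {var} template into a regular expression and extracts the parameters. It only illustrates the idea. The SDK’s own matching logic may differ, and match_template is a hypothetical name.

```python
import re
from typing import Optional

def match_template(template: str, uri: str) -> Optional[dict]:
    """Return the extracted variables if the URI matches the template, else None."""
    pattern = ""
    # Split on {var} placeholders, keeping them so they can become named groups.
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # Each variable matches one path segment (no slashes).
            pattern += f"(?P<{part[1:-1]}>[^/]+)"
        else:
            pattern += re.escape(part)
    match = re.fullmatch(pattern, uri)
    return match.groupdict() if match else None
```

With the two resources above, schema://tables matches only the static handler, while schema://tables/users yields table_name="users" for the templated one.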

The pattern I use: expose anything that’s expensive to fetch or that benefits from caching as a resource rather than a tool. The client can decide once whether to include it, instead of the model deciding on every turn whether to call a tool.

5. Prompts, slash commands for users

Prompts are templated chat starters. They show up as picker options in compatible clients.

from mcp.server.fastmcp.prompts import base

@mcp.prompt()
def review_pr(pr_url: str, focus: str = "bugs") -> list[base.Message]:
    """Review a pull request URL with a specific focus."""
    return [
        base.UserMessage(
            f"Review the pull request at {pr_url}. "
            f"Focus on {focus}. Cite specific lines."
        )
    ]

In Claude Desktop, this shows up as a /review_pr slash command with parameter fields. It’s a clean way to ship reusable prompt scaffolds to your team without each person copy-pasting.
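For reference, this is roughly what a client receives when it fetches that prompt: a list of role-tagged messages with typed content. The sketch below builds the result by hand with plain dicts. render_review_prompt and the example URL are hypothetical, and the SDK produces this structure for you from the decorated function.

```python
def render_review_prompt(pr_url: str, focus: str = "bugs") -> dict:
    """Build a prompts/get-style result containing a single user message."""
    text = f"Review the pull request at {pr_url}. Focus on {focus}. Cite specific lines."
    return {
        "description": "Review a pull request URL with a specific focus.",
        "messages": [
            {"role": "user", "content": {"type": "text", "text": text}},
        ],
    }

# Placeholder URL, for illustration only.
result = render_review_prompt("https://example.com/pr/1", focus="security")
```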

6. Connecting from a Python agent

Most of you will also want to consume an MCP server from your own agent code. The mcp SDK ships a client.

# client.py
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from anthropic import Anthropic

async def main():
    params = StdioServerParameters(
        command="/abs/path/to/.venv/bin/python",
        args=["/abs/path/to/server.py"],
    )

    anthropic = Anthropic()

    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tool_list = await session.list_tools()
            tools = [{
                "name": t.name,
                "description": t.description,
                "input_schema": t.inputSchema,
            } for t in tool_list.tools]

            messages = [{"role": "user", "content": "Get the GitHub profile for torvalds."}]
            while True:
                resp = anthropic.messages.create(
                    model="claude-3-7-sonnet-20250219",
                    max_tokens=1024,
                    tools=tools,
                    messages=messages,
                )

                if resp.stop_reason != "tool_use":
                    print(resp.content[0].text)
                    break

                messages.append({"role": "assistant", "content": resp.content})
                tool_results = []
                for block in resp.content:
                    if block.type == "tool_use":
                        result = await session.call_tool(block.name, block.input)
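                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            # Send only the text blocks; str() on the list would pass a Python repr.
                            "content": "\n".join(
                                c.text for c in result.content if c.type == "text"
                            ),
                        })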
                messages.append({"role": "user", "content": tool_results})

asyncio.run(main())

That’s a full agent loop with MCP tools, using Anthropic’s Claude 3.7 Sonnet. The same pattern works with the OpenAI SDK 1.59+, just translate tool-use blocks to OpenAI’s function-call shape. If you’re wondering how this slots into a multi-agent architecture, I covered the topology choices in multi-agent architecture patterns.
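The translation to OpenAI’s shape is mostly a matter of renaming fields. Here is a minimal sketch, assuming tools and tool calls are available as plain dicts (for example after dumping the SDK objects). The helper names mcp_tool_to_openai and openai_call_to_mcp are hypothetical.

```python
import json

def mcp_tool_to_openai(tool: dict) -> dict:
    """Convert an MCP tool listing entry to an OpenAI Chat Completions tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, so it maps across directly.
            "parameters": tool["inputSchema"],
        },
    }

def openai_call_to_mcp(tool_call: dict) -> tuple:
    """Extract (name, arguments) for session.call_tool from an OpenAI tool call."""
    function = tool_call["function"]
    # OpenAI returns arguments as a JSON-encoded string, not a dict.
    return function["name"], json.loads(function["arguments"])
```

The rest of the loop stays the same: call the MCP session with the extracted name and arguments, then send the text result back as a tool message.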

7. HTTP transport for remote servers

Stdio is great for local, awkward for production. For a server that multiple clients reach, use HTTP+SSE.

# http_server.py
from mcp.server.fastmcp import FastMCP

# host and port are FastMCP settings, passed to the constructor; run() only takes the transport.
mcp = FastMCP("github-info", host="0.0.0.0", port=8765)

@mcp.tool()
async def get_user(username: str) -> dict:
    # same as before
    ...

if __name__ == "__main__":
    mcp.run(transport="sse")

The SSE endpoint becomes http://host:8765/sse. Clients that support remote MCP servers point at it and connect over HTTP. Put a reverse proxy with TLS termination and auth in front of it. MCP itself doesn’t define auth at the wire level; that’s intentionally left to the transport.

For auth I use bearer tokens in the Authorization header and verify in middleware before the SSE handshake. Anthropic’s MCP spec page is the source of truth on what the protocol guarantees versus what’s transport-level.
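The token check itself is small. Below is a framework-agnostic sketch of the verification step, assuming the middleware hands you the request headers as a dict. is_authorized is a hypothetical helper, and how you load the expected token (environment variable, secret store) is up to you.

```python
import hmac

def is_authorized(headers: dict, expected_token: str) -> bool:
    """Check an Authorization header of the form 'Bearer <token>'."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {key.lower(): value for key, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Reject with a 401 before the SSE stream opens when this returns False, so unauthenticated clients never reach the MCP session.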

8. Capability negotiation and progress notifications

Three parts of the spec worth knowing about even if you don’t use them on day one: capability negotiation, progress notifications, and sampling.

Capability negotiation is how the client and server agree on what’s supported. On initialize, the client sends its supported capabilities, the server returns its own. If your server supports streaming progress notifications for long tools, advertise it.

from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("long-running")

@mcp.tool()
async def deep_research(query: str, ctx: Context) -> str:
    """Run a long research task and stream progress."""
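    for i, step in enumerate(["search", "filter", "summarize", "verify"]):
        # The pinned mcp==1.2.0 signature is report_progress(progress, total);
        # a message argument is only accepted by later SDK releases.
        await ctx.report_progress(i + 1, 4)
        # do the actual work for this step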
    return "research complete"

The Context parameter is auto-injected when its type appears in the signature. Calling report_progress sends a notification the client can render as a progress bar. Critical for tools that take more than a couple seconds.
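On the wire, each progress update is a JSON-RPC notification. The sketch below shows its shape using plain JSON. The notifications/progress method and the progressToken field come from the spec, while progress_notification and the "req-1" token are placeholders for the token the client attached to its original request.

```python
import json

def progress_notification(token, progress: float, total: float) -> str:
    """Serialize a notifications/progress message for one request."""
    message = {
        "jsonrpc": "2.0",
        # Notifications carry no "id": the sender expects no response.
        "method": "notifications/progress",
        "params": {"progressToken": token, "progress": progress, "total": total},
    }
    return json.dumps(message)

steps = ["search", "filter", "summarize", "verify"]
# One update per completed step, mirroring the loop in the tool above.
updates = [progress_notification("req-1", i + 1, len(steps)) for i in range(len(steps))]
```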

Sampling lets the server ask the client to run an LLM call. Useful for agentic MCP servers that need their own reasoning step but want to bill against the client’s model usage instead of the server’s. Not all clients support it, check capabilities first.
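Checking for sampling support comes down to looking at the capabilities the client declared during initialize. Here is a minimal sketch over the raw initialize params. client_supports is a hypothetical helper and the client name and version are illustrative. With the SDK you would read the same information from the session rather than from a dict.

```python
def client_supports(initialize_params: dict, capability: str) -> bool:
    """Return True if the client declared the capability during initialize."""
    return capability in initialize_params.get("capabilities", {})

# Example initialize params as a client might send them (values are illustrative).
initialize_params = {
    "protocolVersion": "2024-11-05",
    "capabilities": {"sampling": {}, "roots": {"listChanged": True}},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
}

can_sample = client_supports(initialize_params, "sampling")
```

If the check fails, fall back to running the model call server-side or return a clear error instead of issuing a sampling request the client will reject.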

Common Pitfalls

What’s bitten me on real servers.

Side effects in tool docstrings. The docstring is the description sent to the LLM. If you write “Deletes the user” without specifying scope, the model will call it freely. Be precise, include “Only call when user has confirmed deletion in chat.”
Returning huge blobs from tools. The full result lands in the LLM’s context. A 50k-token tool response eats your context budget. Paginate, summarize, or use resources for bulk data.
Mixing sync and async wrong. FastMCP tool handlers can be async def or def. If you use sync handlers and call blocking I/O, you stall the server. Default to async with httpx and asyncpg.
Forgetting to validate arguments. The LLM will hallucinate argument shapes occasionally. Use Pydantic models for tool args, FastMCP picks them up from type hints and validates before invoking your handler.
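For the huge-blob pitfall, one option is cursor-based pagination inside the tool. This is a sketch under the assumption that the tool accepts an integer cursor argument. paginate, the page size, and the field names are all hypothetical choices, not part of the SDK.

```python
def paginate(items: list, cursor: int = 0, page_size: int = 20) -> dict:
    """Return one page of items plus the cursor for the next page, if any."""
    page = items[cursor : cursor + page_size]
    # A next_cursor of None tells the model there is nothing left to fetch.
    next_cursor = cursor + page_size if cursor + page_size < len(items) else None
    return {"items": page, "next_cursor": next_cursor, "total": len(items)}
```

Returning total alongside the page lets the model judge whether fetching more pages is worth the context they will consume.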

Troubleshooting

Three failures with fixes.

Claude Desktop says “MCP server failed to start.” Logs are at ~/Library/Logs/Claude/mcp-server-NAME.log. Usual cause is a wrong Python path, missing dependency in the venv, or an import error. Tail the log while restarting Claude.

Tool calls hang and time out. Your handler is probably blocking the event loop. Wrap any sync I/O in asyncio.to_thread or rewrite it with async libraries. Also check the client timeout: Anthropic’s SDK defaults to 600 seconds, but Claude Desktop’s is shorter.

Schemas show up empty in the client. You omitted type hints on the tool function, or you used Any. FastMCP can’t infer a schema from those. Add proper types, even if it means defining a small dataclass.
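For the hanging-handler case, here is a small sketch of the asyncio.to_thread fix using only the standard library. blocking_lookup stands in for any synchronous client call, and the heartbeat coroutine exists only to show that the event loop keeps running while the blocking call is in flight. Both names are hypothetical.

```python
import asyncio
import time

def blocking_lookup(key: str) -> str:
    """Stand-in for a synchronous client call that blocks the calling thread."""
    time.sleep(0.2)
    return f"value-for-{key}"

async def heartbeat(ticks: list) -> None:
    """Record ticks to show the event loop keeps running."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())

async def handler(key: str) -> tuple:
    ticks = []
    # to_thread runs the blocking call in a worker thread, so other
    # coroutines (here, the heartbeat) keep making progress.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_lookup, key),
        heartbeat(ticks),
    )
    return result, len(ticks)
```

Calling blocking_lookup directly inside the coroutine would freeze the loop for the full sleep, which is exactly the stall that shows up as a timed-out tool call.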

Wrapping Up

MCP earned its hype. The protocol is simple enough that you can read the spec in an afternoon, the Python SDK is pleasant, and the network effect of every major client supporting it means you write the integration once. If you’re building agent tooling in 2025 and you’re not exposing it as MCP, you’re going to keep writing the same adapter four times.

The two patterns I’d recommend: expose your existing internal tooling as MCP servers for your team’s agent workflows, and design new tools as MCP-first from day one. The migration cost is small and the optionality is large.

Anthropic’s MCP documentation site has the full spec, all the SDKs, and a growing list of reference servers worth studying. The mcp-server-everything reference implementation is a particularly good read for understanding capability negotiation.

What’s next for the protocol: OAuth flows for remote servers are getting formalized in the next spec revision, and there’s active work on better streaming and partial result handling. The shape is stable enough now that you should build against the current spec without waiting. The breaking changes I’ve seen across the 1.0 to 1.2 line in the Python SDK have all been minor signature tweaks, nothing catastrophic. Pin your version, test on upgrades, ship.
