Building AI Agents That Actually Work: MCP Servers, Tool Orchestration, and Running Everything Locally


The AI world has a plumbing problem. We have incredible language models, but connecting them to real tools — databases, APIs, file systems, other agents — still feels like duct-taping HTTP endpoints together and praying. That's why the Model Context Protocol (MCP) matters. It's the missing standard for how AI models talk to the outside world, and it's changing how I build every agent project.

Over the past year, I've built 116+ open-source projects — voice assistants, healthcare summarizers, legal document analyzers, security tools — and the pattern I keep returning to is the same: a local LLM, an MCP server exposing tools, and an agent loop that ties it all together. No cloud dependency. No API keys expiring at 2 AM. Just a model, a protocol, and a purpose.

In this post, I'll walk through how MCP works, show you how to build your own MCP server in Python, and share patterns I've learned from shipping real agent projects.

What Is MCP and Why Should You Care?

The Model Context Protocol is an open standard (originally introduced by Anthropic) that defines how AI models discover and invoke external tools. Think of it as USB-C for AI: a single, standardized interface that any model can use to talk to any tool.

Before MCP, every AI integration was bespoke. Want your model to search a database? Write a custom function-calling wrapper. Want it to read PDFs? Another wrapper. Each tool spoke its own dialect, and switching models meant rewriting your glue code.

MCP fixes this with three core concepts:

  1. Tools — Functions the model can invoke (e.g., search_documents, analyze_clause)
  2. Resources — Data the model can read (files, database records, API responses)
  3. Prompts — Reusable prompt templates that guide the model's behavior

The beauty is that MCP servers are model-agnostic. The same server works with Claude, GPT, Gemma, Llama, or any model that supports tool use. Build once, swap models freely.

Anatomy of an MCP Server in Python

Let's build a minimal MCP server. I use the mcp Python SDK, which makes this surprisingly clean:

from mcp.server.fastmcp import FastMCP

# Initialize the MCP server
mcp = FastMCP("document-tools")

@mcp.tool()
def summarize_document(text: str, max_length: int = 200) -> str:
    """Summarize a document to the specified length."""
    # In practice, this calls your local LLM
    from ollama import chat
    response = chat(
        model="gemma3:4b",
        messages=[{
            "role": "user",
            "content": f"Summarize in {max_length} words:\n\n{text}"
        }]
    )
    return response.message.content

@mcp.tool()
def extract_entities(text: str) -> dict:
    """Extract named entities from text as structured JSON."""
    import json
    from ollama import chat
    response = chat(
        model="gemma3:4b",
        messages=[{
            "role": "user",
            "content": f"Extract entities (people, orgs, dates) as JSON:\n\n{text}"
        }],
        format="json"  # constrain Ollama's output to valid JSON
    )
    return json.loads(response.message.content)

@mcp.resource("docs://{doc_id}")
def get_document(doc_id: str) -> str:
    """Retrieve a document by ID."""
    docs = load_document_store()
    return docs.get(doc_id, "Document not found")

if __name__ == "__main__":
    mcp.run(transport="stdio")

That's a complete MCP server. The @mcp.tool() decorator registers functions that any MCP-compatible client can discover and call. The @mcp.resource() decorator exposes data through URI templates. Run it, and any MCP client can connect, list available tools, and start invoking them.
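Under the hood, discovery is plain JSON-RPC: the client sends a `tools/list` request and the server answers with each tool's name, description, and a JSON Schema for its arguments, derived from the Python type hints. Here's a sketch of what a client receives for the server above — the overall shape follows the MCP spec's `tools/list` result, though the exact schema fields shown are illustrative:

```python
# Illustrative tools/list response for the document-tools server
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "summarize_document",
                "description": "Summarize a document to the specified length.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "max_length": {"type": "integer", "default": 200},
                    },
                    "required": ["text"],
                },
            }
        ]
    },
}

# A client can turn this straight into a prompt fragment for the model
lines = [
    f"- {t['name']}: {t['description']}"
    for t in tools_list_response["result"]["tools"]
]
print("\n".join(lines))
```

This is why MCP servers are model-agnostic: any client that speaks JSON-RPC can enumerate your tools without knowing anything about how they're implemented.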

The Agent Loop: Where MCP Meets Real Work

An MCP server alone is just a toolbox. The magic happens when you wire it into an agent loop — the cycle where a model reasons about a task, picks a tool, executes it, and decides what to do next.

Here's the pattern I use across my projects:

import json
from ollama import chat
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def agent_loop(task: str, server_path: str):
    """Run an agent loop with MCP tool access."""
    server = StdioServerParameters(
        command="python", args=[server_path]
    )

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover available tools
            tools = await session.list_tools()
            tool_descriptions = format_tools_for_prompt(tools)

            messages = [{
                "role": "system",
                "content": f"You have these tools:\n{tool_descriptions}\n"
                           f"Respond with JSON to call a tool: "
                           f'{{"tool": "name", "args": {{...}}}}'
            }, {
                "role": "user",
                "content": task
            }]

            # Agent loop: reason → act → observe → repeat
            for step in range(10):  # Max 10 steps
                response = chat(model="gemma3:4b", messages=messages)
                reply = response.message.content

                # Check if the model wants to call a tool
                if '"tool"' in reply:
                    call = json.loads(extract_json(reply))
                    result = await session.call_tool(
                        call["tool"], call.get("args", {})
                    )
                    messages.append({"role": "assistant", "content": reply})
                    messages.append({
                        "role": "user",
                        "content": f"Tool result: {result.content[0].text}"
                    })
                else:
                    return reply  # Final answer

            return "Max steps reached"

This is the skeleton behind several of my projects. The model discovers tools dynamically through MCP, reasons about which to call, executes them, and loops until it has an answer. No hardcoded tool lists. No brittle if-else chains.
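The loop above leans on two small helpers I didn't show: one renders the discovered tools into the system prompt, the other pulls the JSON tool call out of the model's reply. Minimal sketches — the names match the loop, but the implementations here are illustrative:

```python
import json
import re

def format_tools_for_prompt(tools) -> str:
    """Render a ListToolsResult as 'name: description' lines for the system prompt."""
    return "\n".join(
        f"- {t.name}: {t.description or ''}" for t in tools.tools
    )

def extract_json(reply: str) -> str:
    """Pull the first {...} object out of a model reply (greedy brace match)."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in reply")
    return match.group(0)
```

The greedy match also handles models that wrap their JSON in prose or code fences, which small local models do constantly.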

Real Projects, Real Patterns

Let me show how this architecture maps to actual projects I've shipped:

CallPilot — Voice AI with MCP-Style Tool Routing

CallPilot is a voice AI assistant that routes spoken commands to specialized tools. The architecture mirrors MCP: a central orchestrator receives voice input, transcribes it, and dispatches to tool handlers — calendar lookups, email drafts, web searches — each registered as discrete, discoverable capabilities. The insight? Voice AI needs the same tool-routing patterns that text agents do.

Patient Intake Summarizer — Healthcare AI Agent

Patient Intake Summarizer processes patient intake forms and generates structured clinical summaries. The MCP pattern here exposes tools for PDF extraction, entity recognition (medications, conditions, allergies), and summary generation. Each tool runs locally — critical for healthcare where data cannot leave the premises.

Contract Clause Analyzer — Legal AI Agent

Contract Clause Analyzer breaks legal documents into clauses, classifies risk levels, and flags problematic language. The tool registration pattern shines here: extract_clauses, classify_risk, compare_to_template — each is a discrete MCP tool that the agent orchestrates based on what it finds.

DocShield — Document Security Agent

DocShield scans documents for sensitive information — SSNs, credit card numbers, API keys — and redacts them. The MCP resource pattern works perfectly: documents are exposed as resources, and scanning/redaction tools operate on them. The agent decides which scans to run based on document type.
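A scanning-and-redaction tool like this boils down to pattern matching plus substitution. A simplified sketch of one such tool — the patterns here are illustrative, not the project's actual rules, and a production scanner needs stricter validation (checksums for card numbers, entropy checks for keys):

```python
import re

# Illustrative detection patterns (a real scanner needs stronger validation)
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace each detected sensitive value with a [REDACTED:<type>] marker."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Registered with `@mcp.tool()`, a function like this becomes one more capability the agent can choose to invoke based on document type.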

PDF Chat Assistant — Conversational Document Q&A

PDF Chat Assistant lets you have a conversation with any PDF. It uses RAG (Retrieval-Augmented Generation) with a local vector store, exposing search_chunks and get_page as MCP tools. The agent retrieves relevant passages and synthesizes answers — all running on your machine.

Agent-to-Agent Communication: The A2A Protocol

MCP handles model-to-tool communication brilliantly, but what about agent-to-agent communication? That's where Google's A2A (Agent-to-Agent) protocol enters the picture.

A2A defines how autonomous agents discover each other, negotiate capabilities, and delegate tasks. Imagine a healthcare workflow where a patient intake agent hands off to a billing agent, which hands off to an insurance verification agent — each running independently, each exposing its capabilities through an "Agent Card."

# Example: A2A Agent Card (simplified)
agent_card = {
    "name": "patient-intake-agent",
    "description": "Processes patient intake forms",
    "url": "http://localhost:8001",
    "capabilities": {
        "streaming": True,
        "pushNotifications": False
    },
    "skills": [
        {
            "id": "intake-summary",
            "name": "Patient Intake Summary",
            "description": "Generates structured clinical summaries"
        }
    ]
}

MCP and A2A are complementary: MCP connects agents to tools, A2A connects agents to each other. Together, they form the backbone of truly interoperable AI systems.
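In A2A, discovery works by convention: an agent publishes its Agent Card at a well-known URL (the spec uses `/.well-known/agent.json`), and peers fetch it before delegating work. A minimal standard-library sketch of publishing the card above — illustrative only, not a real A2A server:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

agent_card = {
    "name": "patient-intake-agent",
    "description": "Processes patient intake forms",
    "url": "http://localhost:8001",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [{
        "id": "intake-summary",
        "name": "Patient Intake Summary",
        "description": "Generates structured clinical summaries",
    }],
}

class CardHandler(BaseHTTPRequestHandler):
    """Serves the Agent Card at the A2A well-known path."""
    def do_GET(self):
        if self.path == "/.well-known/agent.json":
            body = json.dumps(agent_card).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To publish: HTTPServer(("localhost", 8001), CardHandler).serve_forever()
```

Any peer agent can now GET that path, read the skills list, and decide whether this agent is the right one to hand a task to.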

Why Local-First Matters

Every project I mentioned runs entirely on local hardware with local models (I primarily use Gemma 3 through Ollama). This isn't a limitation — it's a feature:

  • Privacy: Healthcare and legal data stays on-premises. Period.
  • Cost: No per-token API charges. Run thousands of inferences for free.
  • Reliability: No network dependency. No rate limits. No surprise deprecations.
  • Speed: Local inference on a decent GPU is fast enough for most agent workflows.
  • Control: You own the entire stack. Swap models, modify prompts, add tools — no vendor lock-in.

The MCP architecture amplifies these benefits because your tools are decoupled from your model. When a better local model drops (and they drop weekly now), you swap it in without touching a single tool definition.
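In practice, I keep the model name out of the tool code entirely: one configuration point, and every tool picks up the new model. A sketch of the pattern — the `AGENT_MODEL` variable name and `llm()` helper are my own convention, not part of MCP or Ollama:

```python
import os

def current_model() -> str:
    """Resolve the model name from the environment, with a local default."""
    return os.environ.get("AGENT_MODEL", "gemma3:4b")

def llm(prompt: str) -> str:
    """Single chokepoint for inference: tools call this, never a vendor SDK directly."""
    from ollama import chat  # lazy import keeps tool modules importable without ollama
    response = chat(
        model=current_model(),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.message.content
```

Swapping in a new model is then `export AGENT_MODEL=<new-model>` — no tool definitions touched.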

Getting Started

If you want to build your own MCP-powered agent:

  1. Install the basics: pip install mcp ollama and pull a model with ollama pull gemma3:4b
  2. Start with one tool: Build an MCP server with a single useful tool. Get the loop working.
  3. Add tools incrementally: Each new tool is just another decorated function.
  4. Think in resources: What data does your agent need? Expose it through MCP resources.
  5. Keep it local: You don't need cloud APIs for most agent work. A 4B-parameter model handles tool routing surprisingly well.

The MCP ecosystem is growing fast. IDE integrations, framework support, community servers — it's all converging on this standard. The agents you build today with MCP will plug into tomorrow's ecosystem without rewrites.

Conclusion

Building AI agents isn't about chasing the biggest model or the fanciest framework. It's about having a clean protocol for connecting models to tools, a reliable agent loop, and problems worth solving. MCP gives us that protocol. Local models give us independence. And the combination lets anyone — not just companies with massive API budgets — build agents that do real work.

I've published all the projects mentioned here (and 110+ more) as open source. Clone them, break them, build on them. That's the point.


About the Author

Nrk Raju Guthikonda is a Senior Software Engineer at Microsoft on the Copilot Search Infrastructure team, working on Semantic Indexing and RAG systems. He maintains 116+ open-source repositories spanning AI/ML, healthcare, legal tech, developer tools, and creative AI — all built to run locally with models like Gemma and Ollama.

Source: dev.to
