AgentStack
MCP unreviewed MIT Self-run

Ollama Handoff

mcp-michael-whitecapdata-ollama-handoff · by Michael-WhiteCapData

Offload cheap work from your AI agent to a local Ollama model, at zero cloud cost.

No reviews yet
0 installs
1 views
0.0% view→install

Install

$ agentstack add mcp-michael-whitecapdata-ollama-handoff

Open-source listing — not yet scanned by AgentStack. Follow the source repository for install instructions.

Are you the author of Ollama Handoff? Claim this listing to set pricing, connect Stripe payouts, and keep 70% of every sale.

About

ollama-handoff

An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.

[](https://github.com/Michael-WhiteCapData/ollama-handoff/actions/workflows/ci.yml) [](https://pypi.org/project/ollama-handoff/) [](https://www.python.org/) [](https://modelcontextprotocol.io/) [](LICENSE)

Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed — summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review — doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU — at zero cloud cost — and spend its (paid) reasoning budget on the things that actually need it.

This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.


Why you'd want this

  • 💸 Spend less. Routine offloads run locally and bill nothing.
  • Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
  • 🧠 Tuned, not raw. summarize_local, code_review_local, draft_commit_message_local, and extract_local come with reviewer/summarizer/extractor system prompts already dialed in.
  • 🔌 Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
  • 🪶 Tiny & auditable. Two dependencies (mcp, httpx), fully typed, unit-tested, no telemetry.

Requirements

  • Ollama running locally (ollama serve) with at least one model pulled, e.g. ollama pull qwen2.5-coder:14b.
  • Python 3.11+ (or just uvx, which manages it for you).

Install

The fastest path is uv — no manual venv needed:

uvx ollama-handoff          # run directly
# or
pip install ollama-handoff  # then run: ollama-handoff

Claude Code

claude mcp add ollama-handoff -- uvx ollama-handoff

Claude Desktop / Cursor (mcp config block)

{
  "mcpServers": {
    "ollama-handoff": {
      "command": "uvx",
      "args": ["ollama-handoff"],
      "env": {
        "OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
      }
    }
  }
}

Run with Docker

A [Dockerfile](Dockerfile) is included. The server speaks MCP over stdio, so run it interactively (-i) and point it at your Ollama instance:

docker build -t ollama-handoff .
docker run --rm -i -e OLLAMA_URL=http://host.docker.internal:11434 ollama-handoff

On native Linux (no Docker Desktop), use --network=host with OLLAMA_URL=http://localhost:11434.

Tools

| Tool | What it does | When the agent should reach for it | | --- | --- | --- | | ask_local | One-shot prompt to the local model | Any handoff that doesn't need frontier reasoning | | chat_local | Multi-turn local chat | Handoffs needing more than one turn of context | | summarize_local | Structured summary (headline + bullets) | Long files, logs, transcripts, docs | | code_review_local | Quick first-pass review of a diff/code | Cheap pre-filter before a deep review | | draft_commit_message_local | Conventional commit message from a diff | Routine commits | | extract_local | Pull structured items from unstructured text | URLs, function names, error codes, TODOs | | list_models | List locally available Ollama models | Discovery / choosing a model | | server_info | Report the effective configuration | Debugging setup |

Configuration

All configuration is via environment variables set in your MCP registration:

| Variable | Default | Description | | --- | --- | --- | | OLLAMA_URL | http://localhost:11434 | Base URL of the Ollama server | | OLLAMA_DEFAULT_MODEL | qwen2.5-coder:14b | Default model for handoffs | | OLLAMA_NUM_CTX | 32768 | Context window in tokens | | OLLAMA_KEEP_ALIVE | 30m | How long to keep the model resident in VRAM | | OLLAMA_TIMEOUT_S | 600 | Per-request timeout, seconds |

Example

Once registered, you don't call the tools yourself — your agent does. A typical exchange:

> You: Summarize the errors in build.log and draft a commit for the staged fix. > > Agent: (calls summarize_local(build.log, focus="errors and stack traces") and draft_commit_message_local(git diff --staged) — both run on your GPU, nothing billed) → returns the summary + commit message.

Development

git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv pip install -e ".[dev]"
ruff check .
pytest          # tests use httpx.MockTransport — no running Ollama required

See [CONTRIBUTING.md](CONTRIBUTING.md). Contributions welcome — especially new specialized handoff tools.

License

[MIT](LICENSE) © Michael Tierney

Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

Install and usage instructions live in the source repository linked above.

Reviews

No reviews yet — be the first.

Versions

  • v0.1.1 Imported from the upstream source.