Install
$ agentstack add mcp-michael-whitecapdata-ollama-handoff Open-source listing — not yet scanned by AgentStack. Follow the source repository for install instructions.
About
ollama-handoff
An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.
[](https://github.com/Michael-WhiteCapData/ollama-handoff/actions/workflows/ci.yml) [](https://pypi.org/project/ollama-handoff/) [](https://www.python.org/) [](https://modelcontextprotocol.io/) [](LICENSE)
Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed — summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review — doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU — at zero cloud cost — and spend its (paid) reasoning budget on the things that actually need it.
This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.
Why you'd want this
- 💸 Spend less. Routine offloads run locally and bill nothing.
- ⚡ Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
- 🧠 Tuned, not raw.
summarize_local,code_review_local,draft_commit_message_local, andextract_localcome with reviewer/summarizer/extractor system prompts already dialed in. - 🔌 Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
- 🪶 Tiny & auditable. Two dependencies (
mcp,httpx), fully typed, unit-tested, no telemetry.
Requirements
- Ollama running locally (
ollama serve) with at least one model pulled, e.g.ollama pull qwen2.5-coder:14b. - Python 3.11+ (or just
uvx, which manages it for you).
Install
The fastest path is uv — no manual venv needed:
uvx ollama-handoff # run directly
# or
pip install ollama-handoff # then run: ollama-handoff
Claude Code
claude mcp add ollama-handoff -- uvx ollama-handoff
Claude Desktop / Cursor (mcp config block)
{
"mcpServers": {
"ollama-handoff": {
"command": "uvx",
"args": ["ollama-handoff"],
"env": {
"OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
}
}
}
}
Run with Docker
A [Dockerfile](Dockerfile) is included. The server speaks MCP over stdio, so run it interactively (-i) and point it at your Ollama instance:
docker build -t ollama-handoff .
docker run --rm -i -e OLLAMA_URL=http://host.docker.internal:11434 ollama-handoff
On native Linux (no Docker Desktop), use --network=host with OLLAMA_URL=http://localhost:11434.
Tools
| Tool | What it does | When the agent should reach for it | | --- | --- | --- | | ask_local | One-shot prompt to the local model | Any handoff that doesn't need frontier reasoning | | chat_local | Multi-turn local chat | Handoffs needing more than one turn of context | | summarize_local | Structured summary (headline + bullets) | Long files, logs, transcripts, docs | | code_review_local | Quick first-pass review of a diff/code | Cheap pre-filter before a deep review | | draft_commit_message_local | Conventional commit message from a diff | Routine commits | | extract_local | Pull structured items from unstructured text | URLs, function names, error codes, TODOs | | list_models | List locally available Ollama models | Discovery / choosing a model | | server_info | Report the effective configuration | Debugging setup |
Configuration
All configuration is via environment variables set in your MCP registration:
| Variable | Default | Description | | --- | --- | --- | | OLLAMA_URL | http://localhost:11434 | Base URL of the Ollama server | | OLLAMA_DEFAULT_MODEL | qwen2.5-coder:14b | Default model for handoffs | | OLLAMA_NUM_CTX | 32768 | Context window in tokens | | OLLAMA_KEEP_ALIVE | 30m | How long to keep the model resident in VRAM | | OLLAMA_TIMEOUT_S | 600 | Per-request timeout, seconds |
Example
Once registered, you don't call the tools yourself — your agent does. A typical exchange:
> You: Summarize the errors in build.log and draft a commit for the staged fix. > > Agent: (calls summarize_local(build.log, focus="errors and stack traces") and draft_commit_message_local(git diff --staged) — both run on your GPU, nothing billed) → returns the summary + commit message.
Development
git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv pip install -e ".[dev]"
ruff check .
pytest # tests use httpx.MockTransport — no running Ollama required
See [CONTRIBUTING.md](CONTRIBUTING.md). Contributions welcome — especially new specialized handoff tools.
License
[MIT](LICENSE) © Michael Tierney
Source & license
This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.
- Author: Michael-WhiteCapData
- Source: Michael-WhiteCapData/ollama-handoff
- License: MIT
- Homepage: https://github.com/Michael-WhiteCapData/ollama-handoff
Install and usage instructions live in the source repository linked above.
Reviews
No reviews yet — be the first.
Write a review
Versions
- v0.1.1 Imported from the upstream source.