Install
$ agentstack add mcp-theaisingularity-agentic-research-engine-oss
Open-source listing, not yet scanned by AgentStack. Follow the source repository for install instructions.
About
agentic-research-engine-oss
The best $0 research agent that runs on a laptop. Open-source end-to-end, reproducible, privacy-preserving. No cloud dependency by default; no telemetry; every LLM call, every source, and every verification decision is visible.
Table of contents
- [TL;DR](#tldr)
- [Why use this instead of…](#why-use-this-instead-of)
- [Quickstart — Mac local](#quickstart--mac-local)
- [Quickstart — no install (Google Colab)](#quickstart--no-install-google-colab)
- [Three ways to drive it](#three-ways-to-drive-it)
- [What ships](#what-ships)
- [Domain presets](#domain-presets)
- [Bring your own documents](#bring-your-own-documents)
- [MCP + Claude plugin](#mcp--claude-plugin)
- [Plugin / skill loader](#plugin--skill-loader)
- [Architecture at a glance](#architecture-at-a-glance)
- [Repo layout](#repo-layout)
- [Configuration (env vars)](#configuration-env-vars)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Honest limits](#honest-limits)
- [Status + roadmap](#status--roadmap)
- [Contributing](#contributing)
- [License](#license)
TL;DR
Local-first research agent that verifies its own answers. Runs on Gemma 3 4B + Ollama (3.3 GB on disk) for $0/query; swaps to any OpenAI-compatible endpoint with one env var.
pip install agentic-research-engine
agentic-research ask "what is Anthropic's contextual retrieval?" --domain papers
| | |
|---|---|
| Interfaces | CLI · Textual TUI · FastAPI web GUI · MCP server (Claude Desktop / Cursor / Continue) |
| Pipeline | 8-node LangGraph (classify → plan → search → retrieve → fetch → compress → synthesize → verify); every node env-toggleable for ablation |
| Retrieval | SearXNG meta-search + trafilatura fetch + hybrid BM25 / dense / RRF; opt-in bge-reranker-v2-m3 cross-encoder |
| Reasoning | HyDE query expansion · FLARE active retrieval · Chain-of-Verification (Dhuliawala et al., 2023) · ThinkPRM step critic |
| Domains | 6 presets (general · medical · papers · financial · stock_trading · personal_docs); write your own in 10 lines of YAML |
| Plugins | load Claude plugins or agentskills.io skills from GitHub or local paths |
| Memory | opt-in local SQLite trajectory log with semantic retrieval; wipe anytime; no telemetry |
| Providers | OpenAI · Groq · vLLM · SGLang · Together · Ollama; any OpenAI-compatible endpoint via OPENAI_BASE_URL |
| Quality | 137 mocked tests, zero-network · honest live benchmarks published in [RESULTS.md](engine/benchmarks/RESULTS.md) · MIT end-to-end |
Why use this instead of…
| you currently use | we give you |
|---|---|
| Perplexity / ChatGPT Deep Research / Kagi Assistant | the same reasoning-with-citations flow, local and free, with your data never leaving the machine |
| Perplexica self-hosted | the UX Perplexica has plus a CoVe verifier, FLARE active retrieval, adaptive compute router, and Claude-plugin packaging |
| Khoj | stronger research-specific reasoning (we're not personal-knowledge-focused), six domain presets, and an MCP server for other agents to call |
| gpt-researcher | newer pipeline architecture, better small-model handling, observable trace, plugin ecosystem |
| MiroThinker-H1 / OpenResearcher-30B | they're stronger on BrowseComp; we run on a laptop with no GPU and cost $0 |
| Writing your own LangGraph research agent | save 2-3 months; reuse our 8-node pipeline + 30+ tested env gates + 137 tests |
Honest read: on complex multi-hop reasoning benchmarks, Gemma 3 4B sits 15–25% below 30 B+ open models. We don't claim to beat GPT-5.4 Pro. We claim to be the best $0, runs-on-your-laptop, fully-open research agent in April 2026.
Quickstart — Mac local
Option A — PyPI (fastest)
# 1) Local inference (Ollama + Gemma 3 4B + embedding model — 3.6 GB combined)
brew install ollama
ollama pull gemma3:4b          # ollama pull takes one model per invocation
ollama pull nomic-embed-text
# 2) Self-hosted meta-search (Docker; optional but recommended)
docker run -d --name searxng -p 8888:8080 searxng/searxng
# 3) The engine itself
pip install agentic-research-engine
# 4) Go
export OPENAI_BASE_URL=http://localhost:11434/v1 OPENAI_API_KEY=ollama
export MODEL_SYNTHESIZER=gemma3:4b EMBED_MODEL=nomic-embed-text
export SEARXNG_URL=http://localhost:8888
agentic-research ask "what is Anthropic's contextual retrieval?" --domain papers
Option B — from source
# 1) Same local-inference prereqs as Option A (ollama pull + docker run)
# 2) Clone + install (gives you the CLI, TUI, Web GUI, MCP server, benchmarks, tutorials)
git clone https://github.com/TheAiSingularity/agentic-research-engine-oss
cd agentic-research-engine-oss
(cd scripts/searxng && docker compose up -d)
cd engine && make install
make smoke # end-to-end run on the canonical "what is contextual retrieval" question
Expected wall-clock on an M-series Mac: ~45 s for a factoid, ~90 s for multi-hop synthesis. Zero dollars per query.
Higher honesty — cloud-model mode
Gemma 3 4B is surprisingly good at structure (plan, route, verify, compress) but confabulates specific factoids when SearXNG doesn't surface a source containing the right token. Live SimpleQA-mini run on 2026-04-21 (see [engine/benchmarks/RESULTS.md](engine/benchmarks/RESULTS.md)) showed gemma3:4b emitting "2023" for "year Anthropic published Contextual Retrieval" (gold: 2024) and "LayoutLMv3" for "which cross-encoder for reranking" (gold: bge-reranker-v2-m3).
The fix you probably want isn't a smarter synthesizer — it's a more honest one. A 5-question head-to-head on the same retrieval output showed gpt-5-nano + gpt-5-mini refuse to confabulate when evidence was missing ("The provided evidence does not answer this question"), where gemma3:4b confidently guessed. Per-claim faithfulness went from 82.9 % → 100 %. Pass rate barely moved (1/5 vs 0/5) because retrieval is the real bottleneck — if SearXNG didn't return a source with the gold token, neither model can produce it.
Swap the whole stack to a cloud endpoint:
# drop the Ollama base URL (fall back to OpenAI cloud)
unset OPENAI_BASE_URL
export OPENAI_API_KEY=sk-...
# defaults are already cloud-sized: gpt-5-nano for plan/verify, gpt-5-mini for synth.
# Explicit override if you want to pin them:
export MODEL_PLANNER=gpt-5-nano
export MODEL_SYNTHESIZER=gpt-5-mini # or gpt-5, claude-sonnet-4-5, etc.
agentic-research ask "…" --domain papers
Cost is dominated by synthesizer tokens (~5–15 k per query). Full cloud mode with gpt-5-nano + gpt-5-mini runs roughly $0.02–0.05 per research query and is ~2-3× slower than Gemma local (measured: 127 s vs 52 s mean wall on the 5-question subset). Works with any OpenAI-compatible endpoint — Groq, Together, Mistral, DeepSeek, local vLLM — so you can pick a cheap fast model (llama-3.3-70b on Groq ≈ $0.003/query) or a frontier one. Per-node base-URL routing (run gemma3:4b locally for plan/verify AND gpt-5-mini on cloud for synth in the same query) is tracked for 0.2; today the pipeline uses one global OPENAI_BASE_URL.
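The env-var switch described above can be sketched in a few lines. This is an illustration, not the engine's own code: `resolve_endpoint` is a hypothetical helper, and the fallback URL assumes the standard OpenAI API base. The variable names and the default models (gpt-5-nano for planning, gpt-5-mini for synthesis) come from this README.

```python
import os

# Assumption: with OPENAI_BASE_URL unset, requests go to the standard OpenAI API base.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_endpoint(env=None):
    """Hypothetical helper: derive one global endpoint config from env vars."""
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return {
        "base_url": base_url.rstrip("/"),
        "api_key": env.get("OPENAI_API_KEY", ""),
        # Defaults per the README: gpt-5-nano for plan/verify, gpt-5-mini for synth.
        "planner": env.get("MODEL_PLANNER", "gpt-5-nano"),
        "synthesizer": env.get("MODEL_SYNTHESIZER", "gpt-5-mini"),
    }


# Local mode: every node talks to Ollama's OpenAI-compatible endpoint.
local = resolve_endpoint({
    "OPENAI_BASE_URL": "http://localhost:11434/v1",
    "OPENAI_API_KEY": "ollama",
    "MODEL_SYNTHESIZER": "gemma3:4b",
})

# Cloud mode: base URL unset, so the sketch falls back to the default.
cloud = resolve_endpoint({"OPENAI_API_KEY": "sk-..."})
```

Because there is a single `base_url` in the returned config, every node shares it, which mirrors the "one global OPENAI_BASE_URL" limitation noted above.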
The bigger accuracy lever is retrieval. Point LOCAL_CORPUS_PATH at an indexed corpus containing your answer and either model will be correct.
Quickstart — no install (Google Colab)
Five runnable notebooks in [tutorials/](tutorials/):
- [01. Engine API quickstart (mocked, no key)](tutorials/01_engine_api_quickstart.ipynb): see how the pipeline works without running inference.
- [02. Groq cloud inference (free tier)](tutorials/02_groq_cloud_inference.ipynb): real LLM, no local GPU.
- [03. Build your own corpus](tutorials/03_build_your_own_corpus.ipynb): upload PDFs, index them, query.
- [04. MCP server from Python](tutorials/04_mcp_server_from_python.ipynb): drive the engine as a tool from another agent.
- [05. Domain presets showcase](tutorials/05_domain_presets_showcase.ipynb): compare presets on the same question.
Each notebook is self-contained, runs end-to-end on Colab free tier, no credit card required.
Three ways to drive it
CLI
engine ask "what is hybrid retrieval?" --domain papers --memory session
engine reset-memory
engine domains list
engine version
TUI (Textual — keyboard-driven, SSH-safe)
make tui
Three panes: sources · answer + hallucination flags · trace + memory hits. Press Enter to ask, Ctrl-M to cycle memory mode, Ctrl-L to clear, Ctrl-Q to quit.
Web GUI (FastAPI + HTMX on localhost:8080)
make gui
# open http://127.0.0.1:8080 in your browser
No auth. No cloud. No analytics. Dark theme. Streams tokens in place.
What ships
engine/ — the flagship
8-node LangGraph pipeline with 2026-SOTA composition: classify → plan → search → retrieve → fetch_url → compress → synthesize → verify
Every stage is env-toggleable for leave-one-out ablation. Techniques folded in: HyDE, CoVe verification, iterative retrieval, FLARE active retrieval, question classifier router, step critic (ThinkPRM pattern), LongLLMLingua-lite compression, cross-encoder rerank (BAAI/bge-reranker-v2-m3), Anthropic contextual chunking, W6 small-model hardening (three-case synthesize prompt + per-chunk char cap).
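A leave-one-out ablation over env-gated stages might be driven like the sketch below. The flag names are an assumption: the README only says stages have `ENABLE_*` flags, so `ENABLE_<STAGE>` is a hypothetical naming scheme, and the real gates may be per-technique rather than per-node.

```python
import os

STAGES = ["classify", "plan", "search", "retrieve",
          "fetch_url", "compress", "synthesize", "verify"]

FALSY = {"0", "false", "no", "off"}


def stage_enabled(stage, env=None):
    """A stage is on unless its (hypothetical) ENABLE_<STAGE> flag is falsy."""
    env = os.environ if env is None else env
    return env.get(f"ENABLE_{stage.upper()}", "1").strip().lower() not in FALSY


def active_stages(env):
    """List the stages that would run under the given environment mapping."""
    return [s for s in STAGES if stage_enabled(s, env)]


def leave_one_out_configs(stages=STAGES):
    """Yield (disabled_stage, env_overrides): one ablation run per stage."""
    for stage in stages:
        yield stage, {f"ENABLE_{stage.upper()}": "0"}


for disabled, overrides in leave_one_out_configs():
    remaining = active_stages(overrides)
    print(f"without {disabled}: {' -> '.join(remaining)}")
```

In practice only optional stages are meaningful ablation targets; switching off search or synthesize would leave nothing to measure.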
core/rag/ — reusable retrieval primitives (v1 stable)
HybridRetriever (BM25 + dense + RRF) · CrossEncoderReranker · contextualize_chunks (Anthropic pattern) · CorpusIndex (bring-your-own-PDFs). 5 exports, used by the engine and the archived recipes.
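For readers unfamiliar with the "RRF" part of the hybrid retriever, here is a minimal sketch of Reciprocal Rank Fusion. It is a generic illustration of the technique, not the code in `core/rag/`; `rrf_fuse` is a hypothetical name, and `k=60` is the constant commonly used for RRF rather than a value confirmed by this repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["d1", "d2", "d3"]    # lexical ranking
dense_ranking = ["d3", "d1", "d4"]   # embedding ranking
print(rrf_fuse([bm25_ranking, dense_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF only needs ranks, not scores, which is why it combines BM25 and dense results without any score normalization.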
archive/recipes/ — pre-engine reference recipes
research-assistant, trading-copilot, document-qa, rust-mcp-search-tool. All still work; all tests still pass. The research-assistant/production/main.py is a thin shim over engine.core.pipeline so the cookbook framing is preserved.
Domain presets
Six YAML files in engine/domains/:
| preset | when to use |
|---|---|
| general | default; anything |
| medical | disease / treatment / drug / trial (PubMed / Cochrane / NEJM bias; no prescriptive advice) |
| papers | academic CS / ML / physics / biology (arXiv + Semantic Scholar + OpenReview) |
| financial | SEC filings, earnings, company fundamentals (dates on every number) |
| stock_trading | technical + news per ticker; hard rule: never recommends buy/sell/hold |
| personal_docs | Q&A over your own corpus, air-gapped (only corpus:// URLs allowed) |
Write your own in ~10 lines of YAML — see [docs/domains.md](docs/domains.md).
Bring your own documents
python scripts/index_corpus.py build ~/papers --out ~/papers.idx
export LOCAL_CORPUS_PATH=~/papers.idx
engine ask "what do my papers say about contextual retrieval?" --domain personal_docs
Supported formats: PDF (via pypdf), Markdown, plain text, HTML (via trafilatura). The index persists as a directory with a human-readable manifest.json + a pickled index.pkl. Rebuild anytime the docs change.
Details: [docs/self-learning.md](docs/self-learning.md) covers the trajectory + memory model; [docs/plugins-skills.md](docs/plugins-skills.md) covers external plugins.
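Since the index is a directory holding a human-readable manifest.json next to index.pkl, it can be inspected without loading the pickle. The sketch below assumes only those two file names, which come from this README; `describe_index` is a hypothetical helper and the manifest's keys are not documented here, so it just lists whatever it finds.

```python
import json
from pathlib import Path


def describe_index(index_dir):
    """Summarize a corpus index directory without unpickling anything."""
    index_dir = Path(index_dir).expanduser()
    manifest = json.loads((index_dir / "manifest.json").read_text(encoding="utf-8"))
    pickle_path = index_dir / "index.pkl"
    has_pickle = pickle_path.is_file()
    return {
        # Manifest schema is unknown here, so report top-level keys only.
        "manifest_keys": sorted(manifest) if isinstance(manifest, dict) else [],
        "has_pickle": has_pickle,
        "pickle_bytes": pickle_path.stat().st_size if has_pickle else 0,
    }


# Usage (path from the example above):
#   print(describe_index("~/papers.idx"))
```

Reading only the manifest is deliberate: unpickling executes code, so an index.pkl from an untrusted source should not be loaded.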
MCP + Claude plugin
engine/mcp/server.py is a Python MCP server exposing:
- research(question, domain?, memory?) → structured {answer, verified_claims, unverified_claims, sources, trace, totals, memory_hits}
- reset_memory()
- memory_count()
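A caller can use the structured result of research to gauge how much of an answer survived verification. The sketch below assumes that verified_claims and unverified_claims are lists and that per-claim faithfulness is the verified share of all claims; both are assumptions, since this README names the fields but does not define the metric. `faithfulness` is a hypothetical helper.

```python
def faithfulness(result):
    """Share of claims that passed verification, or None if there are no claims."""
    verified = len(result.get("verified_claims", []))
    unverified = len(result.get("unverified_claims", []))
    total = verified + unverified
    return verified / total if total else None


# Hand-written stand-in for a research(...) response; values are illustrative.
example = {
    "answer": "...",
    "verified_claims": ["claim A", "claim B", "claim C"],
    "unverified_claims": ["claim D"],
    "sources": [],
}
print(faithfulness(example))  # 0.75
```

Returning None rather than 0 or 1 for an empty claim list keeps "nothing was checked" distinct from "everything failed".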
Bundled Claude plugin at engine/mcp/claude_plugin/ — four skills (/research, /cite-sources, /verify-claim, /set-domain), ready to submit to the Anthropic marketplace.
Register in Claude Desktop:
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"engine": {
"command": "python",
"args": ["-m", "engine.mcp.server"],
"env": {
"OPENAI_BASE_URL": "http://localhost:11434/v1",
"OPENAI_API_KEY": "ollama",
"MODEL_SYNTHESIZER": "gemma3:4b",
"SEARXNG_URL": "http://localhost:8888"
}
}
}
}
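If the config file already lists other MCP servers, the entry above has to be merged rather than pasted over the file. A minimal sketch of that merge follows; `register_server` is a hypothetical helper (not part of this project or of Claude Desktop), and the entry values are copied from the JSON above.

```python
import json
from pathlib import Path

ENGINE_ENTRY = {
    "command": "python",
    "args": ["-m", "engine.mcp.server"],
    "env": {
        "OPENAI_BASE_URL": "http://localhost:11434/v1",
        "OPENAI_API_KEY": "ollama",
        "MODEL_SYNTHESIZER": "gemma3:4b",
        "SEARXNG_URL": "http://localhost:8888",
    },
}


def register_server(config_path, name="engine", entry=None):
    """Add or replace one mcpServers entry, keeping every other key intact."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = dict(entry or ENGINE_ENTRY)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage on macOS (path from the comment above):
#   register_server("~/Library/Application Support/Claude/claude_desktop_config.json")
```

The sketch raises on an empty or malformed existing file instead of overwriting it, so a broken config is noticed rather than silently replaced.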
Plugin / skill loader
Install third-party Claude plugins or Hermes (agentskills.io) skills:
engine plugins install gh:owner/some-research-plugin@v1
engine plugins install file:./my-local-plugin
engine plugins install https://example.com/marketplace.json
engine plugins list
engine plugins uninstall some-plugin
Safety: every install runs a forbidden-symbols scan (eval(, exec(, os.system(, …) — rejects plugins that would execute arbitrary code. Registry lives at ~/.agentic-research/plugins/, fully inspectable, wipable.
Full docs: [docs/plugins-skills.md](docs/plugins-skills.md).
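To make the forbidden-symbols scan concrete, here is a rough sketch of what such a check could look like. It is not the loader's actual implementation: `scan_plugin` is a hypothetical function and the pattern list contains only the three symbols named above (the real list is elided in this README). Word-boundary regexes are used because a plain substring test for `eval(` would also flag harmless calls such as `retrieval(`.

```python
import re
from pathlib import Path

# Only the symbols the README names; the real list is longer (elided there).
FORBIDDEN = {
    "eval(": re.compile(r"\beval\("),
    "exec(": re.compile(r"\bexec\("),
    "os.system(": re.compile(r"\bos\.system\("),
}


def scan_plugin(root):
    """Return (relative_path, line_number, symbol) for every forbidden match."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files in this sketch
        for lineno, line in enumerate(text.splitlines(), start=1):
            for symbol, pattern in FORBIDDEN.items():
                if pattern.search(line):
                    hits.append((str(path.relative_to(root)), lineno, symbol))
    return hits


# Usage: an installer would reject the plugin when scan_plugin(path) is non-empty.
```

A text scan like this is a first filter: it still matches inside comments and strings, and it cannot see dynamically constructed calls.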
Architecture at a glance
┌─────────────┐
│ question │
└──────┬──────┘
▼
┌─────────────────────────┐ T4.3 router — route by question type
│ classify │
└──────────┬──────────────┘
▼
┌─────────────────────────┐ T1 decompose · T2 HyDE · T4.1 critic
│ plan │ T4.5 refine-on-reject
└──────────┬──────────────┘
▼
┌─────────────────────────┐ SearXNG parallel × N
│ search │ + W5 local corpus (optional)
│ (+ T4.1 critic) │ + T4.1 coverage critic
└──────────┬──────────────┘
▼
┌─────────────────────────┐ T1 hybrid BM25 + dense + RRF
│ retrieve │ W4.1 cross-encoder rerank (opt-in)
│ (+ W4.1 rerank) │
└──────────┬──────────────┘
▼
┌─────────────────────────┐ W4.2 trafilatura clean-text
│ fetch_url │ skips corpus:// URLs
└──────────┬──────────────┘
▼
┌─────────────────────────┐ T4.4 LLM distillation
│ compress │ + W6.2 per-chunk char cap
│ (+ W6.2 cap) │
└──────────┬──────────────┘
▼
┌─────────────────────────┐ T2 synth · T4.2 FLARE on hedges
│ synthesize │ W6.1 three-case anti-hallucinate
│ (+ FLARE + stream) │ W7 streaming
└──────────┬──────────────┘
▼
┌─────────────────────────┐ T2 CoVe — decompose + verify
│ verify │
└────────┬────────────────┘
│
verified? ── yes ──▶ END
│
no
│
◀────── re-search unverified claims ──── loop (bounded by MAX_ITERATIONS)
Every stage has an ENABLE_* flag so you can leave-one-out ablate. Deep spec: [docs/architecture.md](docs/architecture.md).
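The verify → re-search loop at the bottom of the diagram can be sketched as plain control flow. This is a simplified stand-in, not the LangGraph implementation: `research_loop` and its callable parameters are hypothetical, and the fallback of 2 for MAX_ITERATIONS is an assumption, since the README names the variable but not its default.

```python
import os


def research_loop(question, search, synthesize, verify, max_iterations=None):
    """Synthesize, verify, and re-search unverified claims a bounded number of times."""
    if max_iterations is None:
        # Assumption: default of 2 when MAX_ITERATIONS is unset.
        max_iterations = int(os.environ.get("MAX_ITERATIONS", "2"))
    max_iterations = max(1, max_iterations)

    evidence = search([question])
    answer, unverified, iteration = "", [], 0
    for iteration in range(1, max_iterations + 1):
        answer = synthesize(question, evidence)
        unverified = verify(answer, evidence)  # claims the evidence does not support
        if not unverified or iteration == max_iterations:
            break
        # Loop back: fetch more evidence for the unverified claims only.
        evidence = evidence + search(unverified)
    return {"answer": answer, "unverified_claims": unverified, "iterations": iteration}


# Stub components so the control flow can be exercised without any model.
def fake_search(queries):
    return [f"doc about {q}" for q in queries]

def fake_synthesize(question, evidence):
    return f"answer from {len(evidence)} docs"

def fake_verify(answer, evidence):
    return [] if len(evidence) >= 2 else ["claim X"]

result = research_loop("q", fake_search, fake_synthesize, fake_verify, max_iterations=3)
print(result)
```

With these stubs the first pass leaves one unverified claim, the re-search adds a second document, and the second pass verifies cleanly, so the loop stops at iteration 2 rather than running to the bound.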
Repo layout
agentic-research-engine-oss/
├── engine/ the flagship research engine
│ ├── core/ pipeline · models · trace · memory
│ │ ├── pipeline.py · compaction · domains · plugins
│ │ ├── models.py
│ │ ├── trace.py
│ │ ├── memory.py
│ │ ├── compaction.py
│ │ ├── domains.py
│ │ └── plugins.py
│ ├── interfaces/
│ │ ├── cli.py rich stdout CLI with subcommands
│ │ ├── tui.py Textual TUI
│ │ └── web/ FastAPI + HTMX localhost GUI
│ ├── mcp/
│ │ ├── server.py Python FastMCP server
│ │ └── claude_plugin/ submittable Claude plugin bundle
│ ├── domains/ 6 YAML presets
│ ├── examples/ 5 worked research examples
│ ├── benchmarks/ mini SimpleQA + BrowseComp fixtures + runner
│ └── tests/ pytest suite (all mocked, zero-network)
├── core/rag/ shared retrieval primitives (stable v
…
## Source & license
This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.
- **Author:** [TheAiSingularity](https://github.com/TheAiSingularity)
- **Source:** [TheAiSingularity/agentic-research-engine-oss](https://github.com/TheAiSingularity/agentic-research-engine-oss)
- **License:** MIT
- **Homepage:** https://github.com/TheAiSingularity/agentic-research-engine-oss
Install and usage instructions live in the source repository linked above.
Versions
- v0.1.2 Imported from the upstream source.