# Agentic Research Engine OSS

> Local research agent that verifies its own answers. Runs on Gemma 3 4B + Ollama, $0/query.

- **Type:** MCP server
- **Install:** `agentstack add mcp-theaisingularity-agentic-research-engine-oss`
- **Verified:** Pending review
- **Seller:** [TheAiSingularity](https://agentstack.voostack.com/s/theaisingularity)
- **Installs:** 0
- **Latest version:** 0.1.2
- **License:** MIT
- **Upstream author:** [TheAiSingularity](https://github.com/TheAiSingularity)
- **Source:** https://github.com/TheAiSingularity/agentic-research-engine-oss
- **Website:** https://github.com/TheAiSingularity/agentic-research-engine-oss

## Install

```sh
agentstack add mcp-theaisingularity-agentic-research-engine-oss
```

Requires the [AgentStack CLI](https://agentstack.voostack.com/docs/cli). Works with Claude Code, Cursor, and any MCP-compatible agent.

## About


**The best $0 research agent that runs on a laptop.** Open-source
end-to-end, reproducible, privacy-preserving. No cloud dependency by
default; no telemetry; every LLM call, every source, and every
verification decision is visible.

---

## Table of contents

- [TL;DR](#tldr)
- [Why use this instead of…](#why-use-this-instead-of)
- [Quickstart — Mac local](#quickstart--mac-local)
- [Quickstart — no install (Google Colab)](#quickstart--no-install-google-colab)
- [Three ways to drive it](#three-ways-to-drive-it)
- [What ships](#what-ships)
- [Domain presets](#domain-presets)
- [Bring your own documents](#bring-your-own-documents)
- [MCP + Claude plugin](#mcp--claude-plugin)
- [Plugin / skill loader](#plugin--skill-loader)
- [Architecture at a glance](#architecture-at-a-glance)
- [Repo layout](#repo-layout)
- [Configuration (env vars)](#configuration-env-vars)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Honest limits](#honest-limits)
- [Status + roadmap](#status--roadmap)
- [Contributing](#contributing)
- [License](#license)

---

## TL;DR

**Local-first research agent that verifies its own answers.** Runs on
Gemma 3 4B + Ollama (3.3 GB on disk) for `$0/query`; swaps to any
OpenAI-compatible endpoint with one env var.

```bash
pip install agentic-research-engine
agentic-research ask "what is Anthropic's contextual retrieval?" --domain papers
```

| | |
|---|---|
| **Interfaces** | CLI · Textual TUI · FastAPI web GUI · MCP server (Claude Desktop / Cursor / Continue) |
| **Pipeline** | 8-node LangGraph (`classify → plan → search → retrieve → fetch_url → compress → synthesize → verify`); every node env-toggleable for ablation |
| **Retrieval** | SearXNG meta-search + trafilatura fetch + hybrid BM25 / dense / RRF; opt-in `bge-reranker-v2-m3` cross-encoder |
| **Reasoning** | HyDE query expansion · FLARE active retrieval · Chain-of-Verification (Dhuliawala et al., 2023) · ThinkPRM step critic |
| **Domains** | 6 presets (`general` · `medical` · `papers` · `financial` · `stock_trading` · `personal_docs`) — write your own in 10 lines of YAML |
| **Plugins** | load Claude plugins or `agentskills.io` skills from GitHub or local paths |
| **Memory** | opt-in local SQLite trajectory log with semantic retrieval; wipe anytime; no telemetry |
| **Providers** | OpenAI · Groq · vLLM · SGLang · Together · Ollama — any OpenAI-compatible endpoint via `OPENAI_BASE_URL` |
| **Quality** | 137 mocked tests, zero-network · honest live benchmarks published in [`RESULTS.md`](engine/benchmarks/RESULTS.md) · MIT end-to-end |

---

## Why use this instead of…

| you currently use | we give you |
|---|---|
| **Perplexity / ChatGPT Deep Research / Kagi Assistant** | the same reasoning-with-citations flow, **local and free**, with your data never leaving the machine |
| **Perplexica self-hosted** | the UX Perplexica has plus a CoVe verifier, FLARE active retrieval, adaptive compute router, and Claude-plugin packaging |
| **Khoj** | stronger research-specific reasoning (we're not personal-knowledge-focused), six domain presets, and an MCP server for other agents to call |
| **gpt-researcher** | newer pipeline architecture, better small-model handling, observable trace, plugin ecosystem |
| **MiroThinker-H1 / OpenResearcher-30B** | they're stronger on BrowseComp; we run on a laptop with no GPU and cost $0 |
| **Writing your own LangGraph research agent** | save 2-3 months; reuse our 8-node pipeline + 30+ tested env gates + 137 tests |

**Honest read:** on complex multi-hop reasoning benchmarks, Gemma 3 4B
sits 15–25% below 30 B+ open models. We don't claim to beat GPT-5.4
Pro. We claim to be the best **$0, runs-on-your-laptop, fully-open**
research agent in April 2026.

---

## Quickstart — Mac local

### Option A — PyPI (fastest)

```bash
# 1) Local inference (Ollama + Gemma 3 4B + embedding model — 3.6 GB combined)
brew install ollama
ollama pull gemma3:4b          # `ollama pull` takes one model per invocation
ollama pull nomic-embed-text

# 2) Self-hosted meta-search (Docker; optional but recommended)
docker run -d --name searxng -p 8888:8080 searxng/searxng

# 3) The engine itself
pip install agentic-research-engine

# 4) Go
export OPENAI_BASE_URL=http://localhost:11434/v1 OPENAI_API_KEY=ollama
export MODEL_SYNTHESIZER=gemma3:4b EMBED_MODEL=nomic-embed-text
export SEARXNG_URL=http://localhost:8888
agentic-research ask "what is Anthropic's contextual retrieval?" --domain papers
```

### Option B — from source

```bash
# 1) Same local-inference prereqs as Option A (ollama pull + docker run)

# 2) Clone + install (gives you the CLI, TUI, Web GUI, MCP server, benchmarks, tutorials)
git clone https://github.com/TheAiSingularity/agentic-research-engine-oss
cd agentic-research-engine-oss
(cd scripts/searxng && docker compose up -d)
cd engine && make install
make smoke    # end-to-end run on the canonical "what is contextual retrieval" question
```

Expected wall-clock on an M-series Mac: **~45 s** for a factoid,
~90 s for multi-hop synthesis. Zero dollars per query.

### Higher honesty — cloud-model mode

Gemma 3 4B is surprisingly good at **structure** (plan, route, verify,
compress) but confabulates **specific factoids** when SearXNG doesn't
surface a source containing the right token. Live SimpleQA-mini run on
2026-04-21 (see [`engine/benchmarks/RESULTS.md`](engine/benchmarks/RESULTS.md))
showed `gemma3:4b` emitting "2023" for *"year Anthropic published
Contextual Retrieval"* (gold: 2024) and "LayoutLMv3" for *"which
cross-encoder for reranking"* (gold: bge-reranker-v2-m3).

The fix you probably want isn't a smarter synthesizer; it's a
**more honest** one. A 5-question head-to-head on the same retrieval
output showed `gpt-5-nano` + `gpt-5-mini` refusing to confabulate when
evidence was missing (*"The provided evidence does not answer this
question"*), where `gemma3:4b` confidently guessed. Per-claim
faithfulness rose from 82.9 % to 100 %. Pass rate barely moved (1/5
vs 0/5) because **retrieval is the real bottleneck**: if SearXNG
doesn't return a source with the gold token, neither model can
produce it.

Swap the whole stack to a cloud endpoint:

```bash
# drop the Ollama base URL (fall back to OpenAI cloud)
unset OPENAI_BASE_URL
export OPENAI_API_KEY=sk-...
# defaults are already cloud-sized: gpt-5-nano for plan/verify, gpt-5-mini for synth.
# Explicit override if you want to pin them:
export MODEL_PLANNER=gpt-5-nano
export MODEL_SYNTHESIZER=gpt-5-mini        # or gpt-5, claude-sonnet-4-5, etc.
agentic-research ask "…" --domain papers
```

Cost is dominated by synthesizer tokens (~5–15 k per query). Full
cloud mode with `gpt-5-nano` + `gpt-5-mini` runs roughly
**$0.02–0.05 per research query** and is ~2-3× slower than Gemma
local (measured: 127 s vs 52 s mean wall on the 5-question subset).
Works with any OpenAI-compatible endpoint — Groq, Together, Mistral,
DeepSeek, local vLLM — so you can pick a cheap fast model
(`llama-3.3-70b` on Groq ≈ $0.003/query) or a frontier one. Per-node
base-URL routing (run gemma3:4b locally for plan/verify AND gpt-5-mini
on cloud for synth in the same query) is tracked for 0.2; today the
pipeline uses one global `OPENAI_BASE_URL`.
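
The single-endpoint behaviour described above can be pictured with a small sketch. It is not the engine's code: `resolve_llm_config` is a hypothetical helper, and the fallback base URL is an assumption (OpenAI's public API endpoint). Only the environment variable names and the default model names come from this README.

```python
import os

# Assumption: OpenAI's public API base URL, used when OPENAI_BASE_URL is unset.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_llm_config(env=None):
    """Hypothetical helper: one global endpoint plus per-role model names."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("OPENAI_API_KEY", ""),
        # Defaults mirror the cloud-sized defaults named in this README.
        "planner": env.get("MODEL_PLANNER", "gpt-5-nano"),
        "synthesizer": env.get("MODEL_SYNTHESIZER", "gpt-5-mini"),
    }


# Local Ollama setup versus cloud setup, expressed as plain dicts.
local = resolve_llm_config({
    "OPENAI_BASE_URL": "http://localhost:11434/v1",
    "OPENAI_API_KEY": "ollama",
    "MODEL_SYNTHESIZER": "gemma3:4b",
})
cloud = resolve_llm_config({"OPENAI_API_KEY": "sk-..."})

print(local["base_url"], local["synthesizer"])
print(cloud["base_url"], cloud["synthesizer"])
```

Because every role shares `base_url`, mixing a local planner with a cloud synthesizer is not expressible in this shape, which matches the limitation noted above.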

**The bigger accuracy lever is retrieval.** Point
`LOCAL_CORPUS_PATH` at an indexed corpus containing your answer and
either model will be correct.

---

## Quickstart — no install (Google Colab)

Five runnable notebooks in [`tutorials/`](tutorials/):

1. [**01 — Engine API quickstart** (mocked, no key)](tutorials/01_engine_api_quickstart.ipynb) — see how the pipeline works without running inference.
2. [**02 — Groq cloud inference** (free tier)](tutorials/02_groq_cloud_inference.ipynb) — real LLM, no local GPU.
3. [**03 — Build your own corpus**](tutorials/03_build_your_own_corpus.ipynb) — upload PDFs, index them, query.
4. [**04 — MCP server from Python**](tutorials/04_mcp_server_from_python.ipynb) — drive the engine as a tool from another agent.
5. [**05 — Domain presets showcase**](tutorials/05_domain_presets_showcase.ipynb) — compare presets on the same question.

Each notebook is self-contained and runs end-to-end on the Colab free
tier; no credit card required.

---

## Three ways to drive it

### CLI

```bash
engine ask "what is hybrid retrieval?" --domain papers --memory session
engine reset-memory
engine domains list
engine version
```

### TUI (Textual — keyboard-driven, SSH-safe)

```bash
make tui
```

Three panes: sources · answer + hallucination flags · trace + memory hits.
Press Enter to ask, Ctrl-M to cycle memory mode,
Ctrl-L to clear, Ctrl-Q to quit.

### Web GUI (FastAPI + HTMX on `localhost:8080`)

```bash
make gui
# open http://127.0.0.1:8080 in your browser
```

No auth. No cloud. No analytics. Dark theme. Streams tokens in place.

---

## What ships

### `engine/` — the flagship

8-node LangGraph pipeline with 2026-SOTA composition:
`classify → plan → search → retrieve → fetch_url → compress → synthesize → verify`

Every stage is env-toggleable for leave-one-out ablation. Techniques
folded in: HyDE, CoVe verification, iterative retrieval, FLARE active
retrieval, question classifier router, step critic (ThinkPRM pattern),
LongLLMLingua-lite compression, cross-encoder rerank
(`BAAI/bge-reranker-v2-m3`), Anthropic contextual chunking, W6
small-model hardening (three-case synthesize prompt + per-chunk char cap).
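
As a rough illustration of leave-one-out ablation over env-toggled stages, the sketch below generates one override per run. The flag naming `ENABLE_<STAGE>` is an assumption that follows the `ENABLE_*` pattern mentioned in the architecture section; the engine's actual flag names may differ, and `stage_enabled` and `leave_one_out` are hypothetical helpers.

```python
import os

# Stage names as listed in the pipeline description.
STAGES = ["classify", "plan", "search", "retrieve",
          "fetch_url", "compress", "synthesize", "verify"]


def stage_enabled(stage, env=None):
    """Read a hypothetical ENABLE_<STAGE> flag; stages default to on."""
    env = os.environ if env is None else env
    raw = env.get(f"ENABLE_{stage.upper()}", "1")
    return raw.strip().lower() not in {"0", "false", "no", "off"}


def leave_one_out(stages=STAGES):
    """Yield (dropped_stage, env_overrides) pairs, one per ablation run."""
    for dropped in stages:
        yield dropped, {f"ENABLE_{dropped.upper()}": "0"}


for dropped, overrides in leave_one_out():
    active = [s for s in STAGES if stage_enabled(s, overrides)]
    print(f"without {dropped}: {' -> '.join(active)}")
```

Each printed line corresponds to one benchmark run with a single stage switched off, so the quality delta can be attributed to that stage.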

### `core/rag/` — reusable retrieval primitives (v1 stable)

`HybridRetriever` (BM25 + dense + RRF) · `CrossEncoderReranker` ·
`contextualize_chunks` (Anthropic pattern) · `CorpusIndex`
(bring-your-own-PDFs). 5 exports, used by the engine and the archived
recipes.
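
For readers unfamiliar with the RRF step in hybrid retrieval, here is a generic sketch of reciprocal rank fusion. It is not the `HybridRetriever` implementation; the function name, the document ids, and the constant `k=60` (a commonly used value) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative rankings from a lexical (BM25) and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only needs ranks, not raw scores, which is why it combines BM25 and embedding similarity without any score normalization.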

### `archive/recipes/` — pre-engine reference recipes

`research-assistant`, `trading-copilot`, `document-qa`,
`rust-mcp-search-tool`. All still work; all tests still pass. The
`research-assistant/production/main.py` is a thin shim over
`engine.core.pipeline` so the cookbook framing is preserved.

---

## Domain presets

Six YAML files in `engine/domains/`:

| preset | when to use |
|---|---|
| `general` | default; anything |
| `medical` | disease / treatment / drug / trial (PubMed / Cochrane / NEJM bias; no prescriptive advice) |
| `papers` | academic CS / ML / physics / biology (arXiv + Semantic Scholar + OpenReview) |
| `financial` | SEC filings, earnings, company fundamentals (dates on every number) |
| `stock_trading` | technical + news per ticker — **hard rule: never recommends buy/sell/hold** |
| `personal_docs` | Q&A over your own corpus, air-gapped (only `corpus://` URLs allowed) |

Write your own in ~10 lines of YAML — see [`docs/domains.md`](docs/domains.md).

---

## Bring your own documents

```bash
python scripts/index_corpus.py build ~/papers --out ~/papers.idx
export LOCAL_CORPUS_PATH=~/papers.idx
engine ask "what do my papers say about contextual retrieval?" --domain personal_docs
```

Supported formats: PDF (via pypdf), Markdown, plain text, HTML (via
trafilatura). The index persists as a directory with a human-readable
`manifest.json` + a pickled `index.pkl`. Rebuild anytime the docs change.
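
To preview which files an indexing run might pick up, a sketch like the following walks a directory and filters by extension. The suffix set is an assumption derived from the formats listed above, and `find_indexable_files` is a hypothetical helper, not part of `scripts/index_corpus.py`.

```python
import tempfile
from pathlib import Path

# Assumption: suffixes mapped from the listed formats (PDF, Markdown, text, HTML).
SUPPORTED_SUFFIXES = {".pdf", ".md", ".txt", ".html"}


def find_indexable_files(root):
    """Return sorted paths under root whose suffix is in the supported set."""
    root = Path(root).expanduser()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "notes.md", "image.png"):
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in find_indexable_files(tmp)]

print(found)  # ['a.pdf', 'notes.md']
```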

Details: [`docs/self-learning.md`](docs/self-learning.md) covers the
trajectory + memory model; [`docs/plugins-skills.md`](docs/plugins-skills.md)
covers external plugins.

---

## MCP + Claude plugin

`engine/mcp/server.py` is a Python MCP server exposing:
- `research(question, domain?, memory?)` → structured `{answer, verified_claims, unverified_claims, sources, trace, totals, memory_hits}`
- `reset_memory()`
- `memory_count()`
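
A caller might summarize a `research` result as follows. This is a sketch under assumptions: it treats `verified_claims` and `unverified_claims` as lists, the example dict is hand-written rather than real engine output, and `verified_fraction` is a hypothetical helper whose ratio is not necessarily the faithfulness metric reported in the benchmarks.

```python
def verified_fraction(result):
    """Share of claims marked verified, or None when there are no claims."""
    verified = len(result.get("verified_claims", []))
    unverified = len(result.get("unverified_claims", []))
    total = verified + unverified
    return None if total == 0 else verified / total


# Hand-written stand-in for a `research` response; not real engine output.
example = {
    "answer": "...",
    "verified_claims": ["claim 1", "claim 2", "claim 3"],
    "unverified_claims": ["claim 4"],
    "sources": [],
}

print(verified_fraction(example))  # 0.75
```

A calling agent could use a threshold on this value to decide whether to surface the answer or ask a follow-up question.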

Bundled Claude plugin at `engine/mcp/claude_plugin/` — four skills
(`/research`, `/cite-sources`, `/verify-claim`, `/set-domain`), ready to
submit to the Anthropic marketplace.

Register in Claude Desktop:

```jsonc
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "engine": {
      "command": "python",
      "args": ["-m", "engine.mcp.server"],
      "env": {
        "OPENAI_BASE_URL": "http://localhost:11434/v1",
        "OPENAI_API_KEY":  "ollama",
        "MODEL_SYNTHESIZER": "gemma3:4b",
        "SEARXNG_URL":    "http://localhost:8888"
      }
    }
  }
}
```

---

## Plugin / skill loader

Install third-party Claude plugins or Hermes (`agentskills.io`) skills:

```bash
engine plugins install gh:owner/some-research-plugin@v1
engine plugins install file:./my-local-plugin
engine plugins install https://example.com/marketplace.json
engine plugins list
engine plugins uninstall some-plugin
```

Safety: every install runs a forbidden-symbols scan
(`eval(`, `exec(`, `os.system(`, …) and rejects plugins that contain
any of them. The registry lives at `~/.agentic-research/plugins/`; it
is fully inspectable and can be wiped at any time.
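
A forbidden-symbols scan of this kind can be approximated with a plain substring check. The sketch below is an assumption about the general approach, not the engine's scanner: the symbol list contains only the three examples named above, and `scan_source` and `scan_plugin_dir` are hypothetical helpers.

```python
from pathlib import Path

# Only the symbols named in this README; the real list is longer.
FORBIDDEN = ("eval(", "exec(", "os.system(")


def scan_source(text):
    """Return (line_number, symbol) pairs for every forbidden substring."""
    return [
        (lineno, symbol)
        for lineno, line in enumerate(text.splitlines(), start=1)
        for symbol in FORBIDDEN
        if symbol in line
    ]


def scan_plugin_dir(root):
    """Scan every .py file under root; map file path to its findings."""
    findings = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = scan_source(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            findings[str(path)] = hits
    return findings


sample = "import os\nos.system('ls')\nprint('ok')\n"
print(scan_source(sample))  # [(2, 'os.system(')]
```

A substring scan is a coarse filter: it also flags harmless names such as `ast.literal_eval(` and can be evaded through indirection, so it complements rather than replaces reviewing a plugin before installing it.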

Full docs: [`docs/plugins-skills.md`](docs/plugins-skills.md).

---

## Architecture at a glance

```
                ┌─────────────┐
                │   question  │
                └──────┬──────┘
                       ▼
           ┌─────────────────────────┐   T4.3 router  — route by question type
           │  classify               │
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   T1 decompose · T2 HyDE · T4.1 critic
           │  plan                   │   T4.5 refine-on-reject
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   SearXNG parallel × N
           │  search                 │   + W5 local corpus (optional)
           │  (+ T4.1 critic)        │   + T4.1 coverage critic
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   T1 hybrid BM25 + dense + RRF
           │  retrieve               │   W4.1 cross-encoder rerank (opt-in)
           │  (+ W4.1 rerank)        │
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   W4.2 trafilatura clean-text
           │  fetch_url              │   skips corpus:// URLs
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   T4.4 LLM distillation
           │  compress               │   + W6.2 per-chunk char cap
           │  (+ W6.2 cap)           │
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   T2 synth · T4.2 FLARE on hedges
           │  synthesize             │   W6.1 three-case anti-hallucinate
           │  (+ FLARE + stream)     │   W7 streaming
           └──────────┬──────────────┘
                      ▼
           ┌─────────────────────────┐   T2 CoVe — decompose + verify
           │  verify                 │
           └────────┬────────────────┘
                    │
              verified? ── yes ──▶ END
                    │
                    no
                    │
           ◀────── re-search unverified claims ──── loop (bounded by MAX_ITERATIONS)
```

Every stage has an `ENABLE_*` flag, so you can run leave-one-out ablations.
Deep spec: [`docs/architecture.md`](docs/architecture.md).
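
The bounded re-search loop at the bottom of the diagram can be sketched in plain Python. This is not the LangGraph implementation: the callables are stand-ins supplied by the caller, `run_with_verification` is a hypothetical name, and the default of 3 for `max_iterations` is an assumption (the README only names `MAX_ITERATIONS`).

```python
def run_with_verification(question, search, synthesize, verify, max_iterations=3):
    """Synthesize, verify, and re-search unverified claims up to a bound."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    evidence = search([question])
    for iteration in range(1, max_iterations + 1):
        answer = synthesize(question, evidence)
        unverified = verify(answer, evidence)
        if not unverified:
            break  # every claim verified: END
        if iteration < max_iterations:
            # Re-search only the claims that failed verification.
            evidence = evidence + search(unverified)
    return {"answer": answer, "unverified_claims": unverified,
            "iterations": iteration}


# Stub stages so the sketch runs without any model or network.
def fake_search(queries):
    return [f"source for: {q}" for q in queries]


def fake_synthesize(question, evidence):
    return f"answer using {len(evidence)} sources"


def fake_verify(answer, evidence):
    return [] if len(evidence) >= 2 else ["claim needing support"]


result = run_with_verification("q", fake_search, fake_synthesize, fake_verify)
print(result)
```

When the bound is reached with claims still unverified, the sketch returns them instead of looping further, which keeps the run time predictable.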

---

## Repo layout

```
agentic-research-engine-oss/
├── engine/                        the flagship research engine
│   ├── core/                      pipeline · models · trace · memory
│   │   ├── pipeline.py              · compaction · domains · plugins
│   │   ├── models.py
│   │   ├── trace.py
│   │   ├── memory.py
│   │   ├── compaction.py
│   │   ├── domains.py
│   │   └── plugins.py
│   ├── interfaces/
│   │   ├── cli.py                 rich stdout CLI with subcommands
│   │   ├── tui.py                 Textual TUI
│   │   └── web/                   FastAPI + HTMX localhost GUI
│   ├── mcp/
│   │   ├── server.py              Python FastMCP server
│   │   └── claude_plugin/         submittable Claude plugin bundle
│   ├── domains/                   6 YAML presets
│   ├── examples/                  5 worked research examples
│   ├── benchmarks/                mini SimpleQA + BrowseComp fixtures + runner
│   └── tests/                     pytest suite (all mocked, zero-network)
├── core/rag/                      shared retrieval primitives (stable v
```

…

## Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

- **Author:** [TheAiSingularity](https://github.com/TheAiSingularity)
- **Source:** [TheAiSingularity/agentic-research-engine-oss](https://github.com/TheAiSingularity/agentic-research-engine-oss)
- **License:** MIT
- **Homepage:** https://github.com/TheAiSingularity/agentic-research-engine-oss

Install and usage instructions live in the source repository linked above.

## Pricing

- **Free** — Free

## Versions

- **0.1.2** — security scan: pending review — Imported from the upstream source.

## Links

- Listing page: https://agentstack.voostack.com/l/mcp-theaisingularity-agentic-research-engine-oss
- Seller: https://agentstack.voostack.com/s/theaisingularity
- Browse the marketplace: https://agentstack.voostack.com/browse
