AgentStack
MCP unreviewed MIT Self-run

Context Mem

mcp-jubakitiashvili-context-mem · by JubaKitiashvili

Persistent memory for AI agents — 98%+ retrieval recall, 99% token savings, 44 tools


Install

$ agentstack add mcp-jubakitiashvili-context-mem

Open-source listing — not yet scanned by AgentStack. Follow the source repository for install instructions.


About

Context Mem

Memory + context infrastructure for AI agents. Remembers everything. Compresses everything. Fully local.



The Problem

Today's AI tooling has two problems that no one has solved together in a single package.

Your AI forgets. Every new session starts from zero. The architecture decisions you settled on last Thursday, the bug you spent four hours tracing to a misconfigured environment variable, the preferences you stated three times — none of it carries forward. You spend the first ten minutes of every session re-explaining context that already existed. Multiply this by every developer on your team, every project, every day.

Your context explodes. Long coding sessions blow past the context window. A typical session with 50 tool outputs accumulates 365 KB of raw text: stack traces, test output, file reads, shell commands. Every token costs money or slows the model. Naive truncation drops the exact evidence the model needs. Keeping everything slows responses and drives up inference cost.

These two problems compound each other. The solution to forgetting (keep everything) is the opposite of the solution to context explosion (discard everything). The result is a false tradeoff most tools force on you: either your AI forgets everything, or your costs balloon. context-mem solves both simultaneously by building an indexed, compressed, retrievable memory store rather than dumping raw history into the context window.


The Solution — one tool, two pillars

Pillar 1: Memory (LLM Wiki)

Every tool call is automatically ingested, summarized, and written into a navigable markdown vault — a living wiki your AI maintains about your project. Entities get their own pages with backlinks. Topics get synthesis pages. Sessions become browseable source documents. Decisions accumulate into a reconstructible trail.

The vault lives at .context-mem/vault/ and syncs continuously from the underlying SQLite store. Read it in Obsidian, grep it from the terminal, or query it through 45+ MCP tools using hybrid BM25 + vector + optional LLM judge search. The raw SQLite store is the authoritative record; the markdown vault is the derived, human-readable layer.

This is a reference implementation of Andrej Karpathy's LLM Wiki pattern — three layers (raw sources / wiki / schema), with automatic ingest from tool calls that no other system provides.

Pillar 2: Compression (14 summarizers)

Every observation passes through a content-aware summarizer before storage. A stack trace is not treated the same way as a JSON config file. Shell output from a build is compressed differently from TypeScript compiler errors. The system applies the right compression for the content type.
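
As a rough illustration of content-aware dispatch, the sketch below guesses a content kind with cheap heuristics and routes the text to a matching summarizer. The kind names, regular expressions, and summarizer bodies are all assumptions made for this example; they are not the package's actual 14 summarizers or its detection rules.

```typescript
// Hypothetical content kinds; the real package lists 14 summarizers.
type ContentKind = "json" | "error" | "log" | "text";

// Guess the content kind with cheap heuristics (illustrative only).
function detectKind(raw: string): ContentKind {
  const trimmed = raw.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch {
      // Not valid JSON; fall through to the other checks.
    }
  }
  // Stack frames such as "    at fn (file.ts:10:5)".
  if (/^\s+at .+:\d+:\d+\)?$/m.test(raw)) return "error";
  // Lines that start with a date or a log level.
  if (/^(\d{4}-\d{2}-\d{2}|\[?(INFO|WARN|ERROR|DEBUG)\]?)/m.test(raw)) return "log";
  return "text";
}

// One summarizer per kind; each keeps what matters for that kind.
const summarizers: Record<ContentKind, (raw: string) => string> = {
  json: (raw) => {
    const value: unknown = JSON.parse(raw);
    if (Array.isArray(value)) return `array(${value.length})`;
    if (value !== null && typeof value === "object") {
      return `object{${Object.keys(value).join(",")}}`;
    }
    return String(value);
  },
  // Keep the error message and the first stack frame.
  error: (raw) => raw.split("\n").slice(0, 2).join("\n"),
  // Keep a line count and the final log line.
  log: (raw) => {
    const lines = raw.split("\n").filter((l) => l.length > 0);
    return `${lines.length} log lines; last: ${lines[lines.length - 1]}`;
  },
  // Plain text is truncated to 80 characters.
  text: (raw) => (raw.length > 80 ? raw.slice(0, 80) + "..." : raw),
};

function summarize(raw: string): string {
  return summarizers[detectKind(raw)](raw);
}
```

The point of the table-of-functions shape is that adding a new content type means adding one detector branch and one summarizer entry, without touching the others.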

The result: a full coding session with 50 tool outputs goes from 365 KB to 3.2 KB — 99.1% token savings, verified. Compression is adaptive: recent high-importance observations stay verbatim; older low-importance ones compress progressively. Pinned entries never compress regardless of age.
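
The adaptive behaviour can be pictured as a small decision function. This sketch uses the age bands listed under Core features (0–7, 7–30, 30–90, and 90+ days) and the DECISION, MILESTONE, and PROBLEM flags mentioned in the compression benchmarks. The `Observation` shape and the choice to treat exactly 7, 30, and 90 days as the start of the next tier are assumptions, since the source does not say which side of each boundary is inclusive.

```typescript
type Tier = "verbatim" | "light" | "medium" | "distilled";

// Hypothetical observation shape for this sketch.
interface Observation {
  ageDays: number;
  pinned: boolean;
  flag?: "DECISION" | "MILESTONE" | "PROBLEM";
}

function compressionTier(obs: Observation): Tier {
  // Pinned and high-importance entries stay verbatim regardless of age.
  if (obs.pinned || obs.flag !== undefined) return "verbatim";
  if (obs.ageDays < 7) return "verbatim";
  if (obs.ageDays < 30) return "light";
  if (obs.ageDays < 90) return "medium";
  return "distilled";
}
```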


One Command

npm i context-mem && npx context-mem init

init auto-detects your editor and writes the right config files:

| Editor | Config written |
|---|---|
| Claude Code | .mcp.json + 8 hooks + CLAUDE.md |
| Cursor | .cursor/mcp.json + .cursor/rules/context-mem.mdc |
| Windsurf | .windsurf/mcp.json + .windsurf/rules/context-mem.md |
| VS Code / Copilot | .vscode/mcp.json + .github/copilot-instructions.md |
| Cline | .cline/mcp_settings.json + .clinerules/context-mem.md |
| Roo Code | .roo-code/mcp_settings.json + .roo/rules/context-mem.md |
| Aider | .aider.conf.yml (MCP block) |
| Continue | .continue/config.json (MCP block) |
| JetBrains AI | .idea/mcp.json |

No API keys. No cloud account. No data leaves your machine.




Architecture (reference implementation of Karpathy's LLM Wiki pattern)

                    ┌─────────────────────────────────────────┐
                    │            Raw Sources (immutable)       │
                    │  tool calls · observations · file reads  │
                    └──────────────────┬──────────────────────┘
                                       │
                                       ▼
                    ┌─────────────────────────────────────────┐
                    │           Observation Pipeline           │
                    │                                          │
                    │  PrivacyEngine (9 detectors)             │
                    │    → 14 content-aware summarizers        │
                    │    → entity extraction (100+ aliases)    │
                    │    → topic detection                     │
                    │    → importance scoring (0.0–1.0)        │
                    │    → adaptive compression tier           │
                    └────────────────┬────────────────────────┘
                                     │
                   ┌─────────────────┴───────────────────┐
                   │                                     │
                   ▼                                     ▼
    ┌──────────────────────────┐       ┌─────────────────────────────┐
    │    SQLite (primary)      │       │   Markdown Vault (derived)  │
    │                          │       │                             │
    │  observations            │──────▶│  .context-mem/vault/        │
    │  entities + graph        │  sync │    index.md                 │
    │  knowledge               │       │    log.md                   │
    │  events                  │       │    sources/.md              │
    │  FTS5 index              │       │    entities/.md             │
    │  vector embeddings       │       │    topics/.md               │
    └──────────────────────────┘       │    knowledge/.md            │
                   │                   └─────────────────────────────┘
                   │
                   ▼
    ┌──────────────────────────────────────────────────────────────┐
    │              Hybrid Retrieval                                │
    │                                                              │
    │  BM25 (8 strategies + synonym expansion)                     │
    │  + Vector (nomic-embed-text-v1.5, 768-dim)                   │
    │  + Trigram + Levenshtein                                     │
    │  → Fusion (intent-adaptive weights, IDF reranker)            │
    │  → Optional LLM judge (Haiku, 50/50 blend, 100% R@5)         │
    └──────────────────────────────────────────────────────────────┘

Three layers (per Karpathy):

  • Raw sources — your tool call outputs, file reads, shell commands, observations. Written once, never modified. The permanent record.
  • The wiki — LLM-maintained markdown vault (.context-mem/vault/). Auto-synced from SQLite. Human-readable, Obsidian-compatible, grep-friendly. Entity pages, topic pages, session pages, knowledge pages, index, event log.
  • Schema — [docs/llm-wiki-schema.md](llm-wiki-schema.md) governs page structure, linking conventions, agent workflow recipes, and interop contract. Public spec — other tools can emit conforming wikis.

The distinction from most memory systems: context-mem is not replacing SQLite with markdown. SQLite is authoritative — it is where observations are stored, searched, and indexed. The vault is the browseable, linkable, diffable surface on top of it — the layer a human or LLM can navigate without a database client. If you delete the vault directory, you lose nothing that matters. If you edit a vault page manually, those edits are preserved and not overwritten on the next sync.

This is the Karpathy three-layer model applied to a running AI development environment: immutable inputs, a maintained synthesis layer, and a public schema that governs the synthesis. The vault can be used independently of the MCP tools — it is just a directory of markdown files. Open it in any editor. Put it in git. Diff it across commits. Use it as long-form context by copy-pasting pages into a new conversation. The MCP tools are the automated path; the markdown vault is the portable, durable, human-readable path.


Retrieval benchmarks (honest methodology)

All scores are session-level retrieval recall: did any correct evidence session appear in the top-k results? This is different from end-to-end QA accuracy (retrieve + generate + judge), which is harder and lower for every system. Both measurements are published here.
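
To make the metric concrete, here is a minimal sketch of session-level recall@k as defined above: a question counts as a hit when at least one ground-truth evidence session appears among the first k retrieved sessions. The `QueryResult` shape and function names are assumptions for illustration and are not taken from the project's benchmark code.

```typescript
// Hypothetical per-question record for this sketch.
interface QueryResult {
  rankedSessionIds: string[];   // retrieval output, best first
  evidenceSessionIds: string[]; // ground-truth sessions for the question
}

// A hit means any evidence session appears in the top k results.
function isHit(result: QueryResult, k: number): boolean {
  const topK = new Set(result.rankedSessionIds.slice(0, k));
  return result.evidenceSessionIds.some((id) => topK.has(id));
}

// Fraction of questions with at least one hit in the top k.
function recallAtK(results: QueryResult[], k: number): number {
  if (results.length === 0) return 0;
  const hits = results.filter((r) => isHit(r, k)).length;
  return hits / results.length;
}
```

Note that this measures only whether the right session was surfaced; it says nothing about whether a model then answers the question correctly, which is why end-to-end QA accuracy is reported separately.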

Pure local (zero API calls, fully free)

| Benchmark | Retrieval Recall | E2E QA Accuracy | Questions | Sessions |
|---|---|---|---|---|
| LongMemEval | 97.8% R@5 | published post-v3.4 | 500 | ~53/conv |
| LoCoMo | 98.1% R@10 | published post-v3.4 | 1,977 | 19-35/conv |
| MemBench | 98.0% R@5 | - | 500 | - |
| ConvoMem | 97.7% R@10 | - | 250 | - |

With optional LLM reranking (~$1 per 500 queries)

| Benchmark | Retrieval Recall |
|---|---|
| LongMemEval | 100.0% R@5 (500/500) |

The LLM judge (Claude Haiku) scores the top-N BM25+vector candidates 0–10 and blends 50/50 with the retrieval score. Activates when ai_curation.enabled = true. Adds ~$0.002 per query at Haiku pricing.
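
The blend described above can be sketched as follows. This assumes the retrieval score has already been normalized to the range 0 to 1, which the source does not state; the `Candidate` shape and function names are likewise hypothetical. The cost helper simply multiplies the quoted ~$0.002 per query.

```typescript
// Hypothetical candidate shape for this sketch.
interface Candidate {
  id: string;
  retrievalScore: number; // assumed normalized to [0, 1]
  judgeScore: number;     // LLM judge score on a 0-10 scale
}

// 50/50 blend of the retrieval score and the rescaled judge score.
function blendScore(retrievalScore: number, judgeScore: number): number {
  const judge = Math.min(10, Math.max(0, judgeScore)) / 10;
  return 0.5 * retrievalScore + 0.5 * judge;
}

// Sort candidates by blended score, best first, without mutating the input.
function rerank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) =>
      blendScore(b.retrievalScore, b.judgeScore) -
      blendScore(a.retrievalScore, a.judgeScore),
  );
}

// Estimated judge cost in dollars at a given per-query price.
function judgeCost(queries: number, costPerQuery: number = 0.002): number {
  return queries * costPerQuery;
}
```

With these numbers, 500 queries cost about $1, which matches the figure quoted for the LongMemEval run.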

Methodology notes:

  • A "hit" is scored if any correct evidence session appears in top-k. Not end-to-end QA.
  • LoCoMo benchmark appends dataset-provided metadata (session_summary, observation, event_summary) to session documents; the production system applies equivalent enrichment via summarizers and entity extraction.
  • Synonym expansions: core query-builder includes general-vocabulary synonyms (movie → film, sibling → brother). Results without any synonym expansion are ~1-2% lower.
  • All benchmark code is open and runnable: npm run bench. See [benchmarks/](../benchmarks/).

Full methodology: [docs/benchmarks/methodology.md](benchmarks/methodology.md) (published with v3.4).


Compression benchmarks (verified)

| Scenario | Raw | Compressed | Savings |
|---|---|---|---|
| Typical coding session (50 tool outputs) | 365 KB | 3.2 KB | 99.1% |
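
The savings figure follows directly from the two sizes in the table. As a quick check of the arithmetic (the function name is only illustrative):

```typescript
// Percentage of the raw size removed by compression.
function savingsPercent(rawKb: number, compressedKb: number): number {
  if (rawKb <= 0) throw new RangeError("rawKb must be positive");
  return (1 - compressedKb / rawKb) * 100;
}

// 365 KB raw, 3.2 KB compressed, rounded to one decimal place.
const sessionSavings = savingsPercent(365, 3.2).toFixed(1); // "99.1"
```

Put another way, the compressed session is roughly 1/114 of the raw size.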

Per-summarizer breakdown:

| Summarizer | Compression ratio |
|---|---|
| Log output | 97% |
| Errors | 95% |
| Shell / CLI | ~95% |
| Code | 92% |
| JSON | 89% |
| TS compiler errors | ~88% |
| Tests | ~85% |
| Build output | ~94% |
| Git logs | ~90% |
| HTML | ~92% |
| Markdown | ~75% |
| CSV | ~80% |
| Network responses | ~88% |
| Binary (hex dumps) | ~98% |

Compression is lossless at the semantic level for high-importance observations (DECISION, MILESTONE, PROBLEM flags) — those stay verbatim regardless of age. Compression applies to routine tool output.


Core features

Memory

  • LLM Wiki substrate — markdown vault at .context-mem/vault/, auto-synced from SQLite. Entity pages, topic pages, session source pages, knowledge pages, index.md, log.md. Obsidian-compatible, grep-friendly.
  • 14 content-aware summarizers — JSON, shell, code, logs, errors, TS errors, tests, builds, git logs, HTML, markdown, CSV, binary, network. Each tuned for its content type.
  • Adaptive 4-tier compression — verbatim (0–7 days) → light (7–30 days) → medium (30–90 days) → distilled (90 days+). Pinned entries stay verbatim forever.
  • Knowledge graph — typed entity-relationship model: files, modules, patterns, decisions, bugs, people, libraries, services, APIs, configs. Traversable via graph_query, graph_neighbors, add_relationship.
  • Temporal facts: valid_from/valid_to on all knowledge entries. Supersession chains. temporal_query answers "what was true about X at time T?"
  • Decision trail reconstruction: explain_decision walks the evidence chain backward: file reads → errors → searches → the decision. Full provenance.
  • Entity intelligence — auto-detect technologies, people, file paths, CamelCase identifiers, ALL_CAPS constants. 100+ canonical aliases (React.js → React, Node → Node.js, etc.).
  • Session narratives — 4 ready-made templates: PR description, standup update, ADR, onboarding guide. context-mem story --format pr.
  • Wake-up primer — token-budgeted context injection at session start. 4 layers: project profile (15%), critical knowledge (40%), recent decisions (30%), top entities (15%).
  • Per-prompt injection — UserPromptSubmit hook auto-injects relevant memories on every message. Rate-limited, topic-deduplicated. Zero manual commands.
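
As an example of how the wake-up primer's token budget might be divided, the sketch below applies the four percentages from the list above (15 / 40 / 30 / 15) to a total budget. The layer key names and the decision to give any rounding remainder to the critical-knowledge layer are assumptions made for this illustration.

```typescript
// Layer shares in whole percent, taken from the feature list.
const PRIMER_PERCENT = {
  projectProfile: 15,
  criticalKnowledge: 40,
  recentDecisions: 30,
  topEntities: 15,
} as const;

type PrimerLayer = keyof typeof PRIMER_PERCENT;

// Split a total token budget across the four primer layers.
function splitPrimerBudget(totalTokens: number): Record<PrimerLayer, number> {
  const share = (percent: number): number =>
    Math.floor((totalTokens * percent) / 100);
  const budget: Record<PrimerLayer, number> = {
    projectProfile: share(PRIMER_PERCENT.projectProfile),
    criticalKnowledge: share(PRIMER_PERCENT.criticalKnowledge),
    recentDecisions: share(PRIMER_PERCENT.recentDecisions),
    topEntities: share(PRIMER_PERCENT.topEntities),
  };
  const used =
    budget.projectProfile +
    budget.criticalKnowledge +
    budget.recentDecisions +
    budget.topEntities;
  // Assumption: tokens lost to rounding go to the largest layer.
  budget.criticalKnowledge += totalTokens - used;
  return budget;
}
```

For a 2,000-token primer this yields 300, 800, 600, and 300 tokens for the four layers respectively.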

Compression

  • 14 content-aware summarizers — not one-size-fits-all. A stack trace gets different treatment than a JSON response.
  • Pinned verbatim preservation — decisions, milestones, and manually-pinned observations never compress.
  • Priority-tiered truncation cascade — if the context budget is exceeded, lower-importance items are compressed first. High-importance items survive.
  • Configurable token budget — three overflow strategies: compress oldest, compress lowest-importance, or hard truncate.
  • 365 KB → 3.2 KB — verified on a typical 50-tool-output coding session.
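
The "compress lowest-importance" overflow strategy can be sketched as a greedy loop: while the total exceeds the budget, compress the least important unpinned item next. The `MemoryItem` shape, and the idea that each item's compressed size is known in advance, are assumptions for this example rather than the package's actual data model.

```typescript
// Hypothetical item shape for this sketch.
interface MemoryItem {
  id: string;
  tokens: number;           // current size in tokens
  compressedTokens: number; // size after summarizing (assumed known)
  importance: number;       // 0.0 to 1.0
  pinned: boolean;
}

interface FitResult {
  items: MemoryItem[];
  compressedIds: string[];
  totalTokens: number;
}

// Compress lowest-importance unpinned items until the budget is met.
function fitToBudget(items: MemoryItem[], budget: number): FitResult {
  const copies = items.map((item) => ({ ...item }));
  let totalTokens = copies.reduce((sum, item) => sum + item.tokens, 0);
  const order = copies
    .filter((item) => !item.pinned)
    .sort((a, b) => a.importance - b.importance);
  const compressedIds: string[] = [];
  for (const item of order) {
    if (totalTokens <= budget) break;
    totalTokens -= item.tokens - item.compressedTokens;
    item.tokens = item.compressedTokens;
    compressedIds.push(item.id);
  }
  return { items: copies, compressedIds, totalTokens };
}
```

If the budget is still exceeded after every unpinned item has been compressed, this sketch simply returns the oversized total; a real implementation would need a fallback such as the hard-truncate strategy.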

Both

  • Hybrid search — BM25 (8 strategies + synonym expansion) + vector (nomic-embed-text-v1.5, 768-dim) + trigram + Levenshtein run in parallel, fused via intent-adaptive weights with IDF-weighted content reranking. Optional LLM judge reranker.
  • Temporal resolver — deterministic parsing for relative date queries ("3 days ago", "last Saturday", "last week"). Zero LLM cost. Returns absolute date range with confidence level.
  • 45+ MCP tools — observe, search, recall, ask, timeline, knowledge graph, entity detection, temporal query, session handoff, multi-agent coordination, token budget, dashboard, diagnostics, and more.
  • Fully local, zero cloud — SQLite on your machine. No telemetry. No API keys required for core functionality.
  • 9-detector privacy engine — strips `` tags, applies custom regex redactions, detects API keys, tokens, passwords, PII patterns. Nothing sensitive leaves your machine.
  • Sub-millisecond operations — importance classification at 556K ops/s, entity extraction at 179K ops/s, BM25 search at 3.3K ops/s, all local.
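
To illustrate the general idea behind weighted fusion of lexical and vector results, here is a simplified sketch. It substitutes min-max normalization and a two-way weighted sum for the project's actual fusion, which also involves trigram and Levenshtein matching and an IDF reranker. The weights (0.7 and 0.4) and the identifier heuristic in `weightForQuery` are invented for this example.

```typescript
type ScoreMap = Record<string, number>; // document id -> raw score

// Scale scores to [0, 1] so rankers with different units can be combined.
function normalize(scores: ScoreMap): ScoreMap {
  const values = Object.values(scores);
  const max = Math.max(...values);
  const min = Math.min(...values);
  const out: ScoreMap = {};
  for (const [id, score] of Object.entries(scores)) {
    out[id] = max === min ? 1 : (score - min) / (max - min);
  }
  return out;
}

// Hypothetical intent heuristic: identifier-like queries favor BM25.
function weightForQuery(query: string): number {
  const looksLikeIdentifier = /[a-z][A-Z]|[_\/.]/.test(query);
  return looksLikeIdentifier ? 0.7 : 0.4;
}

// Weighted sum of normalized scores; returns ids ordered best first.
function fuse(bm25: ScoreMap, vector: ScoreMap, bm25Weight: number): string[] {
  const b = normalize(bm25);
  const v = normalize(vector);
  const ids = Array.from(new Set([...Object.keys(b), ...Object.keys(v)]));
  const fused = ids.map((id) => ({
    id,
    score: bm25Weight * (b[id] ?? 0) + (1 - bm25Weight) * (v[id] ?? 0),
  }));
  fused.sort((x, y) => y.score - x.score);
  return fused.map((entry) => entry.id);
}
```

The same candidate set can rank differently depending on the weight: an exact-identifier query leans on lexical matches, while a natural-language question leans on semantic similarity.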

How it compares

The memory space has multiple incumbents. The context-compression space has a few more. No other tool addresses both axes together.

| Feature | context-mem v4 | Mem0 | Graphiti | Zep | Letta |
|---|---|---|---|---|---|
| LLM Wiki / markdown vault | ✅ | ❌ | ❌ | ❌ | ❌ |
| Auto-ingest from tool calls | ✅ | ❌ | ❌ | ❌ | ❌ |
| Retrieval recall (local) | 97.8–98.1% R@k | not published | not published | not published | not published |
| Token compression | 99.1% | ❌ | ❌ | ❌ | partial |
| Typed knowledge graph | ✅ | ✅ | ✅ | partial | partial |
| Temporal graph queries | ✅ | ✅ | ✅ | ❌ | ❌ |
| Hybrid BM25 + vector + LLM rerank | ✅ | partial | ❌ | partial | ❌ |
| Fully local (no cloud required) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Decision trail reconstruction | ✅ | ❌ | ❌ | ❌ | ❌ |
| Obsidian-compatible output | ✅ | ❌ | ❌ | ❌ | ❌ |
| MCP tools | 45+ | some | some | some | some |
| License | MIT | Apache/cloud | Apache | Apache | Apache |

Notes on this table: Retrieval recall figures for Mem0, Graphiti, Zep, and Letta are not published against the same benchmarks (LongMemEval, LoCoMo, MemBench, ConvoMem) at session-level retrieval recall using a methodology comparable to ours. If published numbers exist in their docs, they are for different datasets, different granularity (chunk-level vs. session-level), or with undisclosed infrastructure. Do not compare them directly. E2E QA numbers for context-mem will be published with

Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

Install and usage instructions live in the source repository linked above.


Versions

  • v3.2.1 Imported from the upstream source.