# Code Index

> A fast code indexer for AI models. Rust + tree-sitter + SQLite. Instant symbol search.

- **Type:** MCP server
- **Install:** `agentstack add mcp-regsorm-code-index-mcp`
- **Verified:** Pending review
- **Seller:** [Regsorm](https://agentstack.voostack.com/s/regsorm)
- **Installs:** 0
- **Latest version:** 0.10.4
- **License:** MIT
- **Upstream author:** [Regsorm](https://github.com/Regsorm)
- **Source:** https://github.com/Regsorm/code-index-mcp

## Install

```sh
agentstack add mcp-regsorm-code-index-mcp
```

Requires the [AgentStack CLI](https://agentstack.voostack.com/docs/cli). Works with Claude Code, Cursor, and any MCP-compatible agent.

## About

Published on Infostart: [Code Index: structural search over 1C code exports via MCP](https://infostart.ru/1c/tools/2677918/)

---

# code-index-mcp

[Russian version](README_RU.md)

**Rust-native code index for AI agents. Static binary. Production-grade BSL/1C support.**

One static binary for Windows/Linux/macOS — no runtime, no dependencies. Indexes large repositories in seconds, returns results to AI agents over MCP in milliseconds. 31 tools: 20 universal + 11 BSL-specific for 1C:Enterprise configurations.

## What's inside

- **Performance.** 62,000 files indexed in 43 seconds, sub-ms search per query. Production-grade for 100K+ file monorepos.
- **31 MCP tools.** 20 universal (functions, classes, callers/callees, file content, grep) + 11 BSL-tools (object structure & profile, form handlers, event subscriptions, call graph, data links, register writers, impact map, read-only SQL).
- **Native BSL/1C.** Parses XML-exports of 1C:Enterprise 8.3 configurations. Data-link graph (object→object edges via reference types in attributes) — ~60,000 edges in seconds for a typical accounting configuration.
- **Federation.** One MCP server can serve multiple repositories across machines — pass `repo: "alias"` in each tool call.
- **Compressed content storage.** File contents stored in SQLite via zstd, cheap random-access reads for AI agents.
- **Tree-sitter AST.** 10 languages with full parsing (Rust, Python, JavaScript, TypeScript, Java, Kotlin, C#, Go, Objective-C, Zig) + fallback for 50+ formats.

Connects to Claude Code, Cursor, or any other MCP client over stdio or HTTP.
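
To make the compressed-storage bullet concrete, here is a minimal sketch of keeping file contents as compressed blobs in SQLite and reading a line range back. It is not the project's code: Python's standard `zlib` stands in for zstd, and the column names (`path`, `blob`) and both helper functions are assumptions made for illustration.

```python
import sqlite3
import zlib


def store_file(conn: sqlite3.Connection, path: str, text: str) -> None:
    """Compress the text and upsert it under its path (hypothetical schema)."""
    blob = zlib.compress(text.encode("utf-8"))  # zlib used here in place of zstd
    conn.execute(
        "INSERT OR REPLACE INTO file_contents (path, blob) VALUES (?, ?)",
        (path, blob),
    )


def read_lines(conn: sqlite3.Connection, path: str, line_start: int, line_end: int) -> list:
    """Return lines line_start..line_end (1-based, inclusive) of a stored file."""
    row = conn.execute(
        "SELECT blob FROM file_contents WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    lines = zlib.decompress(row[0]).decode("utf-8").splitlines()
    return lines[line_start - 1 : line_end]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_contents (path TEXT PRIMARY KEY, blob BLOB NOT NULL)")
store_file(conn, "src/main.py", "import os\n\nprint(os.getcwd())\n")
print(read_lines(conn, "src/main.py", 3, 3))
```

Keying the blob by path means a single primary-key lookup and one decompression per read, which is what makes random access cheap in this sketch.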

## Problem

AI models waste enormous time on repeated grep/find calls just to locate a single symbol. A real example: finding `RuntimeErrorProcessing` in a Java project required 14 sequential grep/find calls, each scanning thousands of files. With Code Index, that is one query returning results in under a millisecond.

## Solution

A compiled Rust binary with **one-writer / many-readers** architecture:

1. Parses source code into AST via tree-sitter
2. Indexes everything into SQLite with FTS5 full-text search
3. A separate **background daemon** is the sole writer: one process per machine watches a list of folders from its config and keeps `.code-index/index.db` up to date.
4. The **MCP server** is a thin **read-only** client: any number of Claude Code / VS Code / subagent sessions can connect to the same project in parallel — no pidlock conflicts, no per-session re-indexing.
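
The one-writer / many-readers split in the list above can be sketched with plain SQLite. This is an illustration under assumptions, not the project's implementation: the `functions` table layout is hypothetical, FTS5 is left out, and the reader is made read-only through SQLite's `mode=ro` URI parameter.

```python
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "index.db"

# Writer (the daemon's role): the only connection opened read-write.
writer = sqlite3.connect(str(db_path))
writer.execute("CREATE TABLE functions (name TEXT, path TEXT, line INTEGER)")
writer.execute("INSERT INTO functions VALUES ('parse_config', 'src/config.py', 12)")
writer.commit()

# Reader (the MCP server's role): opened through a read-only URI.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
rows = reader.execute(
    "SELECT path, line FROM functions WHERE name = ?", ("parse_config",)
).fetchall()
print(rows)

# Any write attempted through the reader is rejected by SQLite itself.
try:
    reader.execute("INSERT INTO functions VALUES ('x', 'y', 1)")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
print(write_rejected)
```

Because readers never take a write lock, any number of them can be open at once while a single process owns all updates.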

## Supported Languages

| Language | Parser | Extensions |
|----------|--------|------------|
| Python | tree-sitter-python | `.py` |
| JavaScript | tree-sitter-javascript | `.js`, `.jsx` |
| TypeScript | tree-sitter-typescript | `.ts`, `.tsx` |
| Java | tree-sitter-java | `.java` |
| Rust | tree-sitter-rust | `.rs` |
| Go | tree-sitter-go | `.go` |
| 1C (BSL) | tree-sitter-onescript | `.bsl`, `.os` |
| XML (1C) | quick-xml | `.xml` (configuration metadata) |
| HTML | tree-sitter-html | `.html`, `.htm` (v0.7.1, by user request — see HTML-specific mapping below) |

Text files (`.md`, `.json`, `.yaml`, `.toml`, `.xml`, `.sql`, `.env`, etc.) are also indexed for full-text search.

### HTML — entity mapping (v0.7.2)

HTML has no native concept of "function" or "class", so the mapping is conventional. **Dual-indexing**: HTML files go through both the AST parser and `text_files`, so `search_text` / `grep_text` / `read_file` keep working alongside the new structural queries.

| HTML | → | code-index table | Name |
|------|---|------------------|------|
| `<tag id="X">…</tag>` | → | `classes` | `X` (body=outerHTML, bases=tag_name) |
| `<form id="X">` or `<form name="X">` | → | `classes` | `form_X` (bases=`form`) |
| `<form>` without id/name | → | `classes` | `form_<line>` |
| `` | → | `variables` | `Y` |
| `` | → | `imports` | `module=URL`, `kind="link"` |
| `<link rel="X" href="URL">` | → | `imports` | `module=URL`, `kind=X` (or `"stylesheet"`) |
| `<script src="URL">` | → | `imports` | `module=URL`, `kind="script"` |
| `` | → | `imports` | `module=URL`, `kind=tag` |
| `<script>…inline JS…</script>` | → | `functions` | `inline_script_<line>` (body=content) |
| `<style>…inline CSS…</style>` | → | `functions` | `inline_style_<line>` (body=content) |
| Attribute `class="foo bar baz"` | → | `variables` | `class:foo`, `class:bar`, `class:baz` (one record per class) |

The following MCP tools work on HTML files after re-indexing:

```
# === Discovery & metadata ===
list_files(repo="X", pattern="**/*.html")                # all html (returns language="html")
list_files(repo="X", path_prefix="src/templates/")
stat_file(repo="X", path="src/templates/base.html")      # returns language="html", category="text"
get_stats(repo="X")                                       # totals

# === Structural (AST) — new in 0.7.x ===
# Elements with id, forms, css-classes, links, inline blocks → AST tables
get_class(repo="X", name="cart")                          # outerHTML of the element with id="cart"
get_class(repo="X", name="form_login")                    # full <form> element
search_class(repo="X", query="container", language="html")
get_function(repo="X", name="inline_script_42")           # body of the inline <script> at line 42
search_function(repo="X", query="inline_script", language="html")
find_symbol(repo="X", name="form_login")                  # exact-name lookup across all 4 tables
find_symbol(repo="X", name="class:htmx-indicator")        # CSS class usage
get_imports(repo="X", module="https://unpkg.com/htmx.org@1.9.12")  # who depends on this CDN
get_file_summary(repo="X", path="src/templates/base.html")         # full map (functions/classes/imports/variables)

# === Body-level grep (works on inline_script bodies) ===
grep_body(repo="X", regex="fetch\\(", language="html")    # in <script> blocks
grep_body(repo="X", pattern="color:", language="html")    # in <style> blocks
grep_body(repo="X", regex="hx-target", language="html", path_glob="src/templates/**", context_lines=2)

# === Text-level (still works via dual-indexing) ===
read_file(repo="X", path="src/templates/base.html", line_start=1, line_end=20)
search_text(repo="X", query="DOCTYPE", language="html")
grep_text(repo="X", regex="\\{%\\s*include", path_glob="**/*.html", context_lines=1)  # Jinja includes
```

`get_callers` / `get_callees` are not populated for HTML (the parser does not extract call edges between scripts).

Template engines (Jinja/Django/EJS): `{{ … }}` and `{% … %}` are tolerated as text content; surrounding HTML elements are still parsed normally.
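
As a rough illustration of the entity mapping described in this section, the sketch below walks an HTML fragment with Python's standard `html.parser` and emits two kinds of records: one `classes` record per element with an `id`, and one `class:` variable per CSS class token. The real indexer uses tree-sitter and stores more fields; the `HtmlEntityMapper` class and its `(table, name)` tuples are assumptions for illustration, and whether the indexer deduplicates repeated class tokens is not stated in this README.

```python
from html.parser import HTMLParser


class HtmlEntityMapper(HTMLParser):
    """Collect (table, name) records for elements with an id and for CSS classes."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            # Element with an id: one `classes` record named after the id.
            self.records.append(("classes", attr["id"]))
        for css_class in (attr.get("class") or "").split():
            # class="foo bar": one `variables` record per class token.
            self.records.append(("variables", f"class:{css_class}"))


mapper = HtmlEntityMapper()
mapper.feed('<div id="cart" class="box htmx-indicator"><p class="box">Empty</p></div>')
print(mapper.records)
```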

## Quick Start

### Install via npm (easiest)

```bash
npm install -g @regsorm/code-index-mcp
```

The `postinstall` step downloads the prebuilt native binary for your platform (Windows x64, Linux x64, macOS arm64) from GitHub Releases — nothing is compiled. Then run it as an MCP server:

```bash
npx @regsorm/code-index-mcp serve --path /path/to/your/repo
```

Also published to the [official MCP Registry](https://registry.modelcontextprotocol.io/) as `io.github.Regsorm/code-index`. The npm wrapper ships only the public `code-index` binary (no 1C support); for `bsl-indexer`, build from source.

### Build from source

```bash
git clone https://github.com/Regsorm/code-index-mcp.git
cd code-index-mcp
cargo build --release -p code-index               # public binary for Python/Rust/Go/Java/JS/TS
cargo build --release -p bsl-indexer --features enrichment   # extra build with 1C support + LLM enrichment
```

Binaries:
* `target/release/code-index[.exe]` — main binary (no 1C support).
* `target/release/bsl-indexer[.exe]` — full 1C support (XML metadata parsers, BSL call graph, data-links graph, MCP tools `get_object_structure` / `get_form_handlers` / `find_path_bsl` / `search_terms` / `get_data_links` / `find_data_path` / `get_register_writers`, optional LLM enrichment under cargo feature `enrichment`).

GitHub Releases publish 6 ready artifacts per tag: `code-index` × {Win, Linux, macOS} + `bsl-indexer` × {Win, Linux, macOS}.

### Set up the background daemon (v0.5+)

Portable layout: one folder holds everything (binary, config, runtime files), and the `CODE_INDEX_HOME` environment variable points to it.

1. Create the daemon folder and drop `code-index.exe` into it (e.g. `C:\tools\code-index\`).

2. Set the `CODE_INDEX_HOME` environment variable to point at that folder:

   **Windows (persistent, user scope):**
   ```powershell
   setx CODE_INDEX_HOME "C:\tools\code-index"
   # Reopen your shell so the variable is visible.
   ```

   **Linux** — add to `~/.bashrc` or `~/.zshrc`:
   ```bash
   export CODE_INDEX_HOME="$HOME/.local/code-index"
   ```

   **macOS** — same as Linux for shells; for launchd agents use `launchctl setenv`.

   **Any OS — per-project fallback via `.mcp.json`** (no system env var needed):
   ```json
   {
     "mcpServers": {
       "code-index": {
         "command": "C:\\tools\\code-index\\code-index.exe",
         "args": ["serve", "--path", "."],
         "env": { "CODE_INDEX_HOME": "C:\\tools\\code-index" }
       }
     }
   }
   ```

3. Create `daemon.toml` inside that folder and list the paths to watch:

   ```toml
   [daemon]
   http_port = 0                  # 0 = pick free port automatically
   max_concurrent_initial = 1     # folders processed sequentially during initial indexing

   [[paths]]
   path = "C:\\RepoUT"

   [[paths]]
   path = "C:\\RepoBP_1"
   debounce_ms = 500              # per-folder override: react faster than the default 1500 ms
   batch_ms    = 1000
   ```

   Per-folder `debounce_ms` / `batch_ms` are **optional**. If omitted, the daemon falls back to `.code-index/config.json` inside that project, and then to built-in defaults (1500 ms / 2000 ms).

4. Start the daemon (foreground):

   ```bash
   code-index daemon run
   ```

   Or install it as a Windows Scheduled Task (auto-start at user logon; the script also sets `CODE_INDEX_HOME` via `setx`):

   ```powershell
   powershell -ExecutionPolicy Bypass -File scripts\install-daemon-autostart.ps1 `
     -BinaryPath "C:\tools\code-index\code-index.exe" `
     -CodeIndexHome "C:\tools\code-index" `
     -StartNow
   ```

5. Check status:

   ```bash
   code-index daemon status        # human-readable
   code-index daemon status --json # JSON
   code-index daemon reload        # re-read daemon.toml after edits
   code-index daemon stop
   ```
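
The timing fallback from step 3 (per-folder value in `daemon.toml`, then `.code-index/config.json`, then built-in defaults) can be sketched as a small lookup. This is an assumption-laden illustration: `resolve_timing` is a hypothetical helper, it assumes the project config uses the same key names, and it assumes each key falls back independently.

```python
DEFAULTS = {"debounce_ms": 1500, "batch_ms": 2000}


def resolve_timing(path_entry: dict, project_config: dict) -> dict:
    """Resolve each key: [[paths]] entry first, then project config, then defaults."""
    resolved = {}
    for key, default in DEFAULTS.items():
        if key in path_entry:
            resolved[key] = path_entry[key]
        elif key in project_config:
            resolved[key] = project_config[key]
        else:
            resolved[key] = default
    return resolved


# Folder with explicit overrides in daemon.toml.
print(resolve_timing({"path": "C:\\RepoBP_1", "debounce_ms": 500, "batch_ms": 1000}, {}))
# Folder without overrides whose project config sets only batch_ms.
print(resolve_timing({"path": "C:\\RepoUT"}, {"batch_ms": 3000}))
```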

`CODE_INDEX_HOME` is **required** — there is no fallback. If it is unset, both `daemon` and `serve` exit with an error explaining how to set it.

> **Troubleshooting — "daemon not running / runtime-info missing" even though the daemon IS running.**
>
> The `serve` process and the daemon find each other only through `$CODE_INDEX_HOME/daemon.json`. If `serve` sees a different (or empty) `CODE_INDEX_HOME` than the daemon, it looks for `daemon.json` in the wrong place and reports the daemon as offline — while it is actually alive.
>
> The most common cause on Linux/macOS: **GUI MCP clients (VS Code, Continue, Cline) do not read `~/.bashrc` / `~/.zshrc`**, so a `serve` they launch with an empty `env` never sees the `CODE_INDEX_HOME` you exported in your shell. Meanwhile the daemon, started from a terminal, does — so they end up pointing at different folders.
>
> **Fix:** set `CODE_INDEX_HOME` explicitly in the `env` section of the client's MCP config, using the **same absolute path** the daemon uses (`$HOME` is not expanded there — use a real path). Restart the client and verify with `code-index daemon status`.
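
The mismatch described in this troubleshooting note comes down to two processes resolving `daemon.json` from different environments. The sketch below shows that resolution in isolation; `runtime_info_path` is a hypothetical helper, the example paths are made up, and the contents of `daemon.json` are not modeled.

```python
from pathlib import Path


def runtime_info_path(env: dict) -> Path:
    """Return the daemon.json path a process with this environment would check."""
    home = env.get("CODE_INDEX_HOME")
    if not home:
        raise RuntimeError("CODE_INDEX_HOME is not set")
    return Path(home) / "daemon.json"


# Hypothetical values: the daemon was started from a shell, the client uses its own config.
daemon_env = {"CODE_INDEX_HOME": "/home/me/.local/code-index"}
client_env = {"CODE_INDEX_HOME": "/opt/code-index"}

same_folder = runtime_info_path(daemon_env) == runtime_info_path(client_env)
print(same_folder)
```

When the two paths differ, `serve` looks in a folder the daemon never writes to, which is why setting the same absolute path on both sides resolves the problem.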

### One-shot indexing (no daemon)

```bash
code-index index /path/to/project
code-index stats --path /path/to/project --json
```

### Run as MCP server (read-only)

```bash
code-index serve --path /path/to/project
```

This is a thin read-only client of the daemon. It does not index anything itself — the daemon does. If the folder is still being indexed or not in `daemon.toml`, tools return a structured `{status, message, progress}` response instead of failing.

### Transports (stdio vs HTTP)

`serve` supports two transports:

| Transport | Process model | When to use |
|-----------|---------------|-------------|
| `stdio` (default) | One `serve` process per MCP session | Simple setups, single client, ad-hoc runs |
| `http` (streamable) | One shared `serve` process, many clients over `http://host:port/mcp` | Multi-project setups, supervisor-managed services, avoiding per-session CLI duplication |

```bash
# stdio — per-session, alias set at CLI
code-index serve --path ut=/repos/ut --path bp=/repos/bp

# HTTP — shared process, aliases come from daemon.toml
code-index serve --transport http --port 8011 --config /etc/code-index/daemon.toml
```

`--path` can be repeated in `alias=dir` form (multi-repo mode). Each tool call takes a `repo` parameter to select which repository to query. Without `=`, the single path uses `alias=default` (backward-compatible).

In HTTP mode, if `--config` is provided, aliases are taken from `[[paths]]` entries of `daemon.toml`: explicit `alias = "..."`, or derived from the path's last segment (lowercased, spaces → `_`) when not set. CLI `--path` takes precedence over the config file.
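
The two alias rules above (the `alias=dir` form on the CLI, and the derived alias from the last path segment) can be sketched as follows. Both helpers are hypothetical and only mirror the rules as stated here; in particular, a directory whose own name contains `=` would be split incorrectly by this simplified parser.

```python
from pathlib import PureWindowsPath


def derive_alias(path: str) -> str:
    """Last path segment, lowercased, with spaces replaced by underscores."""
    # PureWindowsPath accepts both backslash and forward-slash separators.
    return PureWindowsPath(path).name.lower().replace(" ", "_")


def parse_path_arg(arg: str) -> tuple:
    """Split a --path value of the form alias=dir; a bare dir gets alias 'default'."""
    alias, sep, directory = arg.partition("=")
    return (alias, directory) if sep else ("default", arg)


print(derive_alias("C:\\Repo UT"))
print(parse_path_arg("ut=/repos/ut"))
print(parse_path_arg("."))
```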

## Connecting to Claude Code

Add to `.mcp.json` in your project root. For `stdio`:

```json
{
  "mcpServers": {
    "code-index": {
      "command": "npx",
      "args": ["-y", "@regsorm/code-index-mcp", "serve", "--path", "."]
    }
  }
}
```

For a shared HTTP process:

```json
{
  "mcpServers": {
    "code-index": {
      "type": "http",
      "url": "http://127.0.0.1:8011/mcp"
    }
  }
}
```

## MCP Tools

| Tool | Description |
|------|-------------|
| `search_function` | Full-text search across functions (name, docstring, body) |
| `search_class` | Full-text search across classes |
| `get_function` | Get function by exact name |
| `get_class` | Get class by exact name |
| `get_callers` | Who calls this function? **(v0.35.0)** each row carries the caller's source `path` (distinguishes same-named callers from different files) |
| `get_callees` | What does this function call? **(v0.35.0)** each row carries the source `path` |
| `find_path` | **(v0.23.0)** Shortest path in the call graph between two functions `from`→`to` (iterative cycle-safe BFS over unique `calls` nodes, `max_depth=5`, any language). Returns path edges `[{caller, callee, line}]` |
| `get_call_tree` | **(v0.23.0)** Call tree from a `root` function up to `max_depth` (default 3). `direction`: `callees`/`down` (downstream) or `callers`/`up`. Flat edge list `[{caller, callee, line, depth, path}]` (**(v0.35.0)** `path` = source file of each edge) + nested `{name, children}` tree; `max_nodes` cap |
| `find_symbol` | Search everywhere (functions, classes, variables, imports) |
| `get_imports` | Imports by module or file |
| `get_file_summary` | Complete file map without reading source |
| `get_stats` | Index statistics |
| `search_text` | Full-text search across text files |
| `grep_body` | Substring or regex search in function/class bodies. Returns `match_lines` (first 3 line numbers) and `match_count` (total, if > 3). v0.7.0: optional `path_glob`, `context_lines` |
| `stat_file` | **(v0.7.0)** Metadata of a single file: exists, size, mtime, language, lines_total, content_hash, indexed_at, category (`text`/`code`). **(v0.8.0)** adds `oversize: bool` for code files |
| `list_files` | **(v0.7.0)** Flat file listing with optional `pattern` (glob like `**/*.py`), `path_prefix`, `language`, `limit` |
| `read_file` | **(v0.7.0)** Read content of a file. Optional `line_start`/`line_end` (1-based, inclusive). Soft-cap 5000 lines or 500 KB, hard-cap 2 MB. **(v0.8.0)** works for **code files** too (`.py`, `.bsl`, `.rs`, `.ts`, etc.) — content stored in `file_contents` table (zstd). Oversize files (default > 5 MB) return `oversize: true` with an empty `content` and a hint |
| `grep_text` | **(v0.7.0)** Regex search over text-file content (REGEXP). Closes the FTS5 special-character gap. Optional `path_glob`, `language`, `context_lines`. Hard-cap 1 MB on response size |
| `grep_code` | **(v0.8.0)** Regex search over **code-file** content (`.py`, `.bsl`, `.rs`, `.ts`, etc.) via `file_contents` table (zstd-decode in Rust). Same parameters as `grep_text`: `regex`, `path_glob?`, `language?`, `limit?`, `context_lines?`. Complements `grep

…
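
The `find_path` row in the table above describes a cycle-safe breadth-first search over call edges with a depth limit. A minimal sketch of that technique is shown below; it is not the server's implementation, and the input format (a list of `(caller, callee, line)` tuples) is an assumption chosen to match the documented output shape.

```python
from collections import deque


def find_path(calls, start, goal, max_depth=5):
    """Shortest caller-to-callee chain from start to goal, or None if out of reach."""
    graph = {}
    for caller, callee, line in calls:
        graph.setdefault(caller, []).append((callee, line))

    parents = {start: None}  # visited set; also stores the edge used to reach a node
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            path = []
            while parents[node] is not None:
                caller, line = parents[node]
                path.append({"caller": caller, "callee": node, "line": line})
                node = caller
            return path[::-1]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for callee, line in graph.get(node, []):
            if callee not in parents:  # cycle-safe: each node is queued once
                parents[callee] = (node, line)
                queue.append((callee, depth + 1))
    return None


calls = [("main", "load", 10), ("load", "parse", 22), ("parse", "load", 40), ("main", "report", 12)]
print(find_path(calls, "main", "parse"))
```

The `parse` to `load` edge forms a cycle in the sample data; the visited map keeps the search from looping on it.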

## Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

- **Author:** [Regsorm](https://github.com/Regsorm)
- **Source:** [Regsorm/code-index-mcp](https://github.com/Regsorm/code-index-mcp)
- **License:** MIT

Install and usage instructions live in the source repository linked above.

## Pricing

- **Free** — Free

## Versions

- **0.10.4** — security scan: pending review — Imported from the upstream source.

## Links

- Listing page: https://agentstack.voostack.com/l/mcp-regsorm-code-index-mcp
- Seller: https://agentstack.voostack.com/s/regsorm
- Browse the marketplace: https://agentstack.voostack.com/browse

