AgentStack
MCP unreviewed MIT Self-run

Code Index

mcp-regsorm-code-index-mcp · by Regsorm

Fast code indexer for AI models. Rust + tree-sitter + SQLite. Instant symbol search.


Install

$ agentstack add mcp-regsorm-code-index-mcp

Open-source listing — not yet scanned by AgentStack. Follow the source repository for install instructions.

About

Published on Infostart: "Code Index: structural search over a 1C code export via MCP"


code-index-mcp

[Russian version](README_RU.md)

Rust-native code index for AI agents. Static binary. Production-grade BSL/1C support.

One static binary for Windows/Linux/macOS — no runtime, no dependencies. Indexes large repositories in seconds, returns results to AI agents over MCP in milliseconds. 31 tools: 20 universal + 11 BSL-specific for 1C:Enterprise configurations.

What's inside

  • Performance. 62,000 files indexed in 43 seconds, sub-ms search per query. Production-grade for 100K+ file monorepos.
  • 31 MCP tools. 20 universal (functions, classes, callers/callees, file content, grep) + 11 BSL-tools (object structure & profile, form handlers, event subscriptions, call graph, data links, register writers, impact map, read-only SQL).
  • Native BSL/1C. Parses XML-exports of 1C:Enterprise 8.3 configurations. Data-link graph (object→object edges via reference types in attributes) — ~60,000 edges in seconds for a typical accounting configuration.
  • Federation. One MCP server can serve multiple repositories across machines — pass repo: "alias" in each tool call.
  • Compressed content storage. File contents stored in SQLite via zstd, cheap random-access reads for AI agents.
  • Tree-sitter AST. 10 languages with full parsing (Rust, Python, JavaScript, TypeScript, Java, Kotlin, C#, Go, Objective-C, Zig) + fallback for 50+ formats.

Connects to Claude Code, Cursor, any MCP client over HTTP.

Problem

AI models waste enormous time on repeated grep/find calls just to locate a single symbol. A real example: finding RuntimeErrorProcessing in a Java project required 14 sequential grep/find calls, each scanning thousands of files. With Code Index, that is one query returning results in under a millisecond.

Solution

A compiled Rust binary with one-writer / many-readers architecture:

  1. Parses source code into AST via tree-sitter
  2. Indexes everything into SQLite with FTS5 full-text search
  3. A separate background daemon is the sole writer: one process per machine watches a list of folders from its config and keeps .code-index/index.db up to date.
  4. The MCP server is a thin read-only client: any number of Claude Code / VS Code / subagent sessions can connect to the same project in parallel — no pidlock conflicts, no per-session re-indexing.
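Steps 1 and 2 above can be illustrated with a minimal sketch of the indexing side. It uses Python's built-in sqlite3 module with an FTS5 virtual table; the table name, column names, and sample rows are hypothetical and do not reflect the project's real schema, and it assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: one FTS5 row per indexed function (name, file path, body).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE functions USING fts5(name, path, body)")

rows = [
    ("parse_config", "src/config.py", "def parse_config(path): return load(path)"),
    ("load", "src/io.py", "def load(path): return open(path).read()"),
    ("RuntimeErrorProcessing", "src/Errors.java", "class RuntimeErrorProcessing {}"),
]
conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
conn.commit()


def search_function(query: str) -> list[tuple[str, str]]:
    """Return (name, path) pairs whose indexed text matches the FTS5 query."""
    cur = conn.execute(
        "SELECT name, path FROM functions WHERE functions MATCH ? ORDER BY rank",
        (query,),
    )
    return cur.fetchall()


print(search_function("RuntimeErrorProcessing"))
```

A single indexed lookup like this stands in for the repeated grep/find passes described in the Problem section; the real binary adds tree-sitter parsing and a dedicated writer process on top.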

Supported Languages

| Language | Parser | Extensions |
|----------|--------|------------|
| Python | tree-sitter-python | .py |
| JavaScript | tree-sitter-javascript | .js, .jsx |
| TypeScript | tree-sitter-typescript | .ts, .tsx |
| Java | tree-sitter-java | .java |
| Rust | tree-sitter-rust | .rs |
| Go | tree-sitter-go | .go |
| 1C (BSL) | tree-sitter-onescript | .bsl, .os |
| XML (1C) | quick-xml | .xml (configuration metadata) |
| HTML | tree-sitter-html | .html, .htm (v0.7.1, by user request; see HTML-specific mapping below) |

Text files (.md, .json, .yaml, .toml, .xml, .sql, .env, etc.) are also indexed for full-text search.

HTML — entity mapping (v0.7.2)

HTML has no native concept of "function" or "class", so the mapping is conventional. HTML files are dual-indexed: they go through both the AST parser and text_files, so search_text / grep_text / read_file keep working alongside the new structural queries.

| HTML | → | code-index table | Name |
|------|---|------------------|------|
| Element with id="X" | → | classes | X (body=outerHTML, bases=tagname) |
| `<form>` with id/name X | → | classes | form_X (bases=form) |
| `<form>` without id/name | → | classes | form |
| | → | variables | Y |
| | → | imports | module=URL, kind="link" |
| | → | imports | module=URL, kind=X (or "stylesheet") |
| | → | imports | module=URL, kind="script" |
| | → | imports | module=URL, kind=tag |
| `<script>`…inline JS…`</script>` | → | functions | inline_script_N, N = line number (body=content) |
| `<style>`…inline CSS…`</style>` | → | functions | inline_style_N (body=content) |
| Attribute class="foo bar baz" | → | variables | class:foo, class:bar, class:baz (one record per class) |
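Two of the mappings above (one `class:` record per CSS class, and form naming) can be sketched with Python's standard html.parser. This is only an illustration of the convention, not the project's tree-sitter implementation; the `EntityCollector` class is hypothetical, and preferring `id` over `name` for forms is an assumption.

```python
from html.parser import HTMLParser


class EntityCollector(HTMLParser):
    """Collect 'class:<name>' variable records and form class names."""

    def __init__(self) -> None:
        super().__init__()
        self.variables: list[str] = []
        self.classes: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # One record per CSS class listed in the class attribute.
        for css_class in (attr.get("class") or "").split():
            self.variables.append(f"class:{css_class}")
        if tag == "form":
            # Assumption: id wins over name when both are present.
            ident = attr.get("id") or attr.get("name")
            self.classes.append(f"form_{ident}" if ident else "form")


collector = EntityCollector()
collector.feed('<form id="login" class="foo bar baz"></form><form></form>')
print(collector.variables)  # ['class:foo', 'class:bar', 'class:baz']
print(collector.classes)    # ['form_login', 'form']
```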

All MCP tools that work for HTML files after re-indexing:

# === Discovery & metadata ===
list_files(repo="X", pattern="**/*.html")                # all html (returns language="html")
list_files(repo="X", path_prefix="src/templates/")
stat_file(repo="X", path="src/templates/base.html")      # returns language="html", category="text"
get_stats(repo="X")                                       # totals

# === Structural (AST) — new in 0.7.x ===
# Elements with id, forms, css-classes, links, inline blocks → AST tables
get_class(repo="X", name="cart")                          # outerHTML of the element with id "cart"
get_class(repo="X", name="form_login")                    # full <form> element
search_class(repo="X", query="container", language="html")
get_function(repo="X", name="inline_script_42")           # body of the inline <script> at line 42
search_function(repo="X", query="inline_script", language="html")
find_symbol(repo="X", name="form_login")                  # exact-name lookup across all 4 tables
find_symbol(repo="X", name="class:htmx-indicator")        # CSS class usage
get_imports(repo="X", module="https://unpkg.com/htmx.org@1.9.12")  # who depends on this CDN
get_file_summary(repo="X", path="src/templates/base.html")         # full map (functions/classes/imports/variables)

# === Body-level grep (works on inline_script bodies) ===
grep_body(repo="X", regex="fetch\\(", language="html")    # in <script> blocks
grep_body(repo="X", pattern="color:", language="html")    # in <style> blocks
grep_body(repo="X", regex="hx-target", language="html", path_glob="src/templates/**", context_lines=2)

# === Text-level (still works via dual-indexing) ===
read_file(repo="X", path="src/templates/base.html", line_start=1, line_end=20)
search_text(repo="X", query="DOCTYPE", language="html")
grep_text(repo="X", regex="\\{%\\s*include", path_glob="**/*.html", context_lines=1)  # Jinja includes

get_callers / get_callees are not populated for HTML (the parser does not extract call edges between scripts).

Template engines (Jinja/Django/EJS): {{ … }} and {% … %} are tolerated as text content; surrounding HTML elements are still parsed normally.

Quick Start

Install via npm (easiest)

npm install -g @regsorm/code-index-mcp

The postinstall step downloads the prebuilt native binary for your platform (Windows x64, Linux x64, macOS arm64) from GitHub Releases — nothing is compiled. Then run it as an MCP server:

npx @regsorm/code-index-mcp serve --path /path/to/your/repo

Also published to the official MCP Registry as io.github.Regsorm/code-index. This wrapper ships only the public code-index binary (no 1C support); to get bsl-indexer, build from source.

Build from source

git clone https://github.com/Regsorm/code-index-mcp.git
cd code-index-mcp
cargo build --release -p code-index               # public binary for Python/Rust/Go/Java/JS/TS
cargo build --release -p bsl-indexer --features enrichment   # extra build with 1C support + LLM enrichment

Binaries:

  • target/release/code-index[.exe] — main binary (no 1C support).
  • target/release/bsl-indexer[.exe] — full 1C support (XML metadata parsers, BSL call graph, data-links graph, MCP tools get_object_structure / get_form_handlers / find_path_bsl / search_terms / get_data_links / find_data_path / get_register_writers, optional LLM enrichment under cargo feature enrichment).

GitHub Releases publish 6 ready artifacts per tag: code-index × {Win, Linux, macOS} + bsl-indexer × {Win, Linux, macOS}.

Set up the background daemon (v0.5+)

Portable layout: one folder for everything (binary + config + runtime files). Pointed to by CODE_INDEX_HOME env var.

  1. Create the daemon folder and drop code-index.exe into it (e.g. C:\tools\code-index\).
  2. Set the CODE_INDEX_HOME environment variable to point at that folder:

Windows (persistent, user scope):

```powershell
setx CODE_INDEX_HOME "C:\tools\code-index"
# Reopen your shell so the variable is visible.
```

Linux: add to ~/.bashrc or ~/.zshrc:

```bash
export CODE_INDEX_HOME="$HOME/.local/code-index"
```

macOS: same as Linux for shells; for launchd agents use launchctl setenv.

Any OS: per-project fallback via .mcp.json (no system env var needed):

```json
{
  "mcpServers": {
    "code-index": {
      "command": "C:\\tools\\code-index\\code-index.exe",
      "args": ["serve", "--path", "."],
      "env": { "CODE_INDEX_HOME": "C:\\tools\\code-index" }
    }
  }
}
```

  3. Create daemon.toml inside that folder and list the paths to watch:

```toml
[daemon]
http_port = 0                 # 0 = pick free port automatically
max_concurrent_initial = 1    # folders processed sequentially during initial indexing

[[paths]]
path = "C:\\RepoUT"

[[paths]]
path = "C:\\RepoBP1"
debounce_ms = 500             # per-folder override: react faster than the default 1500 ms
batch_ms = 1000
```

Per-folder debounce_ms / batch_ms are optional. If omitted, the daemon falls back to .code-index/config.json inside that project, and then to built-in defaults (1500 ms / 2000 ms).

  4. Start the daemon (foreground):

```bash
code-index daemon run
```

Or install it as a Windows Scheduled Task (auto-start at user logon; the script also sets CODE_INDEX_HOME via setx):

```powershell
powershell -ExecutionPolicy Bypass -File scripts\install-daemon-autostart.ps1 -BinaryPath "C:\tools\code-index\code-index.exe" -CodeIndexHome "C:\tools\code-index" -StartNow
```

  5. Check status:

```bash
code-index daemon status          # human-readable
code-index daemon status --json   # JSON
code-index daemon reload          # re-read daemon.toml after edits
code-index daemon stop
```

CODE_INDEX_HOME is required — there is no fallback. If it is unset, both daemon and serve exit with an error explaining how to set it.
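The fallback order for debounce_ms / batch_ms described above (per-folder entry in daemon.toml, then .code-index/config.json in the project, then built-in defaults) can be sketched as follows. The `resolve_timing` helper is hypothetical, and the flat key layout assumed for config.json is a guess made for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Built-in defaults quoted in the text above.
BUILTIN_DEFAULTS = {"debounce_ms": 1500, "batch_ms": 2000}


def resolve_timing(path_entry: dict, project_dir: Path) -> dict:
    """Resolve each key from the daemon.toml entry, then project config, then defaults."""
    project_cfg = {}
    cfg_file = project_dir / ".code-index" / "config.json"
    if cfg_file.is_file():
        # Assumption: config.json stores the same flat keys as daemon.toml.
        project_cfg = json.loads(cfg_file.read_text(encoding="utf-8"))
    resolved = {}
    for key, default in BUILTIN_DEFAULTS.items():
        if key in path_entry:
            resolved[key] = path_entry[key]
        elif key in project_cfg:
            resolved[key] = project_cfg[key]
        else:
            resolved[key] = default
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    # Entry overrides debounce_ms only; no project config exists in the temp folder.
    result = resolve_timing({"path": tmp, "debounce_ms": 500}, Path(tmp))
    print(result)  # {'debounce_ms': 500, 'batch_ms': 2000}
```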

> Troubleshooting: "daemon not running / runtime-info missing" even though the daemon IS running.
>
> The serve process and the daemon find each other only through $CODE_INDEX_HOME/daemon.json. If serve sees a different (or empty) CODE_INDEX_HOME than the daemon, it looks for daemon.json in the wrong place and reports the daemon as offline, while it is actually alive.
>
> The most common cause on Linux/macOS: GUI MCP clients (VS Code, Continue, Cline) do not read ~/.bashrc / ~/.zshrc, so a serve they launch with an empty env never sees the CODE_INDEX_HOME you exported in your shell. Meanwhile the daemon, started from a terminal, does, so they end up pointing at different folders.
>
> Fix: set CODE_INDEX_HOME explicitly in the env section of the client's MCP config, using the same absolute path the daemon uses ($HOME is not expanded there; use a real path). Restart the client and verify with code-index daemon status.
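The mismatch described in the troubleshooting note can be reasoned about with a small sketch that computes where each process would look for daemon.json. The helper names are hypothetical and the paths are example values; only the rule "$CODE_INDEX_HOME/daemon.json" comes from the text above.

```python
from pathlib import Path
from typing import Mapping, Optional


def daemon_json_path(env: Mapping[str, str]) -> Optional[Path]:
    """Return where a process with this environment would look for daemon.json."""
    home = env.get("CODE_INDEX_HOME", "").strip()
    if not home:
        return None  # unset: the real binaries exit with an error here
    return Path(home) / "daemon.json"


def same_home(daemon_env: Mapping[str, str], serve_env: Mapping[str, str]) -> bool:
    """True only if both processes resolve the same daemon.json location."""
    a, b = daemon_json_path(daemon_env), daemon_json_path(serve_env)
    return a is not None and a == b


terminal_env = {"CODE_INDEX_HOME": "/home/u/.local/code-index"}
gui_env = {}  # GUI client that never read ~/.bashrc
literal_env = {"CODE_INDEX_HOME": "$HOME/.local/code-index"}  # $HOME left unexpanded

print(same_home(terminal_env, gui_env))       # False
print(same_home(terminal_env, literal_env))   # False
print(same_home(terminal_env, terminal_env))  # True
```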

One-shot indexing (no daemon)

code-index index /path/to/project
code-index stats --path /path/to/project --json

Run as MCP server (read-only)

code-index serve --path /path/to/project

This is a thin read-only client of the daemon. It does not index anything itself — the daemon does. If the folder is still being indexed or not in daemon.toml, tools return a structured {status, message, progress} response instead of failing.

Transports (stdio vs HTTP)

serve supports two transports:

| Transport | Process model | When to use |
|-----------|---------------|-------------|
| stdio (default) | One serve process per MCP session | Simple setups, single client, ad-hoc runs |
| http (streamable) | One shared serve process, many clients over http://host:port/mcp | Multi-project setups, supervisor-managed services, avoiding per-session CLI duplication |

# stdio — per-session, alias set at CLI
code-index serve --path ut=/repos/ut --path bp=/repos/bp

# HTTP — shared process, aliases come from daemon.toml
code-index serve --transport http --port 8011 --config /etc/code-index/daemon.toml

--path can be repeated in alias=dir form (multi-repo mode). Each tool call takes a repo parameter to select which repository to query. Without =, the single path uses alias=default (backward-compatible).

In HTTP mode, if --config is provided, aliases are taken from [[paths]] entries of daemon.toml: explicit alias = "...", or derived from the path's last segment (lowercased, spaces → _) when not set. CLI --path takes precedence over the config file.
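The alias rules above can be sketched in a few lines. Both helper names are hypothetical; the behavior follows the text: a --path value in alias=dir form is split on the first "=", a bare path gets the alias "default", and a daemon.toml entry without an explicit alias derives one from the path's last segment (lowercased, spaces replaced by "_").

```python
import re


def derive_alias(path: str) -> str:
    """Alias from the last path segment: lowercased, spaces replaced by '_'."""
    # Split on both separators so Windows and POSIX paths behave the same.
    segments = [s for s in re.split(r"[\\/]+", path) if s]
    return segments[-1].lower().replace(" ", "_")


def parse_path_arg(arg: str) -> tuple[str, str]:
    """Split a --path value in alias=dir form; a bare dir gets alias 'default'."""
    alias, sep, directory = arg.partition("=")
    if sep:
        return alias, directory
    return "default", arg


print(derive_alias("C:\\RepoUT"))           # repout
print(derive_alias("/repos/My Project/"))   # my_project
print(parse_path_arg("ut=/repos/ut"))       # ('ut', '/repos/ut')
print(parse_path_arg("."))                  # ('default', '.')
```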

Connecting to Claude Code

Add to .mcp.json in your project root. For stdio:

{
  "mcpServers": {
    "code-index": {
      "command": "npx",
      "args": ["-y", "@regsorm/code-index-mcp", "serve", "--path", "."]
    }
  }
}

For a shared HTTP process:

{
  "mcpServers": {
    "code-index": {
      "type": "http",
      "url": "http://127.0.0.1:8011/mcp"
    }
  }
}
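Once a client is connected, each tool invocation travels as an MCP JSON-RPC `tools/call` request. The sketch below only builds such a request body, to show where the repo alias goes; the `tool_call` helper is hypothetical, the alias "ut" and function name are example values, and a real client would first complete the MCP initialize handshake before sending anything to the /mcp endpoint.

```python
import json


def tool_call(request_id: int, tool: str, repo: str, **arguments) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request body for an MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The repo alias is passed alongside the tool's own arguments.
        "params": {"name": tool, "arguments": {"repo": repo, **arguments}},
    }
    return json.dumps(payload)


body = tool_call(1, "get_function", repo="ut", name="parse_config")
print(body)
```

In practice an MCP client such as Claude Code generates these requests itself; the sketch is only meant to make the role of the repo parameter concrete.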

MCP Tools

| Tool | Description |
|------|-------------|
| search_function | Full-text search across functions (name, docstring, body) |
| search_class | Full-text search across classes |
| get_function | Get function by exact name |
| get_class | Get class by exact name |
| get_callers | Who calls this function? (v0.35.0) each row carries the caller's source path (distinguishes same-named callers from different files) |
| get_callees | What does this function call? (v0.35.0) each row carries the source path |
| find_path | (v0.23.0) Shortest path in the call graph between two functions, from → to (iterative cycle-safe BFS over unique calls nodes, max_depth=5, any language). Returns path edges [{caller, callee, line}] |
| get_call_tree | (v0.23.0) Call tree from a root function up to max_depth (default 3). direction: callees/down (downstream) or callers/up. Flat edge list [{caller, callee, line, depth, path}] ((v0.35.0) path = source file of each edge) + nested {name, children} tree; max_nodes cap |
| find_symbol | Search everywhere (functions, classes, variables, imports) |
| get_imports | Imports by module or file |
| get_file_summary | Complete file map without reading source |
| get_stats | Index statistics |
| search_text | Full-text search across text files |
| grep_body | Substring or regex search in function/class bodies. Returns match_lines (first 3 line numbers) and match_count (total, if > 3). v0.7.0: optional path_glob, context_lines |
| stat_file | (v0.7.0) Metadata of a single file: exists, size, mtime, language, lines_total, content_hash, indexed_at, category (text/code). (v0.8.0) adds oversize: bool for code files |
| list_files | (v0.7.0) Flat file listing with optional pattern (glob like **/*.py), path_prefix, language, limit |
| read_file | (v0.7.0) Read content of a file. Optional line_start/line_end (1-based, inclusive). Soft-cap 5000 lines or 500 KB, hard-cap 2 MB. (v0.8.0) works for code files too (.py, .bsl, .rs, .ts, etc.); content is stored in the file_contents table (zstd). Oversize files (default > 5 MB) return oversize: true with an empty content and a hint |
| grep_text | (v0.7.0) Regex search over text-file content (REGEXP). Closes the FTS5 special-character gap. Optional path_glob, language, context_lines. Hard-cap 1 MB on response size |
| grep_code | (v0.8.0) Regex search over code-file content (.py, .bsl, .rs, .ts, etc.) via file_contents table (zstd-decode in Rust). Same parameters as grep_text: regex, path_glob?, language?, limit?, context_lines?. Complements `grep

Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

Install and usage instructions live in the source repository linked above.

Versions

  • v0.10.4 Imported from the upstream source.