Install
$ agentstack add mcp-regsorm-code-index-mcp
Open-source listing, not yet scanned by AgentStack. Follow the source repository for install instructions.
About
Published on Infostart as "Code Index: structural search over exported 1C code via MCP"
code-index-mcp
[Russian version](README_RU.md)
Rust-native code index for AI agents. Static binary. Production-grade BSL/1C support.
One static binary for Windows/Linux/macOS — no runtime, no dependencies. Indexes large repositories in seconds, returns results to AI agents over MCP in milliseconds. 31 tools: 20 universal + 11 BSL-specific for 1C:Enterprise configurations.
What's inside
- Performance. 62,000 files indexed in 43 seconds, sub-ms search per query. Production-grade for 100K+ file monorepos.
- 31 MCP tools. 20 universal (functions, classes, callers/callees, file content, grep) + 11 BSL-tools (object structure & profile, form handlers, event subscriptions, call graph, data links, register writers, impact map, read-only SQL).
- Native BSL/1C. Parses XML-exports of 1C:Enterprise 8.3 configurations. Data-link graph (object→object edges via reference types in attributes) — ~60,000 edges in seconds for a typical accounting configuration.
- Federation. One MCP server can serve multiple repositories across machines: pass repo: "alias" in each tool call.
- Compressed content storage. File contents stored in SQLite via zstd, cheap random-access reads for AI agents.
- Tree-sitter AST. 10 languages with full parsing (Rust, Python, JavaScript, TypeScript, Java, Kotlin, C#, Go, Objective-C, Zig) + fallback for 50+ formats.
Connects to Claude Code, Cursor, any MCP client over HTTP.
Problem
AI models waste enormous time on repeated grep/find calls just to locate a single symbol. A real example: finding RuntimeErrorProcessing in a Java project required 14 sequential grep/find calls, each scanning thousands of files. With Code Index, that is one query returning results in under a millisecond.
Solution
A compiled Rust binary with one-writer / many-readers architecture:
- Parses source code into AST via tree-sitter
- Indexes everything into SQLite with FTS5 full-text search
- A separate background daemon is the sole writer: one process per machine watches a list of folders from its config and keeps .code-index/index.db up to date.
- The MCP server is a thin read-only client: any number of Claude Code / VS Code / subagent sessions can connect to the same project in parallel, with no pidlock conflicts and no per-session re-indexing.
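The one-writer / many-readers split can be illustrated with plain SQLite. This is only a sketch of the idea, not the project's Rust implementation: the functions table, its columns, and the helper names open_writer / open_reader are hypothetical. It relies on SQLite's standard URI parameter mode=ro to open read-only handles.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_writer(db_path: str) -> sqlite3.Connection:
    """Daemon side: the only connection that modifies the index."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions (name TEXT, path TEXT, line INTEGER)"
    )
    return conn


def open_reader(db_path: str) -> sqlite3.Connection:
    """Server side: a read-only handle; any number of these can coexist."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


db_path = os.path.join(tempfile.mkdtemp(), "index.db")

# The single writer creates and updates the index.
writer = open_writer(db_path)
writer.execute(
    "INSERT INTO functions VALUES (?, ?, ?)", ("parse_config", "src/config.py", 12)
)
writer.commit()

# Several readers query the same file in parallel.
readers = [open_reader(db_path) for _ in range(3)]
rows = [
    r.execute("SELECT name, path, line FROM functions").fetchall() for r in readers
]

# A reader cannot write: SQLite rejects the statement.
try:
    readers[0].execute("INSERT INTO functions VALUES ('x', 'y', 1)")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
```

Because readers never take a write lock, adding another session costs one more read-only handle rather than another indexing pass.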
Supported Languages
| Language | Parser | Extensions |
|----------|--------|------------|
| Python | tree-sitter-python | .py |
| JavaScript | tree-sitter-javascript | .js, .jsx |
| TypeScript | tree-sitter-typescript | .ts, .tsx |
| Java | tree-sitter-java | .java |
| Rust | tree-sitter-rust | .rs |
| Go | tree-sitter-go | .go |
| 1C (BSL) | tree-sitter-onescript | .bsl, .os |
| XML (1C) | quick-xml | .xml (configuration metadata) |
| HTML | tree-sitter-html | .html, .htm (v0.7.1, by user request; see HTML-specific mapping below) |
Text files (.md, .json, .yaml, .toml, .xml, .sql, .env, etc.) are also indexed for full-text search.
HTML — entity mapping (v0.7.2)
HTML has no native concept of "function" or "class", so the mapping is conventional. Dual-indexing: html files go through both AST parser AND text_files (so search_text / grep_text / read_file keep working alongside the new structural queries).
| HTML | → | code-index table | Name |
|------|---|------------------|------|
| … | → | classes | X (body=outerHTML, bases=tagname) |
|  | → | classes | form_X (bases=form) |
| without id/name | → | classes | form |
|  | → | variables | Y |
|  | → | imports | module=URL, kind="link" |
|  | → | imports | module=URL, kind=X (or "stylesheet") |
|  | → | imports | module=URL, kind="script" |
|  | → | imports | module=URL, kind=tag |
| …inline JS… | → | functions | inline_script_<line> (body=content) |
| …inline CSS… | → | functions | inline_style_<line> (body=content) |
| Attribute class="foo bar baz" | → | variables | class:foo, class:bar, class:baz (one record per class) |
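To make the mapping concrete, here is a minimal sketch in Python using the standard html.parser module. It is not the project's parser (that one is built on tree-sitter-html), and it covers only three rows: elements with an id become classes entries, each CSS class becomes a class:name variable, and an external script becomes an import. The script-with-src rule is an assumption, since the HTML column of that row did not survive in this copy. The class name HtmlEntityMapper is hypothetical.

```python
from html.parser import HTMLParser


class HtmlEntityMapper(HTMLParser):
    """Collects a small subset of the HTML -> index-table mapping."""

    def __init__(self):
        super().__init__()
        self.classes = []    # (name, bases): element id and its tag name
        self.variables = []  # "class:<css-class>", one record per class
        self.imports = []    # (module, kind)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # An element with an id maps to a "classes" record named after the id.
        if attr.get("id"):
            self.classes.append((attr["id"], tag))
        # class="foo bar" yields one variable per CSS class.
        for css in (attr.get("class") or "").split():
            self.variables.append(f"class:{css}")
        # Assumption: <script src=URL> is the row with kind="script".
        if tag == "script" and attr.get("src"):
            self.imports.append((attr["src"], "script"))


mapper = HtmlEntityMapper()
mapper.feed(
    '<div id="cart" class="box wide">'
    '<script src="https://example.com/app.js"></script>'
    "</div>"
)
print(mapper.classes, mapper.variables, mapper.imports)
```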
All MCP tools that work for HTML files after re-indexing:
# === Discovery & metadata ===
list_files(repo="X", pattern="**/*.html") # all html (returns language="html")
list_files(repo="X", path_prefix="src/templates/")
stat_file(repo="X", path="src/templates/base.html") # returns language="html", category="text"
get_stats(repo="X") # totals
# === Structural (AST) — new in 0.7.x ===
# Elements with id, forms, css-classes, links, inline blocks → AST tables
get_class(repo="X", name="cart") # outerHTML of the element with id "cart"
get_class(repo="X", name="form_login") # full outerHTML of the form whose id/name is "login"
search_class(repo="X", query="container", language="html")
get_function(repo="X", name="inline_script_42") # body of the inline script at line 42
search_function(repo="X", query="inline_script", language="html")
find_symbol(repo="X", name="form_login") # exact-name lookup across all 4 tables
find_symbol(repo="X", name="class:htmx-indicator") # CSS class usage
get_imports(repo="X", module="https://unpkg.com/htmx.org@1.9.12") # who depends on this CDN
get_file_summary(repo="X", path="src/templates/base.html") # full map (functions/classes/imports/variables)
# === Body-level grep (works on inline_script bodies) ===
grep_body(repo="X", regex="fetch\\(", language="html") # in inline script blocks
grep_body(repo="X", pattern="color:", language="html") # in inline style blocks
grep_body(repo="X", regex="hx-target", language="html", path_glob="src/templates/**", context_lines=2)
# === Text-level (still works via dual-indexing) ===
read_file(repo="X", path="src/templates/base.html", line_start=1, line_end=20)
search_text(repo="X", query="DOCTYPE", language="html")
grep_text(repo="X", regex="\\{%\\s*include", path_glob="**/*.html", context_lines=1) # Jinja includes
get_callers / get_callees are not populated for HTML (the parser does not extract call edges between scripts).
Template engines (Jinja/Django/EJS): {{ … }} and {% … %} are tolerated as text content; surrounding HTML elements are still parsed normally.
Quick Start
Install via npm (easiest)
npm install -g @regsorm/code-index-mcp
The postinstall step downloads the prebuilt native binary for your platform (Windows x64, Linux x64, macOS arm64) from GitHub Releases — nothing is compiled. Then run it as an MCP server:
npx @regsorm/code-index-mcp serve --path /path/to/your/repo
Also published to the official MCP Registry as io.github.Regsorm/code-index. This wrapper ships only the public code-index binary (no 1C support); to get bsl-indexer, build from source.
Build from source
git clone https://github.com/Regsorm/code-index-mcp.git
cd code-index-mcp
cargo build --release -p code-index # public binary for Python/Rust/Go/Java/JS/TS
cargo build --release -p bsl-indexer --features enrichment # extra build with 1C support + LLM enrichment
Binaries:
- target/release/code-index[.exe]: main binary (no 1C support).
- target/release/bsl-indexer[.exe]: full 1C support (XML metadata parsers, BSL call graph, data-links graph, MCP tools get_object_structure / get_form_handlers / find_path_bsl / search_terms / get_data_links / find_data_path / get_register_writers, optional LLM enrichment under cargo feature enrichment).
GitHub Releases publish 6 ready artifacts per tag: code-index × {Win, Linux, macOS} + bsl-indexer × {Win, Linux, macOS}.
Set up the background daemon (v0.5+)
Portable layout: one folder holds everything (binary, config, runtime files), and the CODE_INDEX_HOME environment variable points to it.
- Create the daemon folder and drop code-index.exe into it (e.g. C:\tools\code-index\).
- Set the CODE_INDEX_HOME environment variable to point at that folder:
Windows (persistent, user scope):

```powershell
setx CODE_INDEX_HOME "C:\tools\code-index"
# Reopen your shell so the variable is visible.
```

Linux (add to ~/.bashrc or ~/.zshrc):

```bash
export CODE_INDEX_HOME="$HOME/.local/code-index"
```

macOS: same as Linux for shells; for launchd agents use launchctl setenv.

Any OS, per-project fallback via .mcp.json (no system env var needed):

```json
{
  "mcpServers": {
    "code-index": {
      "command": "C:\\tools\\code-index\\code-index.exe",
      "args": ["serve", "--path", "."],
      "env": { "CODE_INDEX_HOME": "C:\\tools\\code-index" }
    }
  }
}
```
- Create daemon.toml inside that folder and list the paths to watch:

```toml
[daemon]
http_port = 0                # 0 = pick free port automatically
max_concurrent_initial = 1   # folders processed sequentially during initial indexing

[[paths]]
path = "C:\\RepoUT"

[[paths]]
path = "C:\\RepoBP1"
debounce_ms = 500            # per-folder override: react faster than the default 1500 ms
batch_ms = 1000
```
Per-folder debounce_ms / batch_ms are optional. If omitted, the daemon falls back to .code-index/config.json inside that project, and then to built-in defaults (1500 ms / 2000 ms).
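The fallback order described above (per-folder entry in daemon.toml, then the project's .code-index/config.json, then built-in defaults) can be sketched as a small lookup. This is an illustration only: the function name resolve_timing is hypothetical, and it assumes config.json stores the two settings as flat top-level keys, which the source does not confirm.

```python
from typing import Optional

# Built-in defaults quoted above: 1500 ms debounce, 2000 ms batch.
BUILTIN_DEFAULTS = {"debounce_ms": 1500, "batch_ms": 2000}


def resolve_timing(key: str, daemon_entry: dict, project_config: Optional[dict]) -> int:
    """Return the effective value of debounce_ms or batch_ms for one folder."""
    # 1. A per-folder override in the [[paths]] entry wins.
    if key in daemon_entry:
        return daemon_entry[key]
    # 2. Otherwise use the project's own config, if present (assumed flat keys).
    if project_config and key in project_config:
        return project_config[key]
    # 3. Otherwise fall back to the built-in default.
    return BUILTIN_DEFAULTS[key]


# The second [[paths]] entry from the example overrides both values.
entry = {"path": "C:\\RepoBP1", "debounce_ms": 500, "batch_ms": 1000}
print(resolve_timing("debounce_ms", entry, None))
```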
- Start the daemon (foreground):

```bash
code-index daemon run
```

Or install it as a Windows Scheduled Task (auto-start at user logon; the script also sets CODE_INDEX_HOME via setx):

```powershell
powershell -ExecutionPolicy Bypass -File scripts\install-daemon-autostart.ps1 -BinaryPath "C:\tools\code-index\code-index.exe" -CodeIndexHome "C:\tools\code-index" -StartNow
```

- Check status:

```bash
code-index daemon status          # human-readable
code-index daemon status --json   # JSON
code-index daemon reload          # re-read daemon.toml after edits
code-index daemon stop
```
CODE_INDEX_HOME is required — there is no fallback. If it is unset, both daemon and serve exit with an error explaining how to set it.
> Troubleshooting: "daemon not running / runtime-info missing" even though the daemon IS running.
>
> The serve process and the daemon find each other only through $CODE_INDEX_HOME/daemon.json. If serve sees a different (or empty) CODE_INDEX_HOME than the daemon, it looks for daemon.json in the wrong place and reports the daemon as offline while it is actually alive.
>
> The most common cause on Linux/macOS: GUI MCP clients (VS Code, Continue, Cline) do not read ~/.bashrc / ~/.zshrc, so a serve they launch with an empty env never sees the CODE_INDEX_HOME you exported in your shell. Meanwhile the daemon, started from a terminal, does, so they end up pointing at different folders.
>
> Fix: set CODE_INDEX_HOME explicitly in the env section of the client's MCP config, using the same absolute path the daemon uses ($HOME is not expanded there; use a real path). Restart the client and verify with code-index daemon status.
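A quick way to check what a given process would see is to repeat the lookup by hand. The sketch below is a hypothetical diagnostic, not part of code-index: it only reads CODE_INDEX_HOME and checks whether daemon.json exists in that folder, and it makes no assumption about the file's contents.

```python
import os
from pathlib import Path


def locate_daemon_json(env=None):
    """Return (state, path) where state is 'unset', 'missing' or 'found'."""
    env = os.environ if env is None else env
    home = env.get("CODE_INDEX_HOME", "")
    if not home:
        # Same situation as a GUI client launched without the shell's exports.
        return ("unset", "")
    candidate = Path(home) / "daemon.json"
    if candidate.is_file():
        return ("found", str(candidate))
    # The variable is set, but it points at a folder the daemon is not using
    # (or the daemon is not running).
    return ("missing", str(candidate))


state, path = locate_daemon_json()
print(state, path)
```

Running it once from the terminal that started the daemon and once from the environment the MCP client uses shows whether both resolve to the same folder.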
One-shot indexing (no daemon)
code-index index /path/to/project
code-index stats --path /path/to/project --json
Run as MCP server (read-only)
code-index serve --path /path/to/project
This is a thin read-only client of the daemon. It does not index anything itself — the daemon does. If the folder is still being indexed or not in daemon.toml, tools return a structured {status, message, progress} response instead of failing.
Transports (stdio vs HTTP)
serve supports two transports:
| Transport | Process model | When to use |
|-----------|---------------|-------------|
| stdio (default) | One serve process per MCP session | Simple setups, single client, ad-hoc runs |
| http (streamable) | One shared serve process, many clients over http://host:port/mcp | Multi-project setups, supervisor-managed services, avoiding per-session CLI duplication |
# stdio — per-session, alias set at CLI
code-index serve --path ut=/repos/ut --path bp=/repos/bp
# HTTP — shared process, aliases come from daemon.toml
code-index serve --transport http --port 8011 --config /etc/code-index/daemon.toml
--path can be repeated in alias=dir form (multi-repo mode). Each tool call takes a repo parameter to select which repository to query. Without =, the single path uses alias=default (backward-compatible).
In HTTP mode, if --config is provided, aliases are taken from [[paths]] entries of daemon.toml: explicit alias = "...", or derived from the path's last segment (lowercased, spaces → _) when not set. CLI --path takes precedence over the config file.
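The alias rules above are simple enough to sketch. The two helpers below are hypothetical illustrations of the described behavior, not the tool's code: parse_path_arg splits a --path value in alias=dir form and falls back to the alias default, and derive_alias takes the last path segment, lowercases it, and replaces spaces with underscores.

```python
import re


def parse_path_arg(arg: str):
    """Split a --path value into (alias, directory)."""
    alias, sep, directory = arg.partition("=")
    if sep:
        return alias, directory
    # No "=": single-repo mode with the backward-compatible alias.
    return "default", arg


def derive_alias(path: str) -> str:
    """Derive an alias from the last segment of a path (either separator)."""
    segments = [s for s in re.split(r"[\\/]+", path) if s]
    if not segments:
        raise ValueError("cannot derive an alias from an empty path")
    return segments[-1].lower().replace(" ", "_")


print(parse_path_arg("ut=/repos/ut"))
print(derive_alias("C:\\RepoUT"))
```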
Connecting to Claude Code
Add to .mcp.json in your project root. For stdio:
{
"mcpServers": {
"code-index": {
"command": "npx",
"args": ["-y", "@regsorm/code-index-mcp", "serve", "--path", "."]
}
}
}
For a shared HTTP process:
{
"mcpServers": {
"code-index": {
"type": "http",
"url": "http://127.0.0.1:8011/mcp"
}
}
}
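For clients that talk to the HTTP endpoint directly, a tool invocation is an MCP JSON-RPC request with method tools/call. The sketch below only builds the request body; the helper name build_tool_call and the alias "ut" are hypothetical, and a real client must first complete the MCP initialize handshake and send the headers the transport requires, which is not shown here.

```python
import json


def build_tool_call(request_id: int, tool: str, **arguments) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)


# find_symbol takes repo and name, as in the examples earlier in this README.
body = build_tool_call(1, "find_symbol", repo="ut", name="RuntimeErrorProcessing")
print(body)
```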
MCP Tools
| Tool | Description |
|------|-------------|
| search_function | Full-text search across functions (name, docstring, body) |
| search_class | Full-text search across classes |
| get_function | Get function by exact name |
| get_class | Get class by exact name |
| get_callers | Who calls this function? (v0.35.0) Each row carries the caller's source path (distinguishes same-named callers from different files) |
| get_callees | What does this function call? (v0.35.0) Each row carries the source path |
| find_path | (v0.23.0) Shortest path in the call graph between two functions from→to (iterative cycle-safe BFS over unique calls nodes, max_depth=5, any language). Returns path edges [{caller, callee, line}] |
| get_call_tree | (v0.23.0) Call tree from a root function up to max_depth (default 3). direction: callees/down (downstream) or callers/up. Flat edge list [{caller, callee, line, depth, path}] ((v0.35.0) path = source file of each edge) + nested {name, children} tree; max_nodes cap |
| find_symbol | Search everywhere (functions, classes, variables, imports) |
| get_imports | Imports by module or file |
| get_file_summary | Complete file map without reading source |
| get_stats | Index statistics |
| search_text | Full-text search across text files |
| grep_body | Substring or regex search in function/class bodies. Returns match_lines (first 3 line numbers) and match_count (total, if > 3). v0.7.0: optional path_glob, context_lines |
| stat_file | (v0.7.0) Metadata of a single file: exists, size, mtime, language, lines_total, content_hash, indexed_at, category (text/code). (v0.8.0) adds oversize: bool for code files |
| list_files | (v0.7.0) Flat file listing with optional pattern (glob like **/*.py), path_prefix, language, limit |
| read_file | (v0.7.0) Read content of a file. Optional line_start/line_end (1-based, inclusive). Soft-cap 5000 lines or 500 KB, hard-cap 2 MB. (v0.8.0) works for code files too (.py, .bsl, .rs, .ts, etc.); content stored in file_contents table (zstd). Oversize files (default > 5 MB) return oversize: true with an empty content and a hint |
| grep_text | (v0.7.0) Regex search over text-file content (REGEXP). Closes the FTS5 special-character gap. Optional path_glob, language, context_lines. Hard-cap 1 MB on response size |
| grep_code | (v0.8.0) Regex search over code-file content (.py, .bsl, .rs, .ts, etc.) via file_contents table (zstd-decode in Rust). Same parameters as grep_text: regex, path_glob?, language?, limit?, context_lines?. Complements `grep
…
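The find_path row describes an iterative, cycle-safe breadth-first search with a depth cap. A minimal Python sketch of that technique follows. It is not the project's Rust code: the input shape (a list of (caller, callee, line) tuples) and the choice to return None when no path exists are assumptions made for illustration.

```python
from collections import deque


def find_path(calls, start, goal, max_depth=5):
    """Shortest call path from start to goal as a list of edge dicts.

    calls is an iterable of (caller, callee, line) tuples. Returns [] when
    start == goal and None when no path exists within max_depth edges.
    """
    adjacency = {}
    for caller, callee, line in calls:
        adjacency.setdefault(caller, []).append((callee, line))

    if start == goal:
        return []

    visited = {start}             # cycle safety: each node is expanded once
    queue = deque([(start, [])])  # (node, edges taken to reach it)
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_depth:
            continue              # depth cap: do not extend this path further
        for callee, line in adjacency.get(node, []):
            if callee in visited:
                continue
            edge = {"caller": node, "callee": callee, "line": line}
            if callee == goal:
                return path + [edge]
            visited.add(callee)
            queue.append((callee, path + [edge]))
    return None


# A small graph with a cycle a -> b -> c -> a and an exit c -> d.
calls = [("a", "b", 10), ("b", "c", 20), ("c", "a", 30), ("c", "d", 40)]
print(find_path(calls, "a", "d"))
```

Breadth-first order guarantees that the first path reaching the goal has the fewest edges, and the visited set keeps the search from looping on recursive call chains.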
Source & license
This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.
- Author: Regsorm
- Source: Regsorm/code-index-mcp
- License: MIT
Install and usage instructions live in the source repository linked above.
Versions
- v0.10.4 Imported from the upstream source.