Install
$ agentstack add mcp-muse-code-space-test-genie-mcp
Open-source listing, not yet scanned by AgentStack. Follow the source repository for install instructions.
About
test-genie-mcp
Built for vibe coders: one command, get a prioritized list of what's actually broken about your project.
Self-healing test automation for iOS, Android, Flutter, React Native and Web apps — as an MCP server.
> v3.1.1 — vibe-check + honest auto-fix. One MCP call, ~30 seconds: race conditions + security issues + memory leaks + logic errors + perf smells, prioritized. Stays on your machine, no telemetry. Pass autoFix: true for the small, safe mechanical fixes (weak-hash, simple Math.random assignment) — backup + syntax-validate + rollback-on-syntax-fail. For test-verified application of harder fixes, use [v3.0.0](#how-the-iterate-fix-loop-works)'s iterate-fix loop.
Vibe coders quickstart
You don't read the docs. You open the project, talk to Claude, and want a verdict. Here it is:
In Claude (with test-genie-mcp installed — [setup](#5-minute-quickstart)):
/vibe-check /Users/me/my-app
Claude calls diagnose_project under the hood. ~30 seconds later you see:
# vibe-check report
- Project: /Users/me/my-app
- Platform: web
- Findings: 11 total — 4 critical, 4 high, 1 medium, 1 low
- Estimated fix time: ~85 min
## Top 5 issues
### 1. [CRIT] Hardcoded AWS access key id found in source
- File: `server.js:7`
- Category: security / secret (CWE-798)
- Confidence: 95%
- Fix: Move the value to an env var, gitignore the config, rotate the leaked key.
### 2. [CRIT] SQL string built by concatenating user input
- File: `server.js:21`
- Category: security / injection (CWE-89)
- Fix: Use parameterized queries (`db.query("... WHERE id = ?", [id])`).
### 3. [HIGH] useState setter called after await without mount guard
- File: `UserProfile.tsx:16`
- Category: race-condition / react-setstate-after-await (CWE-362)
- Confidence: 78%
- Fix: Use AbortController and check signal.aborted before calling setters.
… (top 5 shown — full list at output: "detailed")
## Next steps
1. Address the critical / high findings above.
2. Re-run diagnose_project after fixing to confirm convergence.
3. Use run_iterative_fix_loop for test-driven verification of each fix.
If any finding is autoFixable: true and is at high/critical severity, the diagnose_project call accepts autoFix: true to apply the mechanical replacement directly (with backup + syntax validation — see [SAFETY.md](SAFETY.md) for the exact guards). The v3.1.1 honest scope is narrow: weak hash (createHash('md5'|'sha1') → createHash('sha256')) and standalone Math.random() in security-sensitive files. For broader/structural fixes (race conditions, eval, exec injection) run run_iterative_fix_loop separately — it re-runs tests and auto-rolls-back on regression.
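The two mechanical replacements named above can be pictured roughly as follows. This is a minimal sketch using Node's built-in crypto module; the function names `fingerprint` and `oneTimeCode` are illustrative only, and the exact text the tool emits may differ.

```typescript
import { createHash, randomInt } from "node:crypto";

// Before: createHash("md5") or createHash("sha1") in a security-keyword file.
// After: only the algorithm name changes; the call chain stays the same.
function fingerprint(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

// Before: const code = Math.floor(Math.random() * 1_000_000);
// After: randomInt draws from a CSPRNG; the upper bound is exclusive.
function oneTimeCode(): number {
  return randomInt(0, 1_000_000);
}

console.log(fingerprint("hello").length); // 64 hex characters for SHA-256
console.log(oneTimeCode());
```

Note that changing the hash algorithm changes every stored digest, so a replacement like this is only safe where nothing persisted depends on the old values.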
Why test-genie?
The bottleneck in mobile + cross-platform test automation isn't writing tests — it's the loop between a failing test and a passing test. test-genie closes that loop:
failing test → analyzer flags issue → fix proposed → dry-run + syntax check →
applied with backup → affected tests re-run → regression check → loop or stop
This full loop is the run_iterative_fix_loop tool. The diagnose_project autoFix: true path in v3.1.1 covers a strict subset — backup + dry-run + syntax-validate + apply, without re-running tests (so no test-regression rollback in that path). Use the right tool for the job — and see [SAFETY.md](SAFETY.md) for the exact guards on each.
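As an illustration of the narrower path (backup, syntax-validate, apply, roll back on syntax failure), here is a minimal sketch. It is not the project's applyFix implementation: the `.bak` backup location, the `applyWithBackup` name, and the use of a bare bracket-balance check as the validator are all assumptions made for the example.

```typescript
import { copyFileSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for a syntax check: balanced (), [] and {} only.
// It ignores strings and comments, so it is far weaker than a real compiler.
function bracesBalanced(source: string): boolean {
  const closers: Record<string, string> = { ")": "(", "]": "[", "}": "{" };
  const stack: string[] = [];
  for (const ch of source) {
    if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
    else if (ch in closers && stack.pop() !== closers[ch]) return false;
  }
  return stack.length === 0;
}

type ApplyResult = "applied" | "rolled-back";

// Back up the file, write the patch, validate, and restore on failure.
function applyWithBackup(file: string, patched: string): ApplyResult {
  const backup = `${file}.bak`; // assumed backup location for this sketch
  copyFileSync(file, backup);
  writeFileSync(file, patched, "utf8");
  if (bracesBalanced(readFileSync(file, "utf8"))) return "applied";
  copyFileSync(backup, file); // validation failed: restore the original
  return "rolled-back";
}

// Usage: a patch with a missing closing brace is rolled back.
const dir = mkdtempSync(join(tmpdir(), "fix-demo-"));
const target = join(dir, "example.ts");
writeFileSync(target, "function ok() { return 1; }\n", "utf8");
const outcome = applyWithBackup(target, "function broken() { return 1;\n");
console.log(outcome, readFileSync(target, "utf8").trim());
// prints: rolled-back function ok() { return 1; }
```

Nothing in this path runs tests, which is why a patch that parses but breaks behavior would go through unchallenged.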
Other tools (Detox, Maestro, Playwright, xcodebuild test) run tests. test-genie **runs tests and drives the fix until the bar is met or it can no longer make progress** — without you scrubbing through stack traces.
5-minute Quickstart
# 1. Install
npm install -g test-genie-mcp
# 2. Add to Claude Desktop config (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json)
{
  "mcpServers": {
    "test-genie": {
      "command": "npx",
      "args": ["test-genie-mcp"],
      "env": {
        "TEST_GENIE_ALLOWED_ROOT": "/path/to/your/project"
      }
    }
  }
}
# 3. Restart Claude Desktop. From a chat:
# "Run the iterate-fix loop on /Users/me/my-rn-app with autoApply=false"
Expected output (truncated):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Iterative fix loop f8b3… — PAUSED-FOR-CONFIRMATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Iterations completed: 1
Fixes applied: 0
Regressions rolled back: 0
Final tests: 7/10 passing (3 failing)
Pending confirmations (3):
- 71fbe…: Fix: useEffect missing cleanup for setInterval (confidence: 85)
- 92ad1…: Fix: Force-unwrap on possibly-undefined name (confidence: 85)
- …
Resume token: f8b3…
Re-call with autoApply: true (or resumeToken: "f8b3…") to actually patch the files.
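For readers curious what the chat request turns into, the sketch below builds the kind of MCP `tools/call` request a client might send for this tool. The `autoApply` and `resumeToken` names come from the output above; `projectPath` is an assumed argument name, and the real input schema may differ.

```typescript
// JSON-RPC envelope for an MCP tool call ("tools/call" is the MCP method).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface LoopCallOptions {
  autoApply?: boolean;
  resumeToken?: string;
}

// `projectPath` is a hypothetical argument name used for illustration.
function iterateFixRequest(
  id: number,
  projectPath: string,
  opts: LoopCallOptions = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "run_iterative_fix_loop",
      arguments: {
        projectPath,
        autoApply: opts.autoApply ?? false, // propose-only unless asked
        ...(opts.resumeToken ? { resumeToken: opts.resumeToken } : {}),
      },
    },
  };
}

// First call proposes fixes; the second resumes the paused loop and applies.
const proposeOnly = iterateFixRequest(1, "/Users/me/my-rn-app");
const resume = iterateFixRequest(2, "/Users/me/my-rn-app", {
  autoApply: true,
  resumeToken: "<token from the paused run>",
});
console.log(JSON.stringify(proposeOnly, null, 2));
console.log(JSON.stringify(resume, null, 2));
```

In normal use Claude builds this request for you; the sketch only makes the two-step propose-then-apply flow explicit.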
Real use cases
> The flows below describe the run_iterative_fix_loop path (v3.0 headline): full detect → propose → dry-run → apply-with-backup → re-run-tests → rollback-on-regression. The diagnose_project autoFix path in v3.1.1 is the narrower mechanical-replacement-only path; see [SAFETY.md](SAFETY.md) §4 for what that one actually touches.
1. React Native memory-leak self-healing
A team adds setInterval(...) in a useEffect and forgets cleanup. test-genie's detect_memory_leaks flags it, suggest_fixes proposes return () => clearInterval(id) (src/tools/fixing/suggestFixes.ts:169-179), the loop dry-runs the patch through the TS compiler, applies with backup, re-runs only the affected snapshot test, confirms 100% pass, stops. Before: 1 failing snapshot. After: 0 failing, 1 fix applied, 1 backup at .test-genie-backups/.
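The cleanup pattern behind this fix can be shown without React. The sketch below models an effect as a function that returns its own cleanup; `pollingEffect` is a hypothetical helper, not something the tool generates.

```typescript
// Shape of a React-style effect body: it may return a cleanup function.
type Cleanup = () => void;

// Leaky form (shown as a comment so nothing is left running):
//   useEffect(() => { setInterval(refresh, 1000); }, []);
// The interval keeps firing after the component unmounts.

// Fixed form: keep the timer id and return a function that clears it.
function pollingEffect(onTick: () => void, intervalMs: number): Cleanup {
  const id = setInterval(onTick, intervalMs);
  return () => clearInterval(id);
}

// In a component this would read:
//   useEffect(() => pollingEffect(refresh, 1000), []);
let ticks = 0;
const cleanup = pollingEffect(() => {
  ticks += 1;
}, 10);
cleanup(); // simulate unmount: the timer is cleared before it ever fires
console.log(ticks); // 0
```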
2. Flutter widget dispose() automation
AnimationController left undisposed. test-genie sees the missing dispose() override, generates a Dart @override dispose() { controller.dispose(); super.dispose(); } block (suggestFixes.ts:214-217), runs dart analyze on the patched file, applies, re-runs flutter test, converges.
3. iOS retain-cycle (closure capture)
self.timer = Timer.scheduledTimer(...) { _ in self.tick() } — rule-based detector flags closure self-capture, fixer rewrites to [weak self] _ in guard let self = self else { return }; self.tick() (suggestFixes.ts:239-242). If swiftc is on PATH the syntax check is real; otherwise test-genie reports "downgraded validation" so you know.
How the iterate-fix loop works
┌────────────────────┐
│   collect tests    │  (run_scenario_test / supplied list)
└─────────┬──────────┘
          │
  pass-rate ≥ threshold? ── yes ──▶ SUCCESS
          │ no
          ▼
┌────────────────────┐
│   detect issues    │  memory + logic analyzers
└─────────┬──────────┘
          │
┌────────────────────┐
│   suggest fixes    │  rule-based (default) → LLM (hybrid, optional)
└─────────┬──────────┘
          │
┌────────────────────┐
│  dry-run + syntax  │  TS compiler API / platform compiler / brace check
└─────────┬──────────┘
          │
┌────────────────────┐
│ apply with backup  │  per-file `.test-genie-backups/`
└─────────┬──────────┘
          │
┌────────────────────┐
│    re-run tests    │  regression? yes → auto-rollback
└─────────┬──────────┘
          │
          ▼
loop (≤ maxIterations, ≤ totalTimeout)
See [docs/ITERATEFIXLOOP.md](docs/ITERATEFIXLOOP.md) for a sequence diagram and the full safety-guard list.
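The control flow in the diagram can be sketched in a few lines. This is a simplified, synchronous model under stated assumptions: the `LoopDeps`, `Fix`, and `iterateFix` names are hypothetical, detection, suggestion, and dry-run are collapsed into a single `proposeFixes` step, and the total-timeout bound is omitted.

```typescript
interface TestRun { passed: number; total: number }
interface Fix { id: string; apply(): void; rollback(): void }

// Hypothetical dependencies; the real loop wires in analyzers and runners.
interface LoopDeps {
  runTests(): TestRun;
  proposeFixes(): Fix[]; // detect + suggest + dry-run, collapsed
}

interface LoopOptions { threshold: number; maxIterations: number }
type LoopStatus = "success" | "no-progress" | "max-iterations";
interface LoopResult { status: LoopStatus; applied: string[]; rolledBack: string[] }

const passRate = (run: TestRun): number =>
  run.total === 0 ? 0 : run.passed / run.total;

function iterateFix(deps: LoopDeps, opts: LoopOptions): LoopResult {
  const applied: string[] = [];
  const rolledBack: string[] = [];
  let current = deps.runTests();
  for (let i = 0; i < opts.maxIterations; i++) {
    if (passRate(current) >= opts.threshold) return { status: "success", applied, rolledBack };
    const fixes = deps.proposeFixes();
    if (fixes.length === 0) return { status: "no-progress", applied, rolledBack };
    for (const fix of fixes) {
      fix.apply();
      const after = deps.runTests();
      if (after.passed < current.passed) {
        fix.rollback(); // regression: undo this fix and keep the old baseline
        rolledBack.push(fix.id);
      } else {
        applied.push(fix.id);
        current = after;
      }
    }
  }
  const status = passRate(current) >= opts.threshold ? "success" : "max-iterations";
  return { status, applied, rolledBack };
}

// Toy project: 7/10 passing; one fix helps, one causes a regression.
let passing = 7;
const queue: Fix[] = [
  { id: "good", apply: () => { passing += 3; }, rollback: () => { passing -= 3; } },
  { id: "bad", apply: () => { passing -= 2; }, rollback: () => { passing += 2; } },
];
const result = iterateFix(
  { runTests: () => ({ passed: passing, total: 10 }), proposeFixes: () => queue.splice(0) },
  { threshold: 1, maxIterations: 3 },
);
console.log(result);
// { status: 'success', applied: [ 'good' ], rolledBack: [ 'bad' ] }
```

Re-running tests after each individual fix, rather than after a batch, is what lets the sketch attribute a regression to one fix and roll back only that one.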
Tools (23)
| # | Tool | Mode |
|---|------|------|
| 1 | analyze_app_structure | real |
| 2 | generate_scenarios | real |
| 3 | create_test_plan | real |
| 4 | run_scenario_test | hybrid |
| 5 | run_simulation | simulated |
| 6 | run_stress_test | hybrid |
| 7 | detect_memory_leaks | real |
| 8 | detect_logic_errors | real |
| 9 | suggest_fixes | real |
| 10 | confirm_fix | real |
| 11 | apply_fix | real |
| 12 | rollback_fix | real |
| 13 | run_full_automation | hybrid |
| 14 | run_iterative_fix_loop (v3.0 headline) | hybrid |
| 15 | generate_report | real |
| 16 | get_pending_fixes | real |
| 17 | get_test_history | real |
| 18 | analyze_performance | real |
| 19 | analyze_code_deep | real |
| 20 | generate_cicd_config | real |
| 21 | diagnose_project (v3.1 headline: vibe-check) | real |
| 22 | detect_race_conditions | real |
| 23 | detect_security_issues | real |
Mode legend: see [docs/SIMULATIONVSREAL.md](docs/SIMULATIONVSREAL.md).
Plus 4 resources (test-genie://iteration-logs, …/test-history/{path}, …/iteration-logs/{loopId}, …/applied-fixes/{path}) and 3 prompts (full-test-pipeline, diagnose-failure, vibe-check).
What vibe-check catches
Race conditions (detect_race_conditions / diagnose_project):
| Pattern | Language | Severity | Auto-fixable (v3.1.1) |
|---|---|---|---|
| useState setter called after await without mount guard | TS/JS/React | high | no (structural) |
| useEffect with async fetch, no AbortController/cleanup | TS/JS/React | high | no (structural) |
| arr.forEach(async ...) (silent fire-and-forget) | TS/JS | medium | no (ordering-sensitive) |
| Adjacent fetches without Promise.all / sequencing | TS/JS | medium | no |
| TOCTOU: existsSync then readFileSync without lock | TS/JS Node | medium | no |
| Non-atomic counter increment in async context | TS/JS | low | no |
| @Published mutation outside @MainActor | Swift | medium | no |
| Concurrent DispatchQueue writes without .barrier | Swift | medium | no |
| MutableStateFlow mutated off Dispatchers.Main | Kotlin | medium | no |
| Flow collected without flowOn | Kotlin | low | no |
| Goroutine + shared map without sync.Mutex | Go | high | no |

> v3.1.1 honesty audit: useEffect-no-abort and forEach-await were previously advertised as auto-fixable. They are not: wrapping with AbortController or rewriting to Promise.all(arr.map(...)) changes behavior we can't verify statically. They are now report-only. See [SAFETY.md](SAFETY.md).
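To make the forEach case concrete, the sketch below shows why `arr.forEach(async ...)` is fire-and-forget and what one awaited rewrite looks like. The `save` helper and both wrapper functions are illustrative names, not part of test-genie.

```typescript
// Stand-in for any async side effect (network call, DB write, ...).
async function save(id: number, log: number[]): Promise<void> {
  await Promise.resolve();
  log.push(id);
}

// Flagged pattern: forEach ignores the promises its callback returns.
function saveAllFireAndForget(ids: number[], log: number[]): number {
  ids.forEach(async (id) => {
    await save(id, log);
  });
  return log.length; // 0 here: none of the saves has completed yet
}

// One possible rewrite: wait for every save before continuing.
// The saves now run concurrently and a single rejection rejects the whole
// call, which is a behavior change compared with the original.
async function saveAllAwaited(ids: number[], log: number[]): Promise<number> {
  await Promise.all(ids.map((id) => save(id, log)));
  return log.length;
}

const seenTooEarly = saveAllFireAndForget([1, 2, 3], []);
console.log(seenTooEarly); // 0
saveAllAwaited([1, 2, 3], []).then((count) => console.log(count)); // 3
```

Whether the right rewrite is `Promise.all` (concurrent) or a `for...of` loop with `await` (sequential) depends on ordering requirements the analyzer cannot see, which fits the report-only classification.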
Security (detect_security_issues / diagnose_project):
| Pattern | Severity | CWE | Auto-fixable (v3.1.1) |
|---|---|---|---|
| Hardcoded AWS / Stripe / GitHub / Google / Slack token | critical / high | CWE-798 | no (rotate) |
| Hardcoded JWT secret literal | high | CWE-798 | no |
| API token in URL query string | high | CWE-200 | no |
| .env file present but not gitignored | high | CWE-538 | no (rotation must follow) |
| SQL string concat with req.params / req.body | critical | CWE-89 | no |
| innerHTML / dangerouslySetInnerHTML with dynamic value | high | CWE-79 | no |
| eval() / new Function() with non-literal | critical | CWE-95 | no |
| Math.random() in security-sensitive file, standalone assignment | high | CWE-338 | yes (crypto.randomInt) |
| Math.random() mixed into arithmetic | high | CWE-338 | no (semantic) |
| createHash('md5'\|'sha1') in security-keyword file | high | CWE-327 | yes ('sha256') |
| createHash('md5'\|'sha1') elsewhere | medium | CWE-327 | no (below severity floor) |
| child_process.exec with user-input template literal | critical | CWE-78 | no |
| fetch(req.query.url) (SSRF) | high | CWE-918 | no |
| CORS * origin + Allow-Credentials: true | high | CWE-942 | no |
| Cookie set without httpOnly / secure / sameSite | low | CWE-1004 | no |
| yaml.load without safe schema | medium | CWE-502 | no |

> v3.1.1 honesty audit: .env/Math.random (general)/yaml.load were previously advertised as auto-fixable. They were either too risky to rewrite blindly or no strategy shipped, so they were flipped to report-only. See [SAFETY.md](SAFETY.md) §5.
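The SQL-concatenation row is the easiest one to picture. The sketch below contrasts the flagged pattern with a parameterized query, modeled as plain data so it runs without a database; the `Query` shape and function names are assumptions, and the `?` placeholder syntax varies by driver.

```typescript
// A query modeled as data: SQL text plus separately-bound values.
interface Query {
  text: string;
  values: unknown[];
}

// Flagged pattern (CWE-89): user input becomes part of the SQL text.
function findUserUnsafe(id: string): Query {
  return { text: "SELECT * FROM users WHERE id = '" + id + "'", values: [] };
}

// Parameterized form: the SQL text is constant; input travels separately
// and the driver binds it as a value, never as SQL.
function findUserSafe(id: string): Query {
  return { text: "SELECT * FROM users WHERE id = ?", values: [id] };
}

const payload = "1' OR '1'='1";
console.log(findUserUnsafe(payload).text);
// SELECT * FROM users WHERE id = '1' OR '1'='1'
console.log(findUserSafe(payload).text);
// SELECT * FROM users WHERE id = ?
```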
What vibe-check misses (honest list)
This is a "catch the obvious stuff in 30s" filter, not Snyk / Semgrep / a full SAST tool. We don't catch:
- Cross-file data-flow. If user input flows through three files before reaching a db.query, the regex won't connect the dots. A real SAST traces taint across the call graph. Roadmap: ts-morph reference walking for top-N entry points.
- Vulnerable transitive deps. We don't query npm advisories; that's npm audit's job, and bundling a stale advisory list would lie. Run npm audit --json in parallel if you want dep-CVE coverage.
- Race conditions across processes. We catch in-process JS / Swift / Kotlin / Go races. Distributed races (lock ordering across services, DB transactions) need different tooling.
- Type-correct but logic-broken code. The analyzer is syntactic, not semantic. A Math.random() named getNonce won't fool us; a properly-named crypto.randomBytes used with a tiny entropy budget will.
- Custom secret formats. Internal company tokens with unique prefixes need a regex you can add to securityAnalyzer.SECRET_PATTERNS. PR welcome.
- Real-time / dynamic issues. Memory leaks under load, network timeouts, slow renders mid-interaction: those need run_stress_test / run_simulation, not static analysis.
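For the custom secret formats item, a pattern entry might look something like the sketch below. The `SecretPattern` shape, the `acme_live_` token format, and the `findSecrets` helper are all hypothetical; check the actual structure of securityAnalyzer.SECRET_PATTERNS before adding an entry.

```typescript
// Hypothetical entry shape; the real SECRET_PATTERNS structure may differ.
interface SecretPattern {
  name: string;
  regex: RegExp;
  severity: "critical" | "high";
}

// Example: internal tokens that look like "acme_live_" + 32 hex characters.
const acmeToken: SecretPattern = {
  name: "Acme internal API token",
  regex: /\bacme_live_[0-9a-f]{32}\b/,
  severity: "high",
};

// Minimal matcher: report the name of every pattern found in the source.
function findSecrets(source: string, patterns: SecretPattern[]): string[] {
  return patterns.filter((p) => p.regex.test(source)).map((p) => p.name);
}

const sample = 'const key = "acme_live_0123456789abcdef0123456789abcdef";';
console.log(findSecrets(sample, [acmeToken])); // [ 'Acme internal API token' ]
console.log(findSecrets('const key = process.env.ACME_KEY;', [acmeToken])); // []
```

Anchoring on a fixed prefix plus an exact length keeps false positives low compared with matching any long hex string.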
If you want deeper coverage on top of vibe-check: feed the findings into run_iterative_fix_loop for test-verified application, or escalate to Snyk / Semgrep / GitHub Advanced Security for compliance use cases.
vibe-check vs alternatives
| | vibe-check (test-genie) | Snyk | Semgrep | GitHub Advanced Security |
|------------------------|-------------------------|------|---------|--------------------------|
| Runs locally | yes | hybrid (cloud) | yes | no (cloud) |
| Telemetry-free | yes (zero network calls) | no | partial | no |
| Fix loop integration | yes (run_iterative_fix_loop) | no | no | no |
| Race-condition detection | yes (JS/Swift/Kotlin/Go) | no | partial | partial |
| Cross-file taint flow | no (roadmap) | yes | yes | yes |
| Setup time | none (already installed if test-genie is installed) | account + auth | install + ruleset | repo-level enable |
If your goal is "before I commit, what's broken?", vibe-check wins on latency. If your goal is "compliance + supply chain audit", use the dedicated tools.
When NOT to use test-genie
- Production-gate test runs. test-genie is built for the development feedback loop. For shipping decisions, use a proper CI that you control end-to-end.
- Code your team must hand-review every line of. The loop's job is to propose and apply fixes; if every fix needs a human eye, leave autoApply: false (the default) and use it as a fix-proposal generator only.
- No backup / no version control situations. test-genie's auto-rollback is best-effort and requires the per-file backup to exist. Always run inside a git working tree.
Comparison
| | test-genie | Detox | Maestro | xcodebuild test |
|---|---|---|---|---|
| Runs E2E / unit tests | ✅ (via Jest/Detox/etc.) | ✅ | ✅ | ✅ |
| Detects code issues | ✅ rule + LLM | ❌ | ❌ | ❌ |
| Iterative fix loop | ✅ (run_iterative_fix_loop) | ❌ | ❌ | ❌ |
| Auto-rollback on test regression | ✅ inside run_iterative_fix_loop only | ❌ | ❌ | ❌ |
| Auto-rollback on syntax failure | ✅ all apply paths | ❌ | ❌ | ❌ |
| MCP-native (talks to Claude / agents) | ✅ | ❌ | ❌ | ❌ |
| Multi-platform | iOS+Android+Web+Flutter+RN | iOS+Android | iOS+Android | iOS only |

> Scope note: diagnose_project autoFix: true rolls back on syntax-validate failure (applyFix.ts:185-202) but does not re-run tests, so it cannot detect test regressions. For test-driven rollback use run_iterative_fix_loop. See [SAFETY.md](SAFETY.md) §2.4.
test-genie uses tools like Jest, Detox, and xcodebuild test under the hood — it sits at the orchestration layer, not the test-runner layer.
Known limitations
- Platform syntax check downgrade. For Swift/Kotlin/Java/Dart we try the platform compiler in -typecheck mode. If the compiler isn't on PATH, we fall back to brac
…
Source & license
This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.
- Author: MUSE-CODE-SPACE
- Source: MUSE-CODE-SPACE/test-genie-mcp
- License: MIT
Install and usage instructions live in the source repository linked above.
Versions
- v2.0.2 Imported from the upstream source.