Install
$ agentstack add mcp-kultmember6banger-kloakt
Open-source listing, not yet scanned by AgentStack. Follow the source repository for install instructions.
About
Kloakt
Cloaked headless browser for AI agents. Lightweight, stealthy, built in Rust. Based on Obscura.
Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.
Beyond one-shot extraction, it can drive persistent, stateful sessions (click, type, navigate, with cookies and page/JS state persisting across calls), emit an accessibility/structure snapshot as an agent-vision substitute, and capture real screenshots via system Chrome. It exposes 12 MCP tools in all.
Why Kloakt?
| Metric | Kloakt | Headless Chrome |
|--------------|--------------|------------------|
| Memory | 30 MB | 200+ MB |
| Binary size | 70 MB | 300+ MB |
| Anti-detect | Built-in | None |
| Page load | 85 ms | ~500 ms |
| Startup | Instant | ~2s |
| SPA extract | Yes | Manual |
Install
Prebuilt binary (recommended)
One-line install (Linux & macOS). Downloads the right binary for your OS/arch from the latest GitHub Release and installs it to ~/.local/bin (or /usr/local/bin when run as root):
curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh | sh
You can pin a version or override the install dir:
KLOAKT_VERSION=v0.1.2 INSTALL_DIR=/usr/local/bin \
sh -c "$(curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh)"
Windows: download kloakt-x86_64-windows.zip from the Releases page and extract kloakt.exe onto your PATH.
Homebrew (macOS)
brew install KultMember6Banger/kloakt/kloakt
# or, from a local checkout:
brew install --formula ./Formula/kloakt.rb
(Until a dedicated tap exists, run brew tap KultMember6Banger/kloakt https://github.com/KultMember6Banger/kloakt first, then brew install kloakt.)
cargo install
Builds the CLI from crates.io (requires a Rust toolchain; the first build compiles V8 and takes about 5 minutes):
cargo install obscura-cli
This installs the kloakt binary. To build with stealth mode, add --features stealth.
Build from source
git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release
# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth
Requires Rust 1.75+ (rustup.rs). First build takes ~5 min (V8 compiles from source, cached after).
Quick Start
Extract content (AI agent use)
# Clean markdown from any page
kloakt extract https://example.com --main
# Structured JSON with metadata
kloakt extract https://example.com --main --json
# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000
# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
Fetch a page
# Get the page title
kloakt fetch https://example.com --eval "document.title"
# Extract all links
kloakt fetch https://example.com --dump links
# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown
# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0
Start the CDP server
kloakt serve --port 9222
# With stealth mode
kloakt serve --port 9222 --stealth
Scrape in parallel
kloakt scrape url1 url2 url3 ... \
  --concurrency 25 \
  --eval "document.querySelector('h1').textContent" \
  --format json
Snapshot page structure (agent vision)
# Indexed accessibility/structure tree — tags, text, roles, what's clickable, visibility
kloakt snapshot https://example.com
# Only the actionable elements (links, buttons, inputs), with id/name/label for targeting
kloakt snapshot https://example.com --interactive
kloakt has no rasterizer, so this is the lightweight "what's on the page and what can I act on" view for agents that work from structure rather than pixels.
Screenshot (via system Chrome)
# Real PNG — delegates to a locally-installed Chrome/Chromium/Edge
kloakt screenshot https://example.com --output shot.png --width 1280 --height 800
Persistent sessions
Drive a named session whose cookies and page/JS state survive across separate invocations, backed by a running kloakt serve daemon:
kloakt serve --port 9222 & # start the daemon once
kloakt session open shop --url https://example.com
kloakt session snapshot shop --interactive # see the page structure
kloakt session type shop 'input[name=q]' 'hello' # fill a field
kloakt session click shop 'button[type=submit]' # click an element
kloakt session text shop # read the body text
kloakt session eval shop 'document.title' # run JS, get the value back
kloakt session close shop # tear down (drops cookies + page)
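For agents that shell out to the CLI from Python, the session commands above can be assembled as argument lists. The following is a minimal sketch: the helper names session_argv and run_session are hypothetical, and the action set only mirrors the subcommands shown in the example above.

```python
import subprocess

# Subcommands taken from the session examples above.
SESSION_ACTIONS = {"open", "snapshot", "type", "click", "text", "eval", "close"}


def session_argv(action, name, *args, binary="kloakt"):
    """Build the argv for `kloakt session <action> <name> [args...]`."""
    if action not in SESSION_ACTIONS:
        raise ValueError(f"unknown session action: {action}")
    return [binary, "session", action, name, *args]


def run_session(action, name, *args):
    """Run the command and return stdout (needs kloakt on PATH and a running daemon)."""
    result = subprocess.run(
        session_argv(action, name, *args),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Building a list only; nothing is executed here.
print(session_argv("type", "shop", "input[name=q]", "hello"))
```

Passing a list rather than a shell string sidesteps quoting problems with selectors such as input[name=q].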
Smart Extraction
The extract command uses a multi-phase pipeline optimized for AI agents:
- Noise removal — strips cookie banners, ads, popups, nav, social widgets
- Content scoring — text-density algorithm (Readability-like) finds the main content block
- Markdown conversion — DOM-to-markdown with absolute URL resolution
- SPA fallback — when JS rendering fails, extracts from meta tags, Open Graph, JSON-LD, and noscript content
Works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
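The SPA fallback step can be pictured with a short standard-library sketch. This is not Kloakt's implementation (which is written in Rust); the class name MetaFallback, the function fallback_metadata, and the returned dictionary shape are illustrative assumptions.

```python
import json
from html.parser import HTMLParser


class MetaFallback(HTMLParser):
    """Collect <meta> tags (including Open Graph) and JSON-LD blocks from raw HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}       # name/property -> content
        self.json_ld = []    # parsed JSON-LD objects
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD


def fallback_metadata(html):
    parser = MetaFallback()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "json_ld": parser.json_ld}


# An empty SPA shell: no rendered content, but metadata is still present.
sample = """<html><head>
<meta property="og:title" content="Demo">
<meta name="description" content="A demo page">
<script type="application/ld+json">{"@type": "Article", "headline": "Demo"}</script>
</head><body><div id="root"></div></body></html>"""
info = fallback_metadata(sample)
print(info["meta"]["og:title"], info["json_ld"][0]["headline"])
```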
Python API
from kloakt import (
    extract, extract_fields, fetch, scrape, search, crawl,
    snapshot, screenshot, session_open, session_close,
)
# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)
# Cap output length
page = extract("https://example.com", max_chars=3000)
# Wait for SPA content
page = extract("https://example.com", delay=2000)
# Structured field extraction via CSS selectors
data = extract_fields("https://news.ycombinator.com", {
    "title": "title",
    "stories": ".titleline > a[]",      # [] => list of all matches
    "links": ".titleline > a[]@href",   # @href => an attribute
})
print(data["data"]["stories"])
# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")
# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)
# Discover links, or breadth-first crawl a small section of a site
links = search("https://news.ycombinator.com", same_domain=True)
pages = crawl("https://example.com", max_pages=5, max_depth=1)
# Structure snapshot (agent vision) and a real screenshot via system Chrome
snap = snapshot("https://example.com", interactive=True)
screenshot("https://example.com", output="shot.png")
# Persistent session — auto-starts a daemon if one isn't running; cookies + page
# state persist across calls. (session_nav / _click / _type / _eval / _text / _snapshot)
session_open("shop", url="https://example.com")
# ... drive the page across calls ...
session_close("shop")
MCP Server (Claude Code)
Kloakt includes an MCP server for use as a Claude Code tool:
{
  "mcpServers": {
    "kloakt": {
      "command": "python3",
      "args": ["/path/to/kloakt/mcp_server.py"]
    }
  }
}
Exposes 12 native tools:
| Tool | What it does |
|------|--------------|
| kloakt_extract | Clean markdown, or structured fields via schema |
| kloakt_fetch | Low-level fetch (html/text/links/markdown, or JS eval) |
| kloakt_scrape | Many URLs in parallel |
| kloakt_search | Discover outbound links on a page |
| kloakt_crawl | Budget/depth-limited breadth-first crawl |
| kloakt_snapshot | Accessibility/structure tree (agent vision) |
| kloakt_screenshot | Real PNG via system Chrome |
| kloakt_session_open | Open a persistent named session |
| kloakt_session_act | navigate / click / type / eval within a session |
| kloakt_session_read | Read a session's page as text or snapshot |
| kloakt_session_list | List open sessions |
| kloakt_session_close | Close a session (drops its cookies + page) |
Puppeteer / Playwright
Puppeteer
The CDP server embeds a per-session token in the WebSocket path (like Chrome). Connect via browserURL so the client discovers the token from /json/version automatically — don't hardcode the ws://.../devtools/browser path.
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
  browserURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
  Array.from(document.querySelectorAll('.titleline > a'))
    .map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();
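The token discovery that browserURL performs can also be done by hand, which helps when debugging a connection. The sketch below assumes Kloakt's /json/version response uses Chrome's field name, webSocketDebuggerUrl; the function names and the sample token are made up for illustration.

```python
import json
from urllib.request import urlopen


def ws_endpoint_from_version(payload):
    """Pull the WebSocket URL out of a /json/version response body."""
    info = json.loads(payload)
    # Assumption: the daemon uses Chrome's field name here.
    return info["webSocketDebuggerUrl"]


def discover_ws_endpoint(base_url="http://127.0.0.1:9222"):
    """Fetch /json/version from a running `kloakt serve` and return the ws:// URL."""
    with urlopen(f"{base_url}/json/version", timeout=5) as resp:
        return ws_endpoint_from_version(resp.read().decode("utf-8"))


# Offline example with a made-up token; a real daemon returns its own value.
sample = '{"Browser": "example", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc123"}'
print(ws_endpoint_from_version(sample))
```

Because the token changes per session, any value read this way should be fetched fresh each time rather than stored.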
Playwright
import { chromium } from 'playwright-core';
const browser = await chromium.connectOverCDP({
  endpointURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});
const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();
Stealth Mode
Enable with --features stealth.
- Per-session fingerprint randomization (GPU, screen, canvas, audio, battery)
- Realistic navigator.userAgentData (Chrome 145, high-entropy values)
- event.isTrusted = true for dispatched events
- Native function masking (Function.prototype.toString() → [native code])
- navigator.webdriver = undefined
- Realistic Accept-Language + Client Hints (Sec-CH-UA) request headers
- Per-session randomized navigator.languages
- TLS fingerprint (JA3) rotation across Chrome 145 Linux / Windows / macOS profiles
- 3,520 tracker domains blocked
CLI Reference
kloakt extract
| Flag | Default | Description |
|------|---------|-------------|
| --format | markdown | Output: markdown, text, or links |
| --main | off | Strip nav, header, footer, sidebar |
| --json | off | Structured JSON: title, URL, content, meta |
| --max-chars | unlimited | Truncate content to N characters |
| --delay | 0 | Extra ms to wait after load |
| --stealth | off | Anti-detection mode |
| --selector | - | Wait for CSS selector |
| --wait-until | load | load, domcontentloaded, networkidle0 (bounded by --wait) |
| --schema | - | Extract structured fields as JSON (see below) |
| --har | - | Write captured network activity to a HAR file |
| --cache-ttl | 0 | Cache the result on disk and reuse it for N seconds |
Structured extraction with --schema
Pass a JSON object mapping field names to CSS selectors. Suffix a selector with [] to return all matches as a list, and with @attr to return an attribute instead of text:
kloakt extract https://news.ycombinator.com \
--schema '{"title":"title","stories":".titleline > a[]","first_link":".titleline > a@href"}'
# => { "url": ..., "data": { "title": "...", "stories": [...], "first_link": "..." }, "elapsed_ms": ... }
This is also exposed through the MCP kloakt_extract tool via an optional schema argument.
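To make the suffix rules concrete, here is a small sketch of how a schema value could be split into its parts. It is an illustration of the syntax only, not Kloakt's parser; the function name parse_field_spec and the tuple layout are assumptions.

```python
import re

# A trailing @attr, where attr looks like an HTML attribute name.
_ATTR_SUFFIX = re.compile(r"@([A-Za-z_][\w-]*)$")


def parse_field_spec(spec):
    """Split a schema value into (css_selector, is_list, attribute)."""
    attribute = None
    match = _ATTR_SUFFIX.search(spec)
    if match:
        attribute = match.group(1)
        spec = spec[: match.start()]
    is_list = spec.endswith("[]")
    if is_list:
        spec = spec[:-2]
    return spec.strip(), is_list, attribute


schema = {
    "title": "title",
    "stories": ".titleline > a[]",
    "first_link": ".titleline > a@href",
}
for field, spec in schema.items():
    print(field, parse_field_spec(spec))
```

The attribute suffix is stripped before the list suffix, so a combined form such as .titleline > a[]@href yields a list of href values.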
kloakt fetch
| Flag | Default | Description |
|------|---------|-------------|
| --dump | html | Output: html, text, links, markdown |
| --eval | - | JavaScript expression to evaluate |
| --wait-until | load | Wait condition |
| --selector | - | Wait for CSS selector |
| --stealth | off | Anti-detection mode |
| --quiet | off | Suppress banner |
kloakt serve
| Flag | Default | Description |
|------|---------|-------------|
| --port | 9222 | WebSocket port |
| --proxy | - | HTTP/SOCKS5 proxy URL |
| --stealth | off | Anti-detection + tracker blocking |
| --workers | 1 | Parallel workers |
kloakt scrape
| Flag | Default | Description |
|------|---------|-------------|
| --concurrency | 10 | Parallel workers |
| --eval | - | JS expression per page |
| --format | json | Output: json or text |
kloakt snapshot
Emit an indexed accessibility/structure tree (always JSON) — an agent-vision substitute. Each node is compact: i (index), tag, depth, vis (visible), and when present click, role, text, type/value, href, id, name, label.
| Flag | Default | Description |
|------|---------|-------------|
| --interactive | off | Only actionable elements (links, buttons, inputs) |
| --max-nodes | 1500 | Cap on nodes emitted |
| --stealth | off | Anti-detection mode |
| --delay | 0 | Extra ms to wait after load |
| --wait-until | load | Wait condition |
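An agent consuming a snapshot typically narrows it to targets it can act on. The sketch below filters a list of nodes using the compact keys described above (i, tag, vis, click, text, label, name, id). The sample data is hand-written, and the top-level shape of the real output and the exact value types are assumptions.

```python
import json


def actionable_nodes(nodes):
    """Keep nodes that are both visible and clickable."""
    return [n for n in nodes if n.get("vis") and n.get("click")]


def describe(node):
    """One-line summary an agent can use to pick a target."""
    label = node.get("label") or node.get("text") or node.get("name") or node.get("id") or ""
    return f'[{node["i"]}] <{node["tag"]}> {label}'.rstrip()


# Hand-written sample; real snapshot output may be shaped differently.
sample = json.loads("""[
  {"i": 0, "tag": "body", "depth": 0, "vis": true},
  {"i": 1, "tag": "a", "depth": 1, "vis": true, "click": true,
   "text": "More information", "href": "https://example.org"},
  {"i": 2, "tag": "button", "depth": 1, "vis": false, "click": true, "text": "Hidden"}
]""")
for node in actionable_nodes(sample):
    print(describe(node))
```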
kloakt screenshot
Capture a real PNG by delegating to a locally installed Chrome/Chromium/Edge (kloakt has no rasterizer). The command exits with a clear error if no browser is found; override detection with --chrome or the KLOAKT_CHROME env var.
| Flag | Default | Description |
|------|---------|-------------|
| --output | screenshot.png | Output PNG path |
| --width | 1280 | Viewport width |
| --height | 800 | Viewport height |
| --chrome | auto-detect | Path to a Chrome/Chromium/Edge binary |
kloakt session
Drive a persistent, named session against a running kloakt serve daemon. Cookies and page/JS state survive across separate invocations (client state in ~/.kloakt/sessions/.json).
| Command | Description |
|---------|-------------|
| open [--url URL] [--port] | Open (or reattach to) a session and create a page |
| nav | Navigate the session's page |
| eval | Evaluate a JS expression, print the JSON value |
| text | Print document.body.innerText |
| snapshot [--interactive] | Structure snapshot of the session's page |
| click | Click the first matching element |
| type | Focus an element, set its value, fire input/change |
| list [--port] | List open sessions on the daemon |
| close | Close the session (drops its pages + cookies) |
> Multi-statement JS passed to eval returns null (the daemon evaluates a single expression); wrap it in an IIFE, (function(){ ...; return v })(), to get a value back.
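When the JavaScript is generated by a program, the IIFE wrapping can be automated. This is a minimal sketch; the helper name wrap_iife is hypothetical and it performs no validation of the JavaScript it is given.

```python
def wrap_iife(statements, result):
    """Wrap JS statements in an IIFE that returns `result`, giving a single expression."""
    body = " ".join(s.rstrip(";") + ";" for s in statements)
    return f"(function(){{ {body} return {result}; }})()"


js = wrap_iife(
    ["const links = document.querySelectorAll('a')"],
    "links.length",
)
print(js)
```

The resulting string is one expression, so it can be passed as the expression argument to kloakt session eval.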
kloakt benchmark
Measure load performance per URL — average/min/max load time, request count, bytes, and DOM node count — as a table or --json.
kloakt benchmark https://example.com https://news.ycombinator.com --runs 3
| Flag | Default | Description |
|------|---------|-------------|
| --runs | 1 | Runs per URL (reports the average) |
| --json | off | Emit JSON instead of a table |
| --wait-until | load | load, domcontentloaded, or networkidle0 |
Challenge / bot-wall detection
kloakt extract --json includes a "challenge" field reporting a detected captcha or bot wall (recaptcha, hcaptcha, turnstile, cloudflare, datadome, perimeterx) or null. This is detection only — kloakt tells you a page is gated so an agent can stop and back off; it does not attempt to solve or evade challenges. (Also surfaced via the MCP kloakt_extract output and the Python Page.challenge field.)
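An agent can branch on the "challenge" field before using the extracted content. The sketch below reads only that field from the JSON output; the function names and the returned action strings are illustrative assumptions, and the sample payloads are trimmed to the one field the text above documents.

```python
import json


def detect_gate(extract_json):
    """Return the reported challenge name, or None if the page is not gated."""
    page = json.loads(extract_json)
    return page.get("challenge")  # JSON null becomes None


def next_action(extract_json):
    """Decide whether the agent should use the content or stop and back off."""
    gate = detect_gate(extract_json)
    if gate is None:
        return "use_content"
    return f"back_off:{gate}"


print(next_action('{"challenge": null}'))
print(next_action('{"challenge": "turnstile"}'))
```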
Global flags
| Flag | Default | Description |
|------|---------|-------------|
| --obey-robots | off | Respect robots.txt and refuse to fetch disallowed paths |
| --allow-private | off | Allow private/internal/loopback hosts (disables the SSRF guard) |
> Security note: by default kloakt refuses to fetch private, loopback, link-local, and cloud-metadata addresses (SSRF protection), and rejects file:// URLs. Use --allow-private only when you intentionally need to reach internal services. The CDP server binds to 127.0.0.1 and validates the Host header to block DNS-rebinding.
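The default-deny policy described above can be approximated in a few lines of standard-library Python. This is a sketch of the general technique, not Kloakt's guard: the function name is_blocked_url is hypothetical, and it checks only IP literals and localhost, whereas a production guard would also resolve hostnames and check every resulting address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_url(url):
    """Return True if the URL should be refused under a default-deny SSRF policy."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # covers file:// and other non-web schemes
    host = parts.hostname
    if host is None or host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname: a real guard would resolve it and check each address.
        return False
    # Cloud metadata endpoints such as 169.254.169.254 fall under link-local.
    return addr.is_private or addr.is_loopback or addr.is_link_local


for candidate in ("file:///etc/passwd", "http://169.254.169.254/", "https://example.com/"):
    print(candidate, is_blocked_url(candidate))
```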
CDP API
Full Chrome DevTool
…
Source & license
This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.
- Author: KultMember6Banger
- Source: KultMember6Banger/kloakt
- License: Apache-2.0
Install and usage instructions live in the source repository linked above.
Versions
- v0.1.2 Imported from the upstream source.