AgentStack
MCP unreviewed Apache-2.0 Self-run

Kloakt

mcp-kultmember6banger-kloakt · by KultMember6Banger

Cloaked headless browser for AI agents — stealth TLS, smart extraction, SPA fallback

Install

$ agentstack add mcp-kultmember6banger-kloakt

Open-source listing — not yet scanned by AgentStack. Follow the source repository for install instructions.

About

Kloakt

Cloaked headless browser for AI agents. Lightweight, stealthy, built in Rust. Based on Obscura.


Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.

Beyond one-shot extraction, it can drive persistent, stateful sessions (click, type, navigate, with cookies and page/JS state persisting across calls), emit an accessibility/structure snapshot as an agent-vision substitute, and capture real screenshots via system Chrome. It exposes 12 MCP tools in all.

Why Kloakt?

| Metric | Kloakt | Headless Chrome |
|--------------|--------------|------------------|
| Memory | 30 MB | 200+ MB |
| Binary size | 70 MB | 300+ MB |
| Anti-detect | Built-in | None |
| Page load | 85 ms | ~500 ms |
| Startup | Instant | ~2 s |
| SPA extract | Yes | Manual |

Install

Prebuilt binary (recommended)

One-line install (Linux & macOS). Downloads the right binary for your OS/arch from the latest GitHub Release and installs it to ~/.local/bin (or /usr/local/bin when run as root):

curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh | sh

You can pin a version or override the install dir:

KLOAKT_VERSION=v0.1.2 INSTALL_DIR=/usr/local/bin \
  sh -c "$(curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh)"

Windows: download kloakt-x86_64-windows.zip from the Releases page and extract kloakt.exe onto your PATH.

Homebrew (macOS)

brew install KultMember6Banger/kloakt/kloakt
# or, from a local checkout:
brew install --formula ./Formula/kloakt.rb

(Until a dedicated tap exists, run brew tap KultMember6Banger/kloakt https://github.com/KultMember6Banger/kloakt first, then brew install kloakt.)

cargo install

Builds the CLI from crates.io (requires Rust toolchain; first build compiles V8, ~5 min):

cargo install obscura-cli

This installs the kloakt binary. To build with stealth mode, add --features stealth.

Build from source

git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release

# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth

Requires Rust 1.75+ (rustup.rs). First build takes ~5 min (V8 compiles from source, cached after).

Quick Start

Extract content (AI agent use)

# Clean markdown from any page
kloakt extract https://example.com --main

# Structured JSON with metadata
kloakt extract https://example.com --main --json

# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000

# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
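An agent will usually call these commands from code rather than from a shell. The sketch below builds the argument list from the flags shown above (--main, --json, --max-chars, --delay) and parses the result. The helper names extract_argv and run_extract are hypothetical, and the code assumes that stdout carries exactly one JSON document when --json is set.

```python
import json
import subprocess


def extract_argv(url, main=True, max_chars=None, delay_ms=None):
    """Build the argument list for `kloakt extract --json`."""
    argv = ["kloakt", "extract", url, "--json"]
    if main:
        argv.append("--main")
    if max_chars is not None:
        argv += ["--max-chars", str(max_chars)]
    if delay_ms is not None:
        argv += ["--delay", str(delay_ms)]
    return argv


def run_extract(url, **options):
    """Run kloakt and parse stdout (assumption: stdout is one JSON document)."""
    done = subprocess.run(
        extract_argv(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(done.stdout)


# Only builds the command; call run_extract() once the kloakt binary is on PATH.
print(extract_argv("https://en.wikipedia.org/wiki/Rust", max_chars=3000))
```

Passing the arguments as a list avoids shell quoting problems when URLs contain characters such as & or ?.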

Fetch a page

# Get the page title
kloakt fetch https://example.com --eval "document.title"

# Extract all links
kloakt fetch https://example.com --dump links

# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown

# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0

Start the CDP server

kloakt serve --port 9222

# With stealth mode
kloakt serve --port 9222 --stealth

Scrape in parallel

kloakt scrape url1 url2 url3 ... \
  --concurrency 25 \
  --eval "document.querySelector('h1').textContent" \
  --format json

Snapshot page structure (agent vision)

# Indexed accessibility/structure tree — tags, text, roles, what's clickable, visibility
kloakt snapshot https://example.com

# Only the actionable elements (links, buttons, inputs), with id/name/label for targeting
kloakt snapshot https://example.com --interactive

kloakt has no rasterizer, so this is the lightweight "what's on the page and what can I act on" view for agents that work from structure rather than pixels.

Screenshot (via system Chrome)

# Real PNG — delegates to a locally-installed Chrome/Chromium/Edge
kloakt screenshot https://example.com --output shot.png --width 1280 --height 800

Persistent sessions

Drive a named session whose cookies and page/JS state survive across separate invocations, backed by a running kloakt serve daemon:

kloakt serve --port 9222 &                        # start the daemon once

kloakt session open shop --url https://example.com
kloakt session snapshot shop --interactive        # see the page structure
kloakt session type shop 'input[name=q]' 'hello'  # fill a field
kloakt session click shop 'button[type=submit]'   # click an element
kloakt session text shop                          # read the body text
kloakt session eval shop 'document.title'         # run JS, get the value back
kloakt session close shop                         # tear down (drops cookies + page)

Smart Extraction

The extract command uses a multi-phase pipeline optimized for AI agents:

  1. Noise removal — strips cookie banners, ads, popups, nav, social widgets
  2. Content scoring — text-density algorithm (Readability-like) finds the main content block
  3. Markdown conversion — DOM-to-markdown with absolute URL resolution
  4. SPA fallback — when JS rendering fails, extracts from meta tags, Open Graph, JSON-LD, and noscript content

Works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
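The content-scoring phase can be illustrated with a toy text-density scorer. This is a sketch of the general Readability-style idea only, not Kloakt's actual algorithm: the DensityScorer class, the BLOCK_TAGS set, and the link-penalty weighting are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical set of container tags to score; the real candidate set is not documented here.
BLOCK_TAGS = {"div", "article", "section", "main", "nav", "footer", "aside"}


class DensityScorer(HTMLParser):
    """Score each block element by its text length, penalising link text."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open block elements, innermost last
        self.blocks = []     # finished blocks as (score, tag, text)
        self.link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": [], "link_chars": 0})

    def handle_endtag(self, tag):
        if tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            block = self.stack.pop()
            text = " ".join(block["text"])
            # Subtract link text twice so menus and footers score below prose.
            score = len(text) - 2 * block["link_chars"]
            self.blocks.append((score, block["tag"], text))

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk or not self.stack:
            return
        # Credit text to the innermost open block only.
        self.stack[-1]["text"].append(chunk)
        if self.link_depth:
            self.stack[-1]["link_chars"] += len(chunk)


def main_block(html):
    """Return (tag, text) of the highest-scoring block, or None if there is none."""
    scorer = DensityScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.blocks:
        return None
    _, tag, text = max(scorer.blocks, key=lambda block: block[0])
    return tag, text


SAMPLE = (
    "<nav><a href='/'>Home</a> <a href='/about'>About</a></nav>"
    "<article>Rust is a systems language focused on safety and speed.</article>"
    "<footer><a href='/terms'>Terms</a></footer>"
)
print(main_block(SAMPLE))
```

On the sample, the nav and footer blocks consist entirely of link text and get negative scores, so the article block is selected.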

Python API

from kloakt import (
    extract, extract_fields, fetch, scrape, search, crawl,
    snapshot, screenshot, session_open, session_close,
)

# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)

# Cap output length
page = extract("https://example.com", max_chars=3000)

# Wait for SPA content
page = extract("https://example.com", delay=2000)

# Structured field extraction via CSS selectors
data = extract_fields("https://news.ycombinator.com", {
    "title": "title",
    "stories": ".titleline > a[]",   # [] => list of all matches
    "links": ".titleline > a[]@href" # @href => an attribute
})
print(data["data"]["stories"])

# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")

# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)

# Discover links, or breadth-first crawl a small section of a site
links = search("https://news.ycombinator.com", same_domain=True)
pages = crawl("https://example.com", max_pages=5, max_depth=1)

# Structure snapshot (agent vision) and a real screenshot via system Chrome
snap = snapshot("https://example.com", interactive=True)
screenshot("https://example.com", output="shot.png")

# Persistent session — auto-starts a daemon if one isn't running; cookies + page
# state persist across calls. (session_nav / _click / _type / _eval / _text / _snapshot)
session_open("shop", url="https://example.com")
# ... drive the page across calls ...
session_close("shop")
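The max_pages and max_depth arguments to crawl bound a breadth-first traversal. The sketch below shows one way such a budget can work, using an in-memory link graph instead of real fetches. It is not Kloakt's implementation: bfs_crawl, get_links, and SITE are hypothetical names, and treating the start page as depth 0 is an assumption.

```python
from collections import deque


def bfs_crawl(start, get_links, max_pages=5, max_depth=1):
    """Visit pages breadth-first, stopping at a page budget and a depth limit."""
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth); the start page is depth 0
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links out of pages at the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site structure standing in for real pages.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/b"],
    "/b": ["/b/1"],
}

print(bfs_crawl("/", lambda url: SITE.get(url, []), max_pages=5, max_depth=1))
# ['/', '/a', '/b']
```

The seen set is updated when a link is queued rather than when it is visited, which keeps a page linked from several places from being queued twice.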

MCP Server (Claude Code)

Kloakt includes an MCP server for use as a Claude Code tool:

{
  "mcpServers": {
    "kloakt": {
      "command": "python3",
      "args": ["/path/to/kloakt/mcp_server.py"]
    }
  }
}

Exposes 12 native tools:

| Tool | What it does |
|------|--------------|
| kloakt_extract | Clean markdown, or structured fields via schema |
| kloakt_fetch | Low-level fetch (html/text/links/markdown, or JS eval) |
| kloakt_scrape | Many URLs in parallel |
| kloakt_search | Discover outbound links on a page |
| kloakt_crawl | Budget/depth-limited breadth-first crawl |
| kloakt_snapshot | Accessibility/structure tree (agent vision) |
| kloakt_screenshot | Real PNG via system Chrome |
| kloakt_session_open | Open a persistent named session |
| kloakt_session_act | navigate / click / type / eval within a session |
| kloakt_session_read | Read a session's page as text or snapshot |
| kloakt_session_list | List open sessions |
| kloakt_session_close | Close a session (drops its cookies + page) |

Puppeteer / Playwright

Puppeteer

The CDP server embeds a per-session token in the WebSocket path (like Chrome). Connect via browserURL so the client discovers the token from /json/version automatically — don't hardcode the ws://.../devtools/browser path.

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
  Array.from(document.querySelectorAll('.titleline > a'))
    .map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();

Playwright

import { chromium } from 'playwright-core';

const browser = await chromium.connectOverCDP({
  endpointURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();
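Both clients above resolve the tokenized WebSocket endpoint through /json/version. A client without that built-in discovery could do the same lookup by hand. The sketch below assumes the response uses Chrome's webSocketDebuggerUrl key, which the "like Chrome" description suggests but this document does not confirm; ws_endpoint and discover are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def ws_endpoint(version_payload):
    """Pull the WebSocket URL out of a parsed /json/version response."""
    try:
        # Assumption: same key name as Chrome's /json/version response.
        return version_payload["webSocketDebuggerUrl"]
    except KeyError:
        raise ValueError("no webSocketDebuggerUrl in /json/version response") from None


def discover(base_url="http://127.0.0.1:9222", timeout=5.0):
    """Ask a running `kloakt serve` daemon for its current WebSocket endpoint."""
    with urlopen(f"{base_url}/json/version", timeout=timeout) as response:
        return ws_endpoint(json.load(response))


# Hypothetical payload; a real token is generated per session and will differ.
sample = {"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc123"}
print(ws_endpoint(sample))
```

Looking the endpoint up at connect time is what makes a hardcoded path unnecessary: the token changes, but the /json/version location does not.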

Stealth Mode

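Build with --features stealth; the extract, fetch, serve, and snapshot commands take a --stealth flag to turn it on per invocation.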

  • Per-session fingerprint randomization (GPU, screen, canvas, audio, battery)
  • Realistic navigator.userAgentData (Chrome 145, high-entropy values)
  • event.isTrusted = true for dispatched events
  • Native function masking (Function.prototype.toString() returns [native code])
  • navigator.webdriver = undefined
  • Realistic Accept-Language + Client Hints (Sec-CH-UA) request headers
  • Per-session randomized navigator.languages
  • TLS fingerprint (JA3) rotation across Chrome 145 Linux / Windows / macOS profiles
  • 3,520 tracker domains blocked

CLI Reference

kloakt extract

| Flag | Default | Description |
|------|---------|-------------|
| --format | markdown | Output: markdown, text, or links |
| --main | off | Strip nav, header, footer, sidebar |
| --json | off | Structured JSON: title, URL, content, meta |
| --max-chars | unlimited | Truncate content to N characters |
| --delay | 0 | Extra ms to wait after load |
| --stealth | off | Anti-detection mode |
| --selector | none | Wait for CSS selector |
| --wait-until | load | load, domcontentloaded, networkidle0 (bounded by --wait) |
| --schema | none | Extract structured fields as JSON (see below) |
| --har | none | Write captured network activity to a HAR file |
| --cache-ttl | 0 | Cache the result on disk and reuse it for N seconds |

Structured extraction with --schema

Pass a JSON object mapping field names to CSS selectors. Suffix a selector with [] to return all matches as a list, and with @attr to return an attribute instead of text:

kloakt extract https://news.ycombinator.com \
  --schema '{"title":"title","stories":".titleline > a[]","first_link":".titleline > a@href"}'
# => { "url": ..., "data": { "title": "...", "stories": [...], "first_link": "..." }, "elapsed_ms": ... }

This is also exposed through the MCP kloakt_extract tool via an optional schema argument.
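The suffix rules are easiest to see as a small parser. The sketch below splits a schema value into a plain CSS selector, a list flag, and an optional attribute name. It illustrates the notation described above rather than Kloakt's own parser; FieldSpec and parse_field are hypothetical names, and the suffix order (selector, then [], then @attr) is taken from the ".titleline > a[]@href" example in the Python API section.

```python
import re
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    selector: str         # plain CSS selector with the suffixes removed
    many: bool            # True when the selector ended in []
    attr: Optional[str]   # attribute name after @, or None for element text


# An attribute suffix is "@" followed by a name at the very end of the string.
_ATTR_SUFFIX = re.compile(r"@([A-Za-z_][\w:-]*)$")


def parse_field(spec):
    """Split a schema value into selector, list flag, and attribute name."""
    spec = spec.strip()
    attr = None
    match = _ATTR_SUFFIX.search(spec)
    if match:
        attr = match.group(1)
        spec = spec[: match.start()]
    many = spec.endswith("[]")
    if many:
        spec = spec[:-2]
    return FieldSpec(spec.strip(), many, attr)


print(parse_field(".titleline > a[]@href"))
print(parse_field("input[name=q]"))  # a normal attribute selector is left alone
```

Only an empty [] at the end counts as the list marker, so ordinary attribute selectors such as input[name=q] pass through unchanged.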

kloakt fetch

| Flag | Default | Description |
|------|---------|-------------|
| --dump | html | Output: html, text, links, markdown |
| --eval | none | JavaScript expression to evaluate |
| --wait-until | load | Wait condition |
| --selector | none | Wait for CSS selector |
| --stealth | off | Anti-detection mode |
| --quiet | off | Suppress banner |

kloakt serve

| Flag | Default | Description |
|------|---------|-------------|
| --port | 9222 | WebSocket port |
| --proxy | none | HTTP/SOCKS5 proxy URL |
| --stealth | off | Anti-detection + tracker blocking |
| --workers | 1 | Parallel workers |

kloakt scrape

| Flag | Default | Description |
|------|---------|-------------|
| --concurrency | 10 | Parallel workers |
| --eval | none | JS expression per page |
| --format | json | Output: json or text |

kloakt snapshot

Emit an indexed accessibility/structure tree (always JSON) — an agent-vision substitute. Each node is compact: i (index), tag, depth, vis (visible), and when present click, role, text, type/value, href, id, name, label.

| Flag | Default | Description |
|------|---------|-------------|
| --interactive | off | Only actionable elements (links, buttons, inputs) |
| --max-nodes | 1500 | Cap on nodes emitted |
| --stealth | off | Anti-detection mode |
| --delay | 0 | Extra ms to wait after load |
| --wait-until | load | Wait condition |
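An agent consuming a snapshot typically narrows it to the elements it can act on. The sketch below filters nodes by the vis and click fields listed above and formats them for a prompt. The exact JSON shape is not specified in this document, so the code assumes a flat list of node objects with boolean vis and click values; the sample data and the helper names clickable and describe are hypothetical.

```python
import json


def clickable(nodes):
    """Keep nodes that are both visible and clickable."""
    return [node for node in nodes if node.get("vis") and node.get("click")]


def describe(node):
    """One-line summary: index, tag, and the best available human-readable label."""
    label = node.get("label") or node.get("text") or node.get("name") or node.get("id") or ""
    return f"[{node['i']}] <{node['tag']}> {label}".rstrip()


# Hypothetical snapshot; real output may be shaped differently.
SAMPLE = json.loads("""
[
  {"i": 0, "tag": "h1", "depth": 1, "vis": true, "text": "Shop"},
  {"i": 1, "tag": "input", "depth": 2, "vis": true, "click": true, "type": "text", "name": "q"},
  {"i": 2, "tag": "button", "depth": 2, "vis": true, "click": true, "text": "Search"},
  {"i": 3, "tag": "a", "depth": 2, "vis": false, "click": true, "href": "/hidden"}
]
""")

for node in clickable(SAMPLE):
    print(describe(node))
# [1] <input> q
# [2] <button> Search
```

Using .get for the optional fields matters because, per the description above, click, role, text, and the other optional keys appear only when present on the element.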

kloakt screenshot

Capture a real PNG by delegating to a locally-installed Chrome/Chromium/Edge (kloakt has no rasterizer). Errors clearly if none is found; override detection with --chrome or the KLOAKT_CHROME env var.

| Flag | Default | Description |
|------|---------|-------------|
| --output | screenshot.png | Output PNG path |
| --width | 1280 | Viewport width |
| --height | 800 | Viewport height |
| --chrome | auto-detect | Path to a Chrome/Chromium/Edge binary |

kloakt session

Drive a persistent, named session against a running kloakt serve daemon. Cookies and page/JS state survive across separate invocations (client state in ~/.kloakt/sessions/.json).

| Command | Description |
|---------|-------------|
| open [--url URL] [--port] | Open (or reattach to) a session and create a page |
| nav | Navigate the session's page |
| eval | Evaluate a JS expression, print the JSON value |
| text | Print document.body.innerText |
| snapshot [--interactive] | Structure snapshot of the session's page |
| click | Click the first matching element |
| type | Focus an element, set its value, fire input/change |
| list [--port] | List open sessions on the daemon |
| close | Close the session (drops its pages + cookies) |

> Multi-statement JS passed to eval returns null (the daemon evaluates a single
> expression); wrap it in an IIFE, (function(){ ...; return v })(), to get a value back.
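The IIFE wrapping is mechanical, so a caller can apply it automatically. The sketch below wraps a multi-statement body and builds the matching kloakt session eval command line. The helper names as_iife and session_eval_argv are hypothetical, and the body must contain its own return statement.

```python
def as_iife(js_body):
    """Wrap JS statements in an IIFE so the result is a single expression."""
    return "(function(){ " + js_body.strip() + " })()"


def session_eval_argv(session, js_body):
    """Argument list for `kloakt session eval <session> <expression>`."""
    return ["kloakt", "session", "eval", session, as_iife(js_body)]


script = as_iife("const links = document.querySelectorAll('a'); return links.length;")
print(script)
# (function(){ const links = document.querySelectorAll('a'); return links.length; })()
```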

kloakt benchmark

Measure load performance per URL — average/min/max load time, request count, bytes, and DOM node count — as a table or --json.

kloakt benchmark https://example.com https://news.ycombinator.com --runs 3

| Flag | Default | Description |
|------|---------|-------------|
| --runs | 1 | Runs per URL (reports the average) |
| --json | off | Emit JSON instead of a table |
| --wait-until | load | load, domcontentloaded, or networkidle0 |

Challenge / bot-wall detection

kloakt extract --json includes a "challenge" field reporting a detected captcha or bot wall (recaptcha, hcaptcha, turnstile, cloudflare, datadome, perimeterx) or null. This is detection only — kloakt tells you a page is gated so an agent can stop and back off; it does not attempt to solve or evade challenges. (Also surfaced via the MCP kloakt_extract output and the Python Page.challenge field.)
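Since the field is detection only, the calling agent decides what to do with it. The sketch below reads the "challenge" field from extract output and reports whether to back off. It assumes the field holds a string name or null, as the description above implies; the sample payloads and the should_back_off helper are hypothetical, and the other keys shown are illustrative.

```python
import json


def should_back_off(extract_json):
    """True when the extract output reports a captcha or bot wall."""
    page = json.loads(extract_json)
    # A missing key is treated the same as null: nothing was detected.
    return page.get("challenge") is not None


# Hypothetical payloads, trimmed to the fields that matter here.
gated = '{"title": "Just a moment...", "content": "", "challenge": "cloudflare"}'
clear = '{"title": "Example Domain", "content": "Example text", "challenge": null}'

for payload in (gated, clear):
    print(should_back_off(payload))
# True
# False
```

Checking for "is not None" rather than truthiness keeps the logic correct even if a future version reports an empty string or another falsy value for a detected challenge.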

Global flags

| Flag | Default | Description |
|------|---------|-------------|
| --obey-robots | off | Respect robots.txt and refuse to fetch disallowed paths |
| --allow-private | off | Allow private/internal/loopback hosts (disables the SSRF guard) |
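For readers unfamiliar with what a robots.txt check involves, the sketch below evaluates a URL against a robots.txt body using Python's standard urllib.robotparser. It shows the general rule-matching idea behind a flag like --obey-robots, not how Kloakt implements it; the allowed helper and the ROBOTS sample are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """True if the given robots.txt body permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt content.
ROBOTS = """User-agent: *
Disallow: /private/
"""

print(allowed(ROBOTS, "https://example.com/private/report"))
print(allowed(ROBOTS, "https://example.com/blog"))
# False
# True
```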

> Security note: by default kloakt refuses to fetch private, loopback, link-local, and
> cloud-metadata addresses (SSRF protection), and rejects file:// URLs. Use --allow-private
> only when you intentionally need to reach internal services. The CDP server binds to
> 127.0.0.1 and validates the Host header to block DNS-rebinding.
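The address classes named in the security note map onto checks in Python's standard ipaddress module. The sketch below is a simplified illustration of such a guard, not Kloakt's implementation. The is_blocked helper is hypothetical, and it only inspects literal IP addresses and the name localhost; a real guard also has to resolve hostnames and check every resulting address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked(url):
    """True if a URL should be refused under a default-deny SSRF policy."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # rejects file:// and any other non-web scheme
    host = parts.hostname
    if host is None or host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Simplification: hostnames are not resolved here.
        return False
    # Link-local covers 169.254.169.254, a common cloud-metadata address.
    return address.is_private or address.is_loopback or address.is_link_local


for candidate in (
    "file:///etc/passwd",
    "http://127.0.0.1:9222/json/version",
    "http://169.254.169.254/latest/meta-data/",
    "https://8.8.8.8/",
):
    print(candidate, is_blocked(candidate))
```

The first three URLs are refused and the last is allowed. Checking the parsed address rather than matching string prefixes avoids bypasses through alternative spellings such as the IPv6 loopback [::1].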

CDP API

Full Chrome DevTool

Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

Install and usage instructions live in the source repository linked above.

Versions

  • v0.1.2 Imported from the upstream source.