# Kloakt

> Cloaked headless browser for AI agents — stealth TLS, smart extraction, SPA fallback

- **Type:** MCP server
- **Install:** `agentstack add mcp-kultmember6banger-kloakt`
- **Verified:** Pending review
- **Seller:** [KultMember6Banger](https://agentstack.voostack.com/s/kultmember6banger)
- **Installs:** 0
- **Latest version:** 0.1.2
- **License:** Apache-2.0
- **Upstream author:** [KultMember6Banger](https://github.com/KultMember6Banger)
- **Source:** https://github.com/KultMember6Banger/kloakt

## Install

```sh
agentstack add mcp-kultmember6banger-kloakt
```

Requires the [AgentStack CLI](https://agentstack.voostack.com/docs/cli). Works with Claude Code, Cursor, and any MCP-compatible agent.

## About

Lightweight, stealthy, built in Rust. Based on Obscura.

---

Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.

Beyond one-shot extraction, it can drive **persistent, stateful sessions** (click, type, navigate) in which cookies *and* page/JS state persist across calls. It can also emit an **accessibility/structure snapshot** as an agent-vision substitute and capture **real screenshots** via system Chrome. It exposes 12 MCP tools in all.

### Why Kloakt?

| Metric       | Kloakt       | Headless Chrome |
|--------------|--------------|------------------|
| Memory       | **30 MB**    | 200+ MB          |
| Binary size  | **70 MB**    | 300+ MB          |
| Anti-detect  | **Built-in** | None             |
| Page load    | **85 ms**    | ~500 ms          |
| Startup      | **Instant**  | ~2s              |
| SPA extract  | **Yes**      | Manual           |

## Install

### Prebuilt binary (recommended)

One-line install (Linux & macOS). Downloads the right binary for your OS/arch from the latest GitHub Release and installs it to `~/.local/bin` (or `/usr/local/bin` when run as root):

```bash
curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh | sh
```

You can pin a version or override the install dir:

```bash
KLOAKT_VERSION=v0.1.2 INSTALL_DIR=/usr/local/bin \
  sh -c "$(curl -fsSL https://raw.githubusercontent.com/KultMember6Banger/kloakt/main/install.sh)"
```

Windows: download `kloakt-x86_64-windows.zip` from the [Releases page](https://github.com/KultMember6Banger/kloakt/releases) and extract `kloakt.exe` onto your `PATH`.

### Homebrew (macOS)

```bash
brew install KultMember6Banger/kloakt/kloakt
# or, from a local checkout:
brew install --formula ./Formula/kloakt.rb
```

Until a dedicated tap exists, run `brew tap KultMember6Banger/kloakt https://github.com/KultMember6Banger/kloakt` and then `brew install kloakt`.

### cargo install

Builds the CLI from crates.io (requires Rust toolchain; first build compiles V8, ~5 min):

```bash
cargo install obscura-cli
```

This installs the `kloakt` binary. To build with stealth mode, add `--features stealth`.

### Build from source

```bash
git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release

# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth
```

Requires Rust 1.75+ ([rustup.rs](https://rustup.rs)). First build takes ~5 min (V8 compiles from source, cached after).

## Quick Start

### Extract content (AI agent use)

```bash
# Clean markdown from any page
kloakt extract https://example.com --main

# Structured JSON with metadata
kloakt extract https://example.com --main --json

# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000

# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
```

### Fetch a page

```bash
# Get the page title
kloakt fetch https://example.com --eval "document.title"

# Extract all links
kloakt fetch https://example.com --dump links

# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown

# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0
```

### Start the CDP server

```bash
kloakt serve --port 9222

# With stealth mode
kloakt serve --port 9222 --stealth
```

### Scrape in parallel

```bash
kloakt scrape url1 url2 url3 ... \
  --concurrency 25 \
  --eval "document.querySelector('h1').textContent" \
  --format json
```

### Snapshot page structure (agent vision)

```bash
# Indexed accessibility/structure tree — tags, text, roles, what's clickable, visibility
kloakt snapshot https://example.com

# Only the actionable elements (links, buttons, inputs), with id/name/label for targeting
kloakt snapshot https://example.com --interactive
```

Kloakt has no rasterizer, so the snapshot is the lightweight "what's on the page and what can I act on" view for agents that work from structure rather than pixels.

### Screenshot (via system Chrome)

```bash
# Real PNG — delegates to a locally-installed Chrome/Chromium/Edge
kloakt screenshot https://example.com --output shot.png --width 1280 --height 800
```

### Persistent sessions

Drive a named session whose cookies **and** page/JS state survive across separate
invocations, backed by a running `kloakt serve` daemon:

```bash
kloakt serve --port 9222 &                        # start the daemon once

kloakt session open shop --url https://example.com
kloakt session snapshot shop --interactive        # see the page structure
kloakt session type shop 'input[name=q]' 'hello'  # fill a field
kloakt session click shop 'button[type=submit]'   # click an element
kloakt session text shop                          # read the body text
kloakt session eval shop 'document.title'         # run JS, get the value back
kloakt session close shop                         # tear down (drops cookies + page)
```

## Smart Extraction

The `extract` command uses a multi-phase pipeline optimized for AI agents:

1. **Noise removal** — strips cookie banners, ads, popups, nav, social widgets
2. **Content scoring** — text-density algorithm (Readability-like) finds the main content block
3. **Markdown conversion** — DOM-to-markdown with absolute URL resolution
4. **SPA fallback** — when JS rendering fails, extracts from meta tags, Open Graph, JSON-LD, and noscript content

Works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
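
As an illustration of phase 4, the sketch below pulls fallback metadata out of raw HTML with Python's standard `html.parser`. It is not Kloakt's implementation (which is written in Rust); the class name `FallbackParser`, the function `spa_fallback`, and the sample page are hypothetical and only show the kind of sources the fallback reads.

```python
import json
from html.parser import HTMLParser


class FallbackParser(HTMLParser):
    """Collects <title>, <meta>, JSON-LD and <noscript> text from raw HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta name/property -> content
        self.json_ld = []       # parsed JSON-LD objects
        self.text = {"title": [], "noscript": []}
        self._capture = None    # text bucket we are currently inside
        self._in_json_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta[key] = a["content"]
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buf = []
        elif tag in self.text:
            self._capture = tag

    def handle_data(self, data):
        if self._in_json_ld:
            self._buf.append(data)
        elif self._capture:
            self.text[self._capture].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail
        elif tag == self._capture:
            self._capture = None


def spa_fallback(html):
    """Return whatever static metadata the page exposes without running JS."""
    p = FallbackParser()
    p.feed(html)
    p.close()
    return {
        # Prefer Open Graph, then fall back to the plain <title>.
        "title": p.meta.get("og:title") or "".join(p.text["title"]).strip(),
        "description": p.meta.get("og:description") or p.meta.get("description", ""),
        "json_ld": p.json_ld,
        "noscript": " ".join("".join(p.text["noscript"]).split()),
    }


# Hypothetical SPA shell: an empty root element plus static metadata.
sample = """<html><head><title>Shop</title>
<meta property="og:title" content="Acme Shop">
<meta name="description" content="Widgets and more">
<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>
</head><body><div id="root"></div>
<noscript>Please enable JavaScript to view the catalog.</noscript>
</body></html>"""

result = spa_fallback(sample)
print(result["title"], "|", result["description"])
```

The point of the design is that each source is optional: a page with only a `<title>` still yields a usable result, and richer pages contribute Open Graph and JSON-LD data on top.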

## Python API

```python
from kloakt import (
    extract, extract_fields, fetch, scrape, search, crawl,
    snapshot, screenshot, session_open, session_close,
)

# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)

# Cap output length
page = extract("https://example.com", max_chars=3000)

# Wait for SPA content
page = extract("https://example.com", delay=2000)

# Structured field extraction via CSS selectors
data = extract_fields("https://news.ycombinator.com", {
    "title": "title",
    "stories": ".titleline > a[]",   # [] => list of all matches
    "links": ".titleline > a[]@href" # @href => an attribute
})
print(data["data"]["stories"])

# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")

# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)

# Discover links, or breadth-first crawl a small section of a site
links = search("https://news.ycombinator.com", same_domain=True)
pages = crawl("https://example.com", max_pages=5, max_depth=1)

# Structure snapshot (agent vision) and a real screenshot via system Chrome
snap = snapshot("https://example.com", interactive=True)
screenshot("https://example.com", output="shot.png")

# Persistent session — auto-starts a daemon if one isn't running; cookies + page
# state persist across calls. (session_nav / _click / _type / _eval / _text / _snapshot)
session_open("shop", url="https://example.com")
# ... drive the page across calls ...
session_close("shop")
```

## MCP Server (Claude Code)

Kloakt includes an MCP server for use as a Claude Code tool:

```json
{
  "mcpServers": {
    "kloakt": {
      "command": "python3",
      "args": ["/path/to/kloakt/mcp_server.py"]
    }
  }
}
```

Exposes 12 native tools:

| Tool | What it does |
|------|--------------|
| `kloakt_extract` | Clean markdown, or structured fields via `schema` |
| `kloakt_fetch` | Low-level fetch (html/text/links/markdown, or JS eval) |
| `kloakt_scrape` | Many URLs in parallel |
| `kloakt_search` | Discover outbound links on a page |
| `kloakt_crawl` | Budget/depth-limited breadth-first crawl |
| `kloakt_snapshot` | Accessibility/structure tree (agent vision) |
| `kloakt_screenshot` | Real PNG via system Chrome |
| `kloakt_session_open` | Open a persistent named session |
| `kloakt_session_act` | navigate / click / type / eval within a session |
| `kloakt_session_read` | Read a session's page as text or snapshot |
| `kloakt_session_list` | List open sessions |
| `kloakt_session_close` | Close a session (drops its cookies + page) |
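
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `kloakt_extract`. The helper `tool_call` is hypothetical, the `url` argument name is an assumption (only the optional `schema` argument is documented here), and passing `schema` as an object rather than a JSON string is also assumed.

```python
import json


def tool_call(tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tool_call(
    "kloakt_extract",
    {
        "url": "https://news.ycombinator.com",      # assumed argument name
        "schema": {"stories": ".titleline > a[]"},  # assumed to accept an object
    },
)
print(json.dumps(request, indent=2))
```

In practice the MCP client (Claude Code, Cursor, and so on) constructs this message for you; the sketch is mainly useful when debugging the server over stdio.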

## Puppeteer / Playwright

### Puppeteer

The CDP server embeds a per-session token in the WebSocket path (like Chrome). Connect via
`browserURL` so the client discovers the token from `/json/version` automatically — don't
hardcode the `ws://.../devtools/browser` path.

```javascript
import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
  Array.from(document.querySelectorAll('.titleline > a'))
    .map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();
```
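
For clients that lack a `browserURL` option, the same discovery can be done by hand. The Python sketch below reads `/json/version` and returns the WebSocket URL. It assumes Kloakt's response uses the same `webSocketDebuggerUrl` field that Chrome does; the helper names and the token in the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen


def ws_endpoint_from_version(payload):
    """Extract the tokenized WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    # Field name as used by Chrome's DevTools HTTP endpoint (assumed for Kloakt).
    return info["webSocketDebuggerUrl"]


def discover_ws_endpoint(base_url="http://127.0.0.1:9222"):
    """Fetch /json/version from a running `kloakt serve` and return the ws URL."""
    with urlopen(f"{base_url}/json/version", timeout=5) as resp:
        return ws_endpoint_from_version(resp.read().decode("utf-8"))


# Offline example with a made-up payload; the token "abc123" is hypothetical.
sample = (
    '{"Browser": "Example/1.0", '
    '"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc123"}'
)
endpoint = ws_endpoint_from_version(sample)
print(endpoint)
```

Because the token changes per session, call `discover_ws_endpoint()` each time the daemon restarts instead of caching the result.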

### Playwright

```javascript
import { chromium } from 'playwright-core';

const browser = await chromium.connectOverCDP({
  endpointURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();
```

## Stealth Mode

Build with `--features stealth`, then enable it per command with the `--stealth` flag.

- Per-session fingerprint randomization (GPU, screen, canvas, audio, battery)
- Realistic `navigator.userAgentData` (Chrome 145, high-entropy values)
- `event.isTrusted = true` for dispatched events
- Native function masking (`Function.prototype.toString()` → `[native code]`)
- `navigator.webdriver = undefined`
- Realistic `Accept-Language` + Client Hints (`Sec-CH-UA`) request headers
- Per-session randomized `navigator.languages`
- TLS fingerprint (JA3) rotation across Chrome 145 Linux / Windows / macOS profiles
- 3,520 tracker domains blocked

## CLI Reference

### `kloakt extract <url>`

| Flag | Default | Description |
|------|---------|-------------|
| `--format` | `markdown` | Output: `markdown`, `text`, or `links` |
| `--main` | off | Strip nav, header, footer, sidebar |
| `--json` | off | Structured JSON: title, URL, content, meta |
| `--max-chars` | unlimited | Truncate content to N characters |
| `--delay` | `0` | Extra ms to wait after load |
| `--stealth` | off | Anti-detection mode |
| `--selector` | — | Wait for CSS selector |
| `--wait-until` | `load` | `load`, `domcontentloaded`, `networkidle0` (bounded by `--wait`) |
| `--schema` | — | Extract structured fields as JSON (see below) |
| `--har` | — | Write captured network activity to a HAR file |
| `--cache-ttl` | `0` | Cache the result on disk and reuse it for N seconds |

#### Structured extraction with `--schema`

Pass a JSON object mapping field names to CSS selectors. Suffix a selector with `[]` to
return **all** matches as a list, and with `@attr` to return an **attribute** instead of text:

```bash
kloakt extract https://news.ycombinator.com \
  --schema '{"title":"title","stories":".titleline > a[]","first_link":".titleline > a@href"}'
# => { "url": ..., "data": { "title": "...", "stories": [...], "first_link": "..." }, "elapsed_ms": ... }
```

This is also exposed through the MCP `kloakt_extract` tool via an optional `schema` argument.
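
To make the suffix syntax concrete, here is a small Python sketch that splits a field spec into its selector, list flag, and attribute. It is an illustration of the rules described above, not Kloakt's parser; `FieldSpec` and `parse_field_spec` are hypothetical names, and the `[]` before `@attr` ordering follows the examples in this document.

```python
import re
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    selector: str         # plain CSS selector
    many: bool            # True when the spec ended with []
    attr: Optional[str]   # attribute name after @, or None for text


# selector, then optional "[]", then optional "@attr" at the very end.
_SPEC = re.compile(
    r"(?P<selector>.*?)(?P<many>\[\])?(?:@(?P<attr>[A-Za-z_][\w:-]*))?"
)


def parse_field_spec(spec: str) -> FieldSpec:
    m = _SPEC.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unparseable field spec: {spec!r}")
    return FieldSpec(m["selector"].strip(), m["many"] is not None, m["attr"])


for spec in ("title", ".titleline > a[]", ".titleline > a@href", ".titleline > a[]@href"):
    print(spec, "->", parse_field_spec(spec))
```

Anchoring the suffixes to the end of the string keeps ordinary attribute selectors such as `input[name=q]` intact, since only an empty `[]` pair is treated as the list marker.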

### `kloakt fetch <url>`

| Flag | Default | Description |
|------|---------|-------------|
| `--dump` | `html` | Output: `html`, `text`, `links`, `markdown` |
| `--eval` | — | JavaScript expression to evaluate |
| `--wait-until` | `load` | Wait condition |
| `--selector` | — | Wait for CSS selector |
| `--stealth` | off | Anti-detection mode |
| `--quiet` | off | Suppress banner |

### `kloakt serve`

| Flag | Default | Description |
|------|---------|-------------|
| `--port` | `9222` | WebSocket port |
| `--proxy` | — | HTTP/SOCKS5 proxy URL |
| `--stealth` | off | Anti-detection + tracker blocking |
| `--workers` | `1` | Parallel workers |

### `kloakt scrape <urls...>`

| Flag | Default | Description |
|------|---------|-------------|
| `--concurrency` | `10` | Parallel workers |
| `--eval` | — | JS expression per page |
| `--format` | `json` | Output: `json` or `text` |

### `kloakt snapshot <url>`

Emit an indexed accessibility/structure tree (always JSON) — an agent-vision substitute.
Each node is compact: `i` (index), `tag`, `depth`, `vis` (visible), and when present `click`,
`role`, `text`, `type`/`value`, `href`, `id`, `name`, `label`.

| Flag | Default | Description |
|------|---------|-------------|
| `--interactive` | off | Only actionable elements (links, buttons, inputs) |
| `--max-nodes` | `1500` | Cap on nodes emitted |
| `--stealth` | off | Anti-detection mode |
| `--delay` | `0` | Extra ms to wait after load |
| `--wait-until` | `load` | Wait condition |
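
An agent typically reduces the snapshot to the elements it can act on. The Python sketch below does that using the node keys listed above. It assumes the nodes arrive as a list of objects and that `vis` and `click` are truthy flags; the top-level JSON shape is not documented here, and the sample nodes are made up.

```python
def actionable(nodes):
    """Return (index, label) pairs for nodes that are visible and clickable."""
    picked = []
    for node in nodes:
        if node.get("vis") and node.get("click"):
            # Prefer the most human-readable identifier that is present.
            label = (
                node.get("label")
                or node.get("text")
                or node.get("name")
                or node["tag"]
            )
            picked.append((node["i"], label))
    return picked


# Hypothetical snapshot nodes using the documented keys.
sample = [
    {"i": 0, "tag": "body", "depth": 0, "vis": True},
    {"i": 1, "tag": "a", "depth": 1, "vis": True, "click": True,
     "text": "Sign in", "href": "/login"},
    {"i": 2, "tag": "button", "depth": 1, "vis": False, "click": True,
     "text": "Hidden"},
    {"i": 3, "tag": "input", "depth": 1, "vis": True, "click": True,
     "name": "q", "type": "text"},
]

print(actionable(sample))
```

Passing `--interactive` performs a similar reduction on the Kloakt side, so a client-side filter like this is mostly useful when the full tree is also needed for context.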

### `kloakt screenshot <url>`

Capture a real PNG by delegating to a locally installed Chrome/Chromium/Edge (kloakt has no
rasterizer). The command exits with a clear error if no browser is found; override detection
with `--chrome` or the `KLOAKT_CHROME` env var.

| Flag | Default | Description |
|------|---------|-------------|
| `--output` | `screenshot.png` | Output PNG path |
| `--width` | `1280` | Viewport width |
| `--height` | `800` | Viewport height |
| `--chrome` | auto-detect | Path to a Chrome/Chromium/Edge binary |

### `kloakt session <command>`

Drive a persistent, named session against a running `kloakt serve` daemon. Cookies **and**
page/JS state survive across separate invocations (client state in `~/.kloakt/sessions/.json`).

| Command | Description |
|---------|-------------|
| `open <name> [--url URL] [--port]` | Open (or reattach to) a session and create a page |
| `nav <name> <url>` | Navigate the session's page |
| `eval <name> <expr>` | Evaluate a JS expression, print the JSON value |
| `text <name>` | Print `document.body.innerText` |
| `snapshot <name> [--interactive]` | Structure snapshot of the session's page |
| `click <name> <selector>` | Click the first matching element |
| `type <name> <selector> <text>` | Focus an element, set its value, fire input/change |
| `list [--port]` | List open sessions on the daemon |
| `close <name>` | Close the session (drops its pages + cookies) |

> Multi-statement JS passed to `eval` returns `null` (the daemon evaluates a single
> expression); wrap it in an IIFE — `(function(){ ...; return v })()` — to get a value back.
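
When the JavaScript is generated by a script, the IIFE wrapping can be automated. The helper below, `as_iife`, is a hypothetical convenience and not part of Kloakt; it joins statements and a final result expression into the single-expression form described in the note.

```python
def as_iife(statements, result):
    """Wrap JS statements plus a result expression in an immediately invoked function."""
    # Normalize each statement so exactly one ";" separates them.
    body = "; ".join(s.strip().rstrip(";") for s in statements)
    prefix = f"{body}; " if body else ""
    return f"(function(){{ {prefix}return {result}; }})()"


expr = as_iife(
    ["const links = document.querySelectorAll('a')", "const n = links.length"],
    "n",
)
print(expr)
# (function(){ const links = document.querySelectorAll('a'); const n = links.length; return n; })()
```

The resulting string can be passed as the expression argument of `kloakt session eval`.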

### `kloakt benchmark <urls...>`

Measure load performance per URL — average/min/max load time, request count, bytes, and DOM
node count — as a table or `--json`.

```bash
kloakt benchmark https://example.com https://news.ycombinator.com --runs 3
```

| Flag | Default | Description |
|------|---------|-------------|
| `--runs` | `1` | Runs per URL (reports the average) |
| `--json` | off | Emit JSON instead of a table |
| `--wait-until` | `load` | `load`, `domcontentloaded`, or `networkidle0` |

### Challenge / bot-wall detection

`kloakt extract --json` includes a `"challenge"` field reporting a detected captcha or bot
wall (`recaptcha`, `hcaptcha`, `turnstile`, `cloudflare`, `datadome`, `perimeterx`) or
`null`. This is **detection only** — kloakt tells you a page is gated so an agent can stop
and back off; it does not attempt to solve or evade challenges. (Also surfaced via the MCP
`kloakt_extract` output and the Python `Page.challenge` field.)
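
A minimal sketch of how an agent might act on that field: parse the JSON output and stop when `challenge` is set. Only the `"challenge"` key comes from the description above; the function name `next_action` and the other keys in the sample payloads are illustrative assumptions.

```python
import json


def next_action(raw_json):
    """Decide whether to continue, based on the `challenge` field of extract output."""
    page = json.loads(raw_json)
    challenge = page.get("challenge")  # a string such as "turnstile", or None
    if challenge:
        return ("back_off", challenge)
    return ("proceed", None)


# Hypothetical payloads; only the "challenge" key is documented.
gated = '{"url": "https://example.com", "content": "", "challenge": "turnstile"}'
clear = '{"url": "https://example.com", "content": "# Hello", "challenge": null}'

print(next_action(gated))
print(next_action(clear))
```

Treating a missing key the same as `null` keeps the check safe against output that omits the field.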

### Global flags

| Flag | Default | Description |
|------|---------|-------------|
| `--obey-robots` | off | Respect `robots.txt` — refuse to fetch disallowed paths |
| `--allow-private` | off | Allow private/internal/loopback hosts (disables the SSRF guard) |
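
The check behind `--obey-robots` can be approximated with Python's standard `urllib.robotparser`. This is a sketch of the general technique, not Kloakt's implementation; the helper `allowed` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="*"):
    """Return True if `url` may be fetched under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content.
rules = """User-agent: *
Disallow: /private/
"""

print(allowed(rules, "https://example.com/public"))
print(allowed(rules, "https://example.com/private/data"))
```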

> **Security note:** by default kloakt refuses to fetch private, loopback, link-local, and
> cloud-metadata addresses (SSRF protection), and rejects `file://` URLs. Use `--allow-private`
> only when you intentionally need to reach internal services. The CDP server binds to
> `127.0.0.1` and validates the `Host` header to block DNS-rebinding.
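
For readers unfamiliar with SSRF guards, the following Python sketch shows the kind of classification involved, using the standard `ipaddress` module. It is an illustration only and not Kloakt's code: the function `check_url` and its return strings are hypothetical, and a real guard must also resolve hostnames and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def check_url(url):
    """Classify a URL as allowed, blocked, or needing DNS resolution first."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "blocked: scheme"        # e.g. file:// URLs
    try:
        ip = ipaddress.ip_address(parts.hostname or "")
    except ValueError:
        # Not an IP literal: the hostname must be resolved and each
        # resulting address checked before any request is made.
        return "needs resolution"
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified:
        return "blocked: address"       # includes 169.254.169.254 (cloud metadata)
    return "allowed"


for url in (
    "file:///etc/passwd",
    "http://127.0.0.1:9222/",
    "http://169.254.169.254/latest/meta-data/",
    "https://93.184.216.34/",
    "https://example.com/",
):
    print(url, "->", check_url(url))
```

Checking the resolved address rather than the hostname string is what defeats tricks such as public DNS names that point at internal IPs.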

## CDP API

Full Chrome DevTool

…

## Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

- **Author:** [KultMember6Banger](https://github.com/KultMember6Banger)
- **Source:** [KultMember6Banger/kloakt](https://github.com/KultMember6Banger/kloakt)
- **License:** Apache-2.0

Install and usage instructions live in the source repository linked above.

## Pricing

- **Free** — Free

## Versions

- **0.1.2** — security scan: pending review — Imported from the upstream source.

## Links

- Listing page: https://agentstack.voostack.com/l/mcp-kultmember6banger-kloakt
- Seller: https://agentstack.voostack.com/s/kultmember6banger
- Browse the marketplace: https://agentstack.voostack.com/browse

