# PDF Reader

> MCP server for extracting text, images, tables, links, annotations, and metadata from PDF files.

- **Type:** MCP server
- **Install:** `agentstack add mcp-xvvln-pdf-reader-mcp`
- **Verified:** Pending review
- **Seller:** [Xvvln](https://agentstack.voostack.com/s/xvvln)
- **Installs:** 0
- **Latest version:** 0.2.1
- **License:** MIT
- **Upstream author:** [Xvvln](https://github.com/Xvvln)
- **Source:** https://github.com/Xvvln/pdf-reader-mcp

## Install

```sh
agentstack add mcp-xvvln-pdf-reader-mcp
```

Requires the [AgentStack CLI](https://agentstack.voostack.com/docs/cli). Works with Claude Code, Cursor, and any MCP-compatible agent.

## About

# pdf-reader-mcp

A PDF-focused MCP server for reading and analyzing PDF files. It gives clients that support MCP (Model Context Protocol) access to extracted text, rendered page images, tables, links, annotations, outlines (table of contents), metadata, and basic text statistics.

## Package name

- GitHub repository: `pdf-reader-mcp`
- MCP Registry name: `io.github.Xvvln/pdf-reader-mcp`
- PyPI package: `pdf-insight-mcp`
- CLI commands: `pdf-reader-mcp` and `pdf-insight-mcp`

`pdf-reader-mcp` is the project name. The PyPI package is published as `pdf-insight-mcp` because the `pdf-reader-mcp` package name is not available on PyPI.

## Features

| Tool | What it does |
| --- | --- |
| `get_pdf_info` | Read document metadata, page count, file size, and encryption status. |
| `read_pdf_as_text` | Extract text from selected pages with page and character limits. |
| `read_pdf_as_images` | Render selected pages as base64-encoded images. |
| `get_pdf_outline` | Read bookmarks and outline entries. |
| `search_pdf_text` | Search text and return per-match page context. |
| `extract_pdf_tables` | Extract structured tables when PyMuPDF can detect them. |
| `extract_pdf_images` | Extract embedded PDF images. |
| `get_pdf_page_info` | Inspect one page's size, text, images, links, and rotation. |
| `extract_pdf_links` | Extract external URLs and internal page jumps. |
| `get_pdf_annotations` | Read comments, highlights, and annotation metadata. |
| `get_pdf_text_stats` | Compute text, line, paragraph, and scan-likelihood stats. |
| `compare_pdf_pages` | Compare text similarity between two pages. |
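The table does not say how `compare_pdf_pages` scores similarity. Purely as an illustration, one common stdlib measure is `difflib.SequenceMatcher.ratio()` over whitespace-normalized text. The helper name `page_similarity` is hypothetical, and this is not necessarily what the server computes.

```python
from difflib import SequenceMatcher


def page_similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two page texts.

    Hypothetical helper: one possible measure, not the server's own
    implementation. Whitespace is normalized first so that line-wrapping
    differences do not dominate the score.
    """
    # Collapse runs of whitespace (newlines, tabs, spaces) into single spaces.
    norm_a = " ".join(text_a.split())
    norm_b = " ".join(text_b.split())
    if not norm_a and not norm_b:
        # Two empty pages (e.g. scanned pages with no text layer) count as identical.
        return 1.0
    # ratio() is 2*M/T, where M is the number of matched characters
    # and T is the combined length of both strings.
    return SequenceMatcher(None, norm_a, norm_b).ratio()


if __name__ == "__main__":
    a = "Baseline characteristics\nof the study population"
    b = "Baseline characteristics of the study population"
    print(page_similarity(a, b))  # 1.0 (only the line break differs)
```

Normalizing whitespace first means two pages that differ only in line wrapping score as identical, which is usually what you want when comparing extracted PDF text.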

## Quick start

Install `uv` if you do not already have it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Run the server directly from PyPI:

```bash
uvx pdf-insight-mcp
```

Or install it first:

```bash
python -m pip install pdf-insight-mcp
pdf-reader-mcp
```

## MCP client configuration

Use the published PyPI package:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": ["pdf-insight-mcp"]
    }
  }
}
```

Use a local checkout for development:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}
```

Replace `/absolute/path/to/pdf-reader-mcp` with the absolute path to this repository on your machine.
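If you generate this config from a script, a small guard against relative paths can help, since the client may start the server from a different working directory. A minimal sketch, where `local_dev_config` is a hypothetical helper that reproduces the local-checkout JSON above:

```python
import json
from pathlib import Path


def local_dev_config(repo_path: str, server_name: str = "pdf-reader") -> dict:
    """Build the local-checkout MCP client config for this repository.

    Hypothetical helper: it only assembles the same JSON structure and
    refuses relative paths, because the client may launch the server
    from an unrelated working directory.
    """
    path = Path(repo_path).expanduser()
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {repo_path!r}")
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": ["--directory", str(path), "run", "pdf-reader-mcp"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(local_dev_config("/absolute/path/to/pdf-reader-mcp"), indent=2))
```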

## Common usage

Ask your MCP client to call tools with an absolute PDF path. Example requests:

```text
Read /Users/me/Documents/report.pdf as text.
Search /Users/me/Documents/report.pdf for "baseline characteristics".
Render pages 1-3 of /Users/me/Documents/report.pdf as images.
Extract links and annotations from /Users/me/Documents/review.pdf.
```
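Under the hood, an MCP client turns requests like these into JSON-RPC 2.0 `tools/call` messages. A minimal sketch of that envelope follows. The helper `tools_call` is hypothetical, and the argument names (`file_path`, `pages`, `dpi`) and the `"1-3"` page format are assumptions; check each tool's real input schema in the server's `tools/list` response.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids for this sketch.
_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # No indent: the message stays on one line, as newline-delimited
    # stdio transport expects.
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical argument names; read the real schema from `tools/list`.
    print(tools_call(
        "read_pdf_as_images",
        {"file_path": "/Users/me/Documents/report.pdf", "pages": "1-3", "dpi": 100},
    ))
```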

For large PDFs, prefer small page ranges first. For scanned or layout-sensitive PDFs, use `read_pdf_as_images` with a small `pages` range and moderate `dpi`.
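One way to follow this advice for a long document is to pre-split it into fixed-size page ranges and request them one at a time. The sketch below assumes 1-based, inclusive page numbers. `page_chunks` is a hypothetical helper, and the sizes 50 and 20 mirror the text and image page limits listed under Limits and behavior.

```python
def page_chunks(page_count: int, max_pages: int) -> list[tuple[int, int]]:
    """Split pages 1..page_count into consecutive inclusive (start, end) ranges.

    Hypothetical client-side helper; each range holds at most `max_pages` pages.
    """
    if page_count < 0 or max_pages < 1:
        raise ValueError("page_count must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 120-page report read as text (50-page default) and as images (20-page cap).
    print(page_chunks(120, 50))       # [(1, 50), (51, 100), (101, 120)]
    print(len(page_chunks(120, 20)))  # 6
```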

## Limits and behavior

- `read_pdf_as_text` defaults to at most 50 pages and 200,000 returned characters.
- `read_pdf_as_images` rejects requests above 20 pages.
- `read_pdf_as_images` defaults to an overall image payload cap of about 20 MB.
- `extract_pdf_images` returns at most 20 embedded images but reports the actual detected total.
- Encrypted PDFs are rejected unless they are already accessible without a password.
- Scanned PDFs may have little or no extractable text. Use image rendering or OCR outside this server when needed.
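To stay under both image limits when rendering many pages, a client could batch pages by estimated size. This sketch assumes the ~20 MB cap applies to the base64-encoded image data and treats it as 20 MiB; `batch_pages` and the per-page size estimates are hypothetical. Base64 encodes every 3 input bytes as 4 output characters, so the encoded size of `n` bytes is `4 * ceil(n / 3)`.

```python
def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def batch_pages(page_sizes: dict[int, int], max_pages: int = 20,
                max_payload: int = 20 * 1024 * 1024) -> list[list[int]]:
    """Greedily group pages so each batch respects a page cap and a payload cap.

    `page_sizes` maps page number -> estimated raw image size in bytes
    (hypothetical estimates; the real size depends on dpi and content).
    """
    batches: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for page in sorted(page_sizes):
        size = base64_size(page_sizes[page])
        # Start a new batch if adding this page would break either cap.
        if current and (len(current) == max_pages or current_size + size > max_payload):
            batches.append(current)
            current, current_size = [], 0
        current.append(page)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # 30 pages estimated at 1,000,000 raw bytes each (about 1.33 MB as base64).
    sizes = {page: 1_000_000 for page in range(1, 31)}
    print([len(batch) for batch in batch_pages(sizes)])  # [15, 15]
```

A page whose estimate alone exceeds the payload cap still ends up in its own batch, so the server may reject it; lowering `dpi` for that page is the usual remedy.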

## Development

Install dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest -q
```

Build the package:

```bash
uv build
uvx twine check dist/*
```

Run the local server:

```bash
uv run pdf-reader-mcp
```

## Release

Releases are published through GitHub Actions.

Before the first release, configure PyPI Trusted Publishing with:

```text
PyPI project name: pdf-insight-mcp
Owner: Xvvln
Repository name: pdf-reader-mcp
Workflow filename: publish.yml
Environment name: leave empty
```

To cut a release, bump the version in both `pyproject.toml` and `server.json`, commit the change, and push a version tag:

```bash
git tag vX.Y.Z
git push origin main --tags
```

The `Publish` workflow runs tests, builds the Python package, publishes to PyPI, authenticates to the MCP Registry with GitHub OIDC, and publishes `server.json`.

## Tech stack

- Python 3.10+
- MCP Python SDK
- PyMuPDF
- uv
- pytest

## License

MIT

## Source & license

This open-source MCP server is cataloged on AgentStack and links to its original source — we do not rehost the code.

- **Author:** [Xvvln](https://github.com/Xvvln)
- **Source:** [Xvvln/pdf-reader-mcp](https://github.com/Xvvln/pdf-reader-mcp)
- **License:** MIT

Install and usage instructions live in the source repository linked above.

## Pricing

- **Free**

## Versions

- **0.2.1** — security scan: pending review — Imported from the upstream source.

## Links

- Listing page: https://agentstack.voostack.com/l/mcp-xvvln-pdf-reader-mcp
- Seller: https://agentstack.voostack.com/s/xvvln
- Browse the marketplace: https://agentstack.voostack.com/browse

