PDF Reader

mcp-xvvln-pdf-reader-mcp · by Xvvln

MCP server for extracting text, images, tables, links, annotations, and metadata from PDF files.

About

pdf-reader-mcp

An MCP server for reading and analyzing PDF files. It gives clients that support MCP (Model Context Protocol) access to PDF text, rendered page images, tables, links, annotations, outlines, metadata, and basic text statistics.

Package name

  • GitHub repository: pdf-reader-mcp
  • MCP Registry name: io.github.Xvvln/pdf-reader-mcp
  • PyPI package: pdf-insight-mcp
  • CLI commands: pdf-reader-mcp and pdf-insight-mcp

pdf-reader-mcp is the project name. The PyPI package is published as pdf-insight-mcp because the pdf-reader-mcp package name is not available on PyPI.

Features

| Tool | What it does |
| --- | --- |
| get_pdf_info | Read document metadata, page count, file size, and encryption status. |
| read_pdf_as_text | Extract text from selected pages with page and character limits. |
| read_pdf_as_images | Render selected pages as base64-encoded images. |
| get_pdf_outline | Read bookmarks and outline entries. |
| search_pdf_text | Search text and return per-match page context. |
| extract_pdf_tables | Extract structured tables when PyMuPDF can detect them. |
| extract_pdf_images | Extract embedded PDF images. |
| get_pdf_page_info | Inspect one page's size, text, images, links, and rotation. |
| extract_pdf_links | Extract external URLs and internal page jumps. |
| get_pdf_annotations | Read comments, highlights, and annotation metadata. |
| get_pdf_text_stats | Compute text, line, paragraph, and scan-likelihood stats. |
| compare_pdf_pages | Compare text similarity between two pages. |
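
Under MCP, a client invokes each of these tools by name through a `tools/call` request. The sketch below builds such a request as a JSON-RPC 2.0 message to show what goes over the wire. The `file_path` argument name is an assumption made for illustration and is not taken from this server's schema; check the input schema your client reports from `tools/list` before relying on it.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "file_path" is a hypothetical argument name used only for illustration.
request = build_tool_call(
    1, "get_pdf_info", {"file_path": "/Users/me/Documents/report.pdf"}
)
print(request)
```

In practice an MCP client library builds and sends this message for you; the sketch only makes the request shape visible.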

Quick start

Install uv if you do not already have it:

curl -LsSf https://astral.sh/uv/install.sh | sh

Run the server directly from PyPI:

uvx pdf-insight-mcp

Or install it first:

python -m pip install pdf-insight-mcp
pdf-reader-mcp

MCP client configuration

Use the published PyPI package:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": ["pdf-insight-mcp"]
    }
  }
}

Use a local checkout for development:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}

Replace /absolute/path/to/pdf-reader-mcp with the absolute path to this repository on your machine.
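
If you switch between the published package and a local checkout often, the two configurations above can be generated rather than hand-edited. This is a minimal sketch: `build_client_config` is a hypothetical helper, not part of this project, and where the resulting JSON must be saved depends on your MCP client.

```python
import json
import os
from typing import Optional


def build_client_config(local_checkout: Optional[str] = None) -> dict:
    """Return an mcpServers entry for the PyPI package or a local checkout."""
    if local_checkout is None:
        # Published package, run through uvx.
        server = {"command": "uvx", "args": ["pdf-insight-mcp"]}
    else:
        # The README asks for an absolute path to the repository.
        if not os.path.isabs(local_checkout):
            raise ValueError("local_checkout must be an absolute path")
        server = {
            "command": "uv",
            "args": ["--directory", local_checkout, "run", "pdf-reader-mcp"],
        }
    return {"mcpServers": {"pdf-reader": server}}


print(json.dumps(build_client_config(), indent=2))
```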

Common usage

Ask your MCP client to call tools with an absolute PDF path. Example requests:

Read /Users/me/Documents/report.pdf as text.
Search /Users/me/Documents/report.pdf for "baseline characteristics".
Render pages 1-3 of /Users/me/Documents/report.pdf as images.
Extract links and annotations from /Users/me/Documents/review.pdf.

For large PDFs, prefer small page ranges first. For scanned or layout-sensitive PDFs, use read_pdf_as_images with a small pages range and moderate dpi.
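
One way to keep page ranges small is to walk the document in fixed-size batches and request one batch at a time. The helper below is a hypothetical client-side sketch; it assumes 1-based, inclusive page numbers written as `first-last`, as in the example requests above, which may not match the exact format the tools accept.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (first, last) page ranges covering the document."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


# A 12-page document in batches of 5 pages.
for first, last in page_batches(12, batch_size=5):
    print(f"{first}-{last}")
# prints:
# 1-5
# 6-10
# 11-12
```

The page count can come from get_pdf_info, so the client can plan its batches before reading any text.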

Limits and behavior

  • read_pdf_as_text defaults to at most 50 pages and 200,000 returned characters.
  • read_pdf_as_images rejects requests above 20 pages.
  • read_pdf_as_images defaults to an overall image payload cap of about 20 MB.
  • extract_pdf_images returns at most 20 embedded images but reports the actual detected total.
  • Encrypted PDFs are rejected unless they are already accessible without a password.
  • Scanned PDFs may have little or no extractable text. Use image rendering or OCR outside this server when needed.
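
A client can mirror these limits locally to avoid requests the server would reject and to decide when to fall back to image rendering. The sketch below is illustrative only: the constants copy the defaults listed above, while the helper names and the `min_chars_per_page` threshold are assumptions and do not reflect how the server itself detects scanned pages.

```python
MAX_IMAGE_PAGES = 20      # read_pdf_as_images rejects requests above this
MAX_TEXT_PAGES = 50       # default page cap for read_pdf_as_text
MAX_TEXT_CHARS = 200_000  # default character cap for read_pdf_as_text


def validate_image_pages(page_count: int) -> None:
    """Raise before sending a render request the server would reject."""
    if page_count > MAX_IMAGE_PAGES:
        raise ValueError(
            f"{page_count} pages requested; the limit is {MAX_IMAGE_PAGES}"
        )


def text_may_be_truncated(pages_requested: int, chars_returned: int) -> bool:
    """Guess whether a text result ran into one of the default caps."""
    return pages_requested > MAX_TEXT_PAGES or chars_returned >= MAX_TEXT_CHARS


def suggest_tool(chars_returned: int, pages_read: int, min_chars_per_page: int = 50) -> str:
    """Suggest image rendering when pages yield very little text.

    The threshold is a hypothetical heuristic, not the server's own logic.
    """
    if pages_read < 1:
        raise ValueError("pages_read must be at least 1")
    if chars_returned / pages_read < min_chars_per_page:
        return "read_pdf_as_images"
    return "read_pdf_as_text"


print(suggest_tool(chars_returned=10, pages_read=5))  # prints "read_pdf_as_images"
```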

Development

Install dependencies:

uv sync --extra dev

Run tests:

uv run pytest -q

Build the package:

uv build
uvx twine check dist/*

Run the local server:

uv run pdf-reader-mcp

Release

Releases are published through GitHub Actions.

Before the first release, configure PyPI Trusted Publishing with:

PyPI project name: pdf-insight-mcp
Owner: Xvvln
Repository name: pdf-reader-mcp
Workflow filename: publish.yml
Environment name: leave empty

Then release by bumping versions in pyproject.toml and server.json, committing the change, and pushing a version tag:

git tag vX.Y.Z
git push origin main --tags

The Publish workflow runs tests, builds the Python package, publishes to PyPI, authenticates to the MCP Registry with GitHub OIDC, and publishes server.json.
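
Because a release touches pyproject.toml, server.json, and the tag name, a quick consistency check before pushing can catch a missed bump. The following is a hypothetical pre-release sketch, not part of the project's workflow. It assumes server.json carries a top-level `version` key and that pyproject.toml declares `version = "..."` on its own line; adjust both if your files differ.

```python
import json
import re


def pyproject_version(pyproject_text: str) -> str:
    """Return the first version = "..." value found at the start of a line."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in pyproject.toml text")
    return match.group(1)


def release_is_consistent(pyproject_text: str, server_json_text: str, tag: str) -> bool:
    """True when pyproject.toml, server.json, and the vX.Y.Z tag agree."""
    py_version = pyproject_version(pyproject_text)
    # Assumed layout: a top-level "version" key in server.json.
    server_version = json.loads(server_json_text)["version"]
    return py_version == server_version and tag == f"v{py_version}"


# Sample inputs for illustration; a real check would read the two files.
sample_pyproject = '[project]\nname = "pdf-insight-mcp"\nversion = "0.2.1"\n'
sample_server_json = '{"name": "io.github.Xvvln/pdf-reader-mcp", "version": "0.2.1"}'
print(release_is_consistent(sample_pyproject, sample_server_json, "v0.2.1"))  # prints True
```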

Tech stack

  • Python 3.10+
  • MCP Python SDK
  • PyMuPDF
  • uv
  • pytest

License

MIT

Versions

  • v0.2.1 Imported from the upstream source.