
@bachstudio/pdf-reader-mcp
Production-ready PDF processing server for AI agents
5-10x faster parallel processing • Y-coordinate content ordering • 94%+ test coverage • 103 tests passing
PDF Reader MCP is a production-ready Model Context Protocol server that empowers AI agents with enterprise-grade PDF processing capabilities. Extract text, images, and metadata with unmatched performance and reliability.
The Problem:
// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation
The Solution:
// PDF Reader MCP
- 5-10x faster parallel processing ⚡
- Y-coordinate based ordering 📐
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- 94%+ test coverage ✅
Result: Production-ready PDF processing that scales.
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|---|---|---|---|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |

| Document | Sequential | Parallel | Speedup |
|---|---|---|---|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | Linear scaling with CPU cores |
Benchmarks vary based on PDF complexity and system resources.
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
# Using pnpm (recommended)
pnpm add @sylphx/pdf-reader-mcp
# Using npm
npm install @sylphx/pdf-reader-mcp
# Using yarn
yarn add @sylphx/pdf-reader-mcp
# For Claude Desktop (easiest)
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
Add to your MCP client (claude_desktop_config.json, Cursor, Cline):
{
"mcpServers": {
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
{
"sources": [{
"path": "documents/report.pdf"
}],
"include_full_text": true,
"include_metadata": true,
"include_page_count": true
}
{
"sources": [{
"path": "documents/manual.pdf",
"pages": "1-5,10,15-20"
}],
"include_full_text": true
}
// Windows - Both formats work!
{
"sources": [{
"path": "C:\\Users\\John\\Documents\\report.pdf"
}],
"include_full_text": true
}
// Unix/Mac
{
"sources": [{
"path": "/home/user/documents/contract.pdf"
}],
"include_full_text": true
}
No more "Absolute paths are not allowed" errors!
{
"sources": [{
"path": "presentation.pdf",
"pages": [1, 2, 3]
}],
"include_images": true,
"include_full_text": true
}
{
"sources": [
{ "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
{ "path": "/home/user/Q2.pdf", "pages": "1-10" },
{ "url": "https://example.com/Q3.pdf" }
],
"include_full_text": true
}
⚡ All PDFs processed in parallel automatically!
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }
// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }
// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
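The accepted path shapes listed above can be told apart with a small classifier. This is a minimal sketch for illustration only: `classifyPath` and `PathKind` are hypothetical names, not part of the package, and the server's own path handling may differ (for example, UNC paths are not covered here).

```typescript
// Hypothetical helper, not the server's implementation.
type PathKind = "windows-absolute" | "unix-absolute" | "relative";

function classifyPath(p: string): PathKind {
  // Drive letter followed by a backslash or forward slash: C:\ or C:/
  if (/^[A-Za-z]:[\\\/]/.test(p)) return "windows-absolute";
  // Leading forward slash: /home/john or /Users/john
  if (p.charAt(0) === "/") return "unix-absolute";
  // Anything else is treated as relative to the working directory.
  // Assumption: UNC paths such as \\server\share are out of scope for this sketch.
  return "relative";
}

console.log(classifyPath("C:/Users/John/Documents/report.pdf"));
console.log(classifyPath("documents/report.pdf"));
```

A client could use a check like this to decide whether a `cwd` setting matters for a given source before sending the request.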
Other Improvements:
v1.2.0 - Content Ordering
v1.1.0 - Image Extraction & Performance
read_pdf Tool
The single tool that handles all PDF operations.
| Parameter | Type | Description | Default |
|---|---|---|---|
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |
{
path?: string; // Local file path (absolute or relative)
url?: string; // HTTP/HTTPS URL to PDF
pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
}
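For client code, the parameter table and source shape above can be written down as TypeScript types with the documented defaults applied. This is a sketch under assumptions: the names `PdfSource`, `ReadPdfArgs`, and `withDefaults` are hypothetical, and the server applies its own defaults regardless of what a client does.

```typescript
// Hypothetical client-side types mirroring the documented parameters.
interface PdfSource {
  path?: string;             // Local file path (absolute or relative)
  url?: string;              // HTTP/HTTPS URL to PDF
  pages?: string | number[]; // "1-5,10" or [1, 2, 3]
}

interface ReadPdfArgs {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

// Fill in the defaults listed in the parameter table.
function withDefaults(args: ReadPdfArgs): Required<ReadPdfArgs> {
  return {
    sources: args.sources,
    include_full_text: args.include_full_text ?? false,
    include_metadata: args.include_metadata ?? true,
    include_page_count: args.include_page_count ?? true,
    include_images: args.include_images ?? false,
  };
}

console.log(withDefaults({ sources: [{ path: "large.pdf" }] }));
```

Making the defaults explicit on the client side shows, for instance, that a request with only `sources` still returns metadata and a page count but no text.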
Metadata only (fast):
{
"sources": [{ "path": "large.pdf" }],
"include_metadata": true,
"include_page_count": true,
"include_full_text": false
}
From URL:
{
"sources": [{
"url": "https://arxiv.org/pdf/2301.00001.pdf"
}],
"include_full_text": true
}
Page ranges:
{
"sources": [{
"path": "manual.pdf",
"pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
}]
}
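The expansion described in the comment above (a range string becoming an explicit page list) can be sketched as follows. `expandPages` is a hypothetical helper written for illustration; the server's actual parser and its error messages may differ.

```typescript
// Hypothetical helper: expand "1-5,10-15,20" or [1, 2, 3] into page numbers.
function expandPages(pages: string | number[]): number[] {
  // Arrays are already explicit; return a copy.
  if (Array.isArray(pages)) return pages.slice();

  const result: number[] = [];
  for (const part of pages.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue; // tolerate stray commas

    // Either a single page ("10") or an inclusive range ("1-5").
    const match = /^(\d+)(?:-(\d+))?$/.exec(trimmed);
    if (!match) throw new Error(`Invalid page range: "${trimmed}"`);

    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${trimmed}"`);
    }
    for (let p = start; p <= end; p++) result.push(p);
  }
  return result;
}

console.log(expandPages("1-5,10-15,20"));
```

Pages are treated as 1-based here, matching the examples in this README.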
Content is returned in natural reading order based on Y-coordinates:
Document Layout:
┌─────────────────────┐
│ [Title] Y:100 │
│ [Image] Y:150 │
│ [Text] Y:400 │
│ [Photo A] Y:500 │
│ [Photo B] Y:550 │
└─────────────────────┘
Response Order:
[
{ type: "text", text: "Title..." },
{ type: "image", data: "..." },
{ type: "text", text: "..." },
{ type: "image", data: "..." },
{ type: "image", data: "..." }
]
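The ordering shown above amounts to sorting mixed text and image items by their vertical position. The sketch below assumes, as in the diagram, that smaller Y values are nearer the top of the page; raw PDF coordinates usually start at the bottom-left, so a real implementation may need to invert them first. `ContentItem` and `orderByY` are hypothetical names, and the `y` field is added for illustration only.

```typescript
// Hypothetical item shape: the response items plus a y position used for sorting.
type ContentItem =
  | { type: "text"; text: string; y: number }
  | { type: "image"; data: string; y: number };

// Sort a copy by ascending Y so the caller's array is left untouched.
function orderByY(items: ContentItem[]): ContentItem[] {
  return items.slice().sort((a, b) => a.y - b.y);
}

// Items from the layout diagram, deliberately out of order.
const extracted: ContentItem[] = [
  { type: "image", data: "photo-a", y: 500 },
  { type: "text", text: "Title...", y: 100 },
  { type: "text", text: "...", y: 400 },
  { type: "image", data: "image", y: 150 },
  { type: "image", data: "photo-b", y: 550 },
];

console.log(orderByY(extracted).map((item) => item.type));
```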
Enable extraction:
{
"sources": [{ "path": "manual.pdf" }],
"include_images": true
}
Response format:
{
"images": [{
"page": 1,
"index": 0,
"width": 1920,
"height": 1080,
"format": "rgb",
"data": "base64-encoded-png..."
}]
}
Supported formats: RGB, RGBA, Grayscale
Auto-detected: JPEG, PNG, and other embedded formats
Absolute paths (v1.3.0+) - Direct file access:
{ "path": "C:\\Users\\John\\file.pdf" }
{ "path": "/home/user/file.pdf" }
Relative paths - Workspace files:
{ "path": "docs/report.pdf" }
{ "path": "./2024/Q1.pdf" }
Configure working directory:
{
"mcpServers": {
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/documents"
}
}
}
Strategy 1: Page ranges
{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
Strategy 2: Progressive loading
// Step 1: Get page count
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }
// Step 2: Extract sections
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
Strategy 3: Parallel batching
{
"sources": [
{ "path": "big.pdf", "pages": "1-50" },
{ "path": "big.pdf", "pages": "51-100" }
]
}
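Strategies 2 and 3 combine naturally: once the page count is known, the `sources` array for a batched request can be generated instead of written by hand. This is a minimal sketch; `batchSources` and `BatchSource` are hypothetical names, and the batch size of 50 simply mirrors the example above.

```typescript
// Hypothetical helper: split a document into page-range sources of a fixed size.
interface BatchSource {
  path: string;
  pages: string;
}

function batchSources(path: string, pageCount: number, batchSize: number): BatchSource[] {
  if (pageCount < 1 || Math.floor(pageCount) !== pageCount) {
    throw new Error("pageCount must be a positive integer");
  }
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }

  const sources: BatchSource[] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    // The last batch may be shorter than batchSize.
    const end = Math.min(start + batchSize - 1, pageCount);
    sources.push({ path, pages: start === end ? `${start}` : `${start}-${end}` });
  }
  return sources;
}

// Build the request body for a 100-page document in batches of 50.
console.log(JSON.stringify({ sources: batchSources("big.pdf", 100, 50) }));
```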
Solution for "Absolute paths are not allowed" errors: upgrade to v1.3.0+, which added absolute path support.
npm update @sylphx/pdf-reader-mcp
Restart your MCP client completely.
Causes:
Solutions:
Use absolute path:
{ "path": "C:\\Full\\Path\\file.pdf" }
Or configure cwd:
{
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/docs"
}
}
Solution:
npm cache clean --force
rm -rf node_modules package-lock.json
npm install @sylphx/pdf-reader-mcp@latest
Restart MCP client completely.
| Component | Technology |
|---|---|
| Runtime | Node.js 22+ ESM |
| PDF Engine | PDF.js (Mozilla) |
| Validation | Zod + JSON Schema |
| Protocol | MCP SDK |
| Language | TypeScript (strict) |
| Testing | Vitest (103 tests) |
| Quality | Biome (50x faster) |
| CI/CD | GitHub Actions |
any types, strict mode
Prerequisites:
Setup:
git clone https://github.com/SylphxAI/pdf-reader-mcp.git
cd pdf-reader-mcp
pnpm install && pnpm build
Scripts:
pnpm run build # Build TypeScript
pnpm run test # Run 103 tests
pnpm run test:cov # Coverage (94%+)
pnpm run check # Lint + format
pnpm run check:fix # Auto-fix
pnpm run benchmark # Performance tests
Quality:
Quick Start:
git checkout -b feature/awesome
pnpm test
pnpm run check:fix
Commit Format:
feat(images): add WebP support
fix(paths): handle UNC paths
docs(readme): update examples
See CONTRIBUTING.md
✅ Completed
🚀 Next
Vote at Discussions
Featured on:
Trusted worldwide • Enterprise adoption • Battle-tested
Show Your Support: ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🔀 Contribute
103 Tests • 94%+ Coverage • Production Ready
MIT © Sylphx
Built with:
Special thanks to the open source community ❤️
5-10x faster. Production-ready. Battle-tested.
The PDF processing server that actually scales
sylphx.com • @SylphxAI • hi@sylphx.com
FAQs
An MCP server providing tools to read PDF files.
We found that @bachstudio/pdf-reader-mcp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.