
A powerful MCP server for comprehensive PDF processing with OCR and diagram detection
A powerful Model Context Protocol (MCP) server that provides comprehensive PDF processing capabilities, including text extraction, OCR support, and network diagram detection. Designed to work seamlessly with Amazon Q Developer CLI and other MCP-compatible systems.
The easiest way to use this MCP server is with uvx:
# Run directly without installation
uvx pdf-reader-mcp # your MCP client can now connect using either the SSE or the stdio transport
# Or install globally
uv tool install pdf-reader-mcp
To install with pip instead:
pip install pdf-reader-mcp
To install from source:
git clone https://github.com/zixma13/pdf-reader-mcp.git
cd pdf-reader-mcp
pip install -e .
Install Tesseract OCR (required for OCR functionality):
# For macOS
brew install tesseract
brew install tesseract-lang # For language support including Thai
# For Ubuntu/Debian
# sudo apt-get install tesseract-ocr
# sudo apt-get install tesseract-ocr-tha # For Thai language support
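Before running OCR, it can help to confirm that Tesseract is on your PATH and that the language data you need is installed. The following is a minimal sketch, not part of pdf-reader-mcp: the helper names are hypothetical, and it relies only on the `tesseract --list-langs` command and the standard `eng` and `tha` language codes.

```python
import shutil
import subprocess


def parse_tesseract_langs(output: str) -> list[str]:
    """Parse the output of `tesseract --list-langs` into language codes."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.lower().startswith("list of"):
            continue
        # Language codes contain no spaces; anything else is likely a warning.
        if " " in line:
            continue
        langs.append(line)
    return langs


def installed_tesseract_langs() -> list[str]:
    """Return installed language codes, or an empty list if tesseract is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the list to stderr, so inspect both streams.
    return parse_tesseract_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_tesseract_langs()
    for code in ("eng", "tha"):
        status = "found" if code in langs else "missing"
        print(f"{code}: {status}")
```

If `tha` is reported as missing, install the Thai language data as shown above before using the OCR tools on Thai documents.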
Install Python dependencies:
pip install -r requirements.txt
Ensure you have the virtual environment activated:
source .venv/bin/activate
Test the server:
mcp dev main.py
Add the following to your ~/.aws/amazonq/mcp.json file:
{
  "mcpServers": {
    "pdf_reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/pdf_reader/pdf_reader",
        "run",
        "main.py"
      ]
    }
  }
}
Alternatively, if you run the server with uvx, use:
{
  "mcpServers": {
    "pdf_reader": {
      "command": "uvx",
      "timeout": 60000,
      "args": [
        "pdf-reader-mcp"
      ]
    }
  }
}
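If mcp.json already lists other servers, you will want to merge the new entry rather than overwrite the file. Here is a small sketch of one way to do that with the standard library. The function names are hypothetical, and the entry mirrors the uvx configuration shown above.

```python
import json
from pathlib import Path


def add_pdf_reader_server(config: dict) -> dict:
    """Return a copy of config with a 'pdf_reader' entry under 'mcpServers'."""
    merged = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["pdf_reader"] = {
        "command": "uvx",
        "timeout": 60000,
        "args": ["pdf-reader-mcp"],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON file at path (if any), add the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_pdf_reader_server(existing), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".aws" / "amazonq" / "mcp.json")` would apply the change to the file named above. Back up the file first, since this sketch rewrites it in place.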
Once configured, you can use the PDF reader tools in Amazon Q:
To analyze a PDF and determine if it's a scanned image or searchable text:
pdf_reader___analyze_pdf("/path/to/document.pdf")
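To give a feel for what "scanned or searchable" can mean in practice, here is an illustrative heuristic. It is an assumption for explanation only, not the logic pdf-reader-mcp actually uses: a page counts as searchable when its embedded text layer has at least a threshold number of characters. The function name and the default threshold are hypothetical.

```python
def classify_pdf(page_texts: list[str], min_chars_per_page: int = 50) -> str:
    """Classify a PDF from the text layer extracted for each page.

    Returns 'searchable', 'scanned', or 'mixed'. The threshold is a
    hypothetical default, not a value taken from pdf-reader-mcp.
    """
    if not page_texts:
        raise ValueError("page_texts must contain at least one page")
    # Count pages whose embedded text is long enough to be real content.
    text_pages = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    if text_pages == len(page_texts):
        return "searchable"
    if text_pages == 0:
        return "scanned"
    return "mixed"
```

A 'mixed' result suggests a document where some pages need OCR and others do not.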
To intelligently extract content from a PDF (automatically choosing between OCR and text extraction):
pdf_reader___smart_extract_pdf("/path/to/document.pdf")
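The idea of choosing automatically between text extraction and OCR can be sketched as a simple fallback. This is a hypothetical illustration, not the server's implementation: `extract_text` and `run_ocr` are placeholder callables you would supply, and `min_chars` is an assumed threshold.

```python
from typing import Callable


def smart_extract(
    path: str,
    extract_text: Callable[[str], str],
    run_ocr: Callable[[str], str],
    min_chars: int = 50,
) -> tuple[str, str]:
    """Try the embedded text layer first and fall back to OCR.

    Returns (method, text), where method is 'text' or 'ocr'.
    """
    text = extract_text(path)
    # A text layer with enough characters is used as-is; OCR is skipped.
    if len(text.strip()) >= min_chars:
        return "text", text
    return "ocr", run_ocr(path)
```

Trying the text layer first keeps the common case fast, since OCR is usually much slower than reading embedded text.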
To intelligently convert a PDF to markdown:
pdf_reader___smart_pdf_to_markdown("/path/to/document.pdf")
To extract text from a PDF:
pdf_reader___read_pdf("/path/to/document.pdf")
To get metadata from a PDF:
pdf_reader___get_pdf_metadata("/path/to/document.pdf")
To extract text from a specific page (0-indexed):
pdf_reader___extract_pdf_page("/path/to/document.pdf", 0)
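Because page arguments are 0-indexed, the first page of a document is page 0. If you work from page numbers as shown in a PDF viewer, a small helper like this hypothetical one avoids off-by-one mistakes:

```python
def to_page_index(page_number: int, page_count: int) -> int:
    """Convert a 1-based page number into the 0-based index the tools expect."""
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page_number must be between 1 and {page_count}, got {page_number}"
        )
    return page_number - 1
```

For example, page 3 of a 10-page document corresponds to index 2.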
To extract text using OCR (supports Thai and English):
pdf_reader___ocr_pdf("/path/to/document.pdf")
To extract text from a specific page using OCR:
pdf_reader___ocr_pdf_page("/path/to/document.pdf", 0)
To convert PDF to markdown format:
pdf_reader___pdf_to_markdown("/path/to/document.pdf")
To convert PDF to markdown format using OCR:
pdf_reader___pdf_to_markdown("/path/to/document.pdf", use_ocr=True)
We found that pdf-reader-mcp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.