
ocrrouter
A powerful Python library for converting PDFs and images to Markdown using multiple expert VLM backends
OCRRouter is a production-ready document processing library that converts PDFs and images to high-quality Markdown.
pip install ocrrouter
from ocrrouter import process_document
# One-liner document conversion
result = process_document(
"document.pdf",
"output/",
backend="deepseek",
openai_api_key="your-api-key"
)
print(result["markdown"])
from ocrrouter import DocumentPipeline, Settings
# Configure pipeline
settings = Settings(
backend="deepseek",
openai_base_url="https://api.example.com/v1",
openai_api_key="your-api-key",
output_mode="all" # layout + OCR
)
# Process document
pipeline = DocumentPipeline(settings=settings)
result = pipeline.process("document.pdf", "output/")
# Access results
print(f"Markdown: {result['markdown'][:100]}...")
print(f"Output directory: {result['output_dir']}")
# Async processing for better performance
result = await pipeline.aio_process("document.pdf", "output/")
# Batch processing with concurrency control
results = await pipeline.aio_process_batch(
["doc1.pdf", "doc2.pdf", "doc3.pdf"],
"output/",
session_id="batch-001"
)
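The `await` calls above need a running event loop, so they cannot be pasted into a plain synchronous script as-is. The following is a minimal sketch of one way to drive them with `asyncio.run`, and of how a concurrency cap can be modeled with `asyncio.Semaphore`. `fake_process` is a hypothetical stand-in for `pipeline.aio_process` so the block runs without the library installed, and the semaphore illustrates the general technique rather than how OCRRouter itself limits concurrency.

```python
import asyncio


async def fake_process(path: str, output_dir: str) -> dict:
    """Hypothetical stand-in for pipeline.aio_process (an assumption, not the real API)."""
    await asyncio.sleep(0.01)  # simulate a network-bound model call
    return {"markdown": f"# {path}", "output_dir": output_dir}


async def process_all(paths, output_dir, max_concurrency=2, process=fake_process):
    """Run `process` over every path with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path):
        async with semaphore:
            return await process(path, output_dir)

    # asyncio.gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in paths))


def run_batch(paths, output_dir, max_concurrency=2):
    """Synchronous entry point: asyncio.run creates and closes the event loop."""
    return asyncio.run(process_all(paths, output_dir, max_concurrency))


if __name__ == "__main__":
    for result in run_batch(["doc1.pdf", "doc2.pdf", "doc3.pdf"], "output/"):
        print(result["markdown"])
```

With the real library, `pipeline.aio_process` could presumably be passed as `process` in place of the stand-in, assuming it accepts the same two positional arguments shown in the examples above.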
Each backend is optimized for different document types:
| Backend | Layout | OCR | Best For |
|---|---|---|---|
| MinerU | ✓ | ✓ | Academic papers, complex layouts, formulas |
| DeepSeek | ✓ | ✓ | General documents, efficiency, grounding mode |
| DotsOCR | ✓ | ✓ | Flexible extraction (one-step or two-step) |
| PaddleOCR | — | ✓ | Fast OCR, multilingual support |
| Hunyuan | — | ✓ | Markdown-optimized output |
| GeneralVLM | — | ✓ | GPT-4V, Claude, Gemini, custom VLMs |
Combine the strengths of different models:
settings = Settings(
backend="composite",
layout_model="mineru", # Best layout detection
ocr_model="paddleocr", # Fast OCR extraction
)
Why use composite mode?
Control processing behavior:
# Full layout + OCR (default)
Settings(output_mode="all")
# Layout detection only
Settings(output_mode="layout_only")
# Direct OCR without layout analysis
Settings(output_mode="ocr_only")
Multiple output files for different use cases:
- `.md`: Human-readable converted text
- `_layout.pdf`: Visual layout with bounding boxes
- `_model.json`: Raw model output
- `_middle.json`: Processed structural data
- `_content_list.json`: Simplified flat structure

Extract formulas, citations, and complex layouts from research papers and theses:
settings = Settings(
backend="mineru",
formula_enable=True,
table_merge_enable=True # Cross-page table merging
)
Parse invoices, contracts, and forms with table extraction:
settings = Settings(
backend="deepseek",
table_enable=True,
output_mode="all"
)
Batch process archives with multilingual support:
settings = Settings(
backend="composite",
layout_model="deepseek",
ocr_model="paddleocr", # Strong multilingual support
max_concurrency=10
)
Extract structured data for RAG or training:
settings = Settings(
backend="deepseek",
dump_content_list=True, # Simplified JSON for ML
dump_middle_json=True # Structured data
)
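Once the JSON dumps are enabled, a downstream RAG or training job needs to locate them on disk. The sketch below collects artifacts by the output suffixes listed in this README. It assumes each artifact is named after the input document's stem (for example `document_content_list.json`) and sits somewhere under the output directory; that naming is an assumption, and `find_outputs` is a hypothetical helper, not part of OCRRouter.

```python
from pathlib import Path

# Suffixes taken from the output file list in this README.
OUTPUT_SUFFIXES = (
    ".md",
    "_layout.pdf",
    "_model.json",
    "_middle.json",
    "_content_list.json",
)


def find_outputs(output_dir, stem):
    """Map each known suffix to the first file named <stem><suffix> under output_dir.

    Assumption: artifacts are named after the input document's stem.
    Suffixes with no matching file are left out of the result.
    """
    found = {}
    for suffix in OUTPUT_SUFFIXES:
        matches = sorted(Path(output_dir).rglob(f"{stem}{suffix}"))
        if matches:
            found[suffix] = matches[0]
    return found
```

For example, `find_outputs("output/", "document").get("_content_list.json")` returns the path of the flat content list if one was written, or `None` otherwise.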
Need layout detection + OCR?
Need OCR only?
Want to optimize cost/speed?
See Backend Guide for detailed comparison.
OCRRouter uses explicit configuration (no automatic .env loading):
from ocrrouter import DocumentPipeline, Settings
# Method 1: Settings object
settings = Settings(
backend="deepseek",
openai_api_key="your-key",
max_concurrency=20,
http_timeout=120,
max_retries=3
)
# Method 2: Constructor arguments
pipeline = DocumentPipeline(
backend="deepseek",
openai_api_key="your-key"
)
# Method 3: Settings with overrides
pipeline = DocumentPipeline(
settings=settings,
max_concurrency=50 # Override
)
See Configuration Guide for all available settings.
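Because nothing is loaded from `.env` automatically, credentials kept in the environment have to be read and passed in explicitly. Here is a minimal sketch of that step. The variable names `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `OCR_BACKEND` are assumptions chosen for illustration, and `settings_kwargs_from_env` is a hypothetical helper; only the keyword names `backend`, `openai_api_key`, and `openai_base_url` come from the examples above.

```python
import os


def settings_kwargs_from_env(env=None):
    """Build keyword arguments for Settings from environment variables.

    The environment variable names are assumptions made for this sketch;
    OCRRouter does not read them on its own.
    """
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    kwargs = {
        "backend": env.get("OCR_BACKEND", "deepseek"),
        "openai_api_key": api_key,
    }

    # Only pass a base URL when one is configured
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        kwargs["openai_base_url"] = base_url
    return kwargs
```

The result can then be unpacked into the constructor, as in `Settings(**settings_kwargs_from_env())`.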
from langfuse import Langfuse
from ocrrouter import DocumentPipeline, Settings
langfuse = Langfuse(
public_key="pk-...",
secret_key="sk-...",
host="https://cloud.langfuse.com"
)
settings = Settings(backend="deepseek", openai_api_key="your-key")
pipeline = DocumentPipeline(settings=settings, langfuse=langfuse)
# Traces appear in Langfuse dashboard
result = await pipeline.aio_process("document.pdf", "output/")
settings = Settings(
backend="deepseek",
max_retries=5,
debug=True, # Save failed requests
debug_dir="./debug" # Debug output location
)
from ocrrouter import get_backend, Settings
settings = Settings(openai_api_key="your-key")
backend = get_backend("mineru", settings=settings)
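# Advanced control: call the backend directly.
# pdf_bytes and image_writer are not defined in this snippet; the caller must supply them.
middle_json, model_output = await backend.analyze(pdf_bytes, image_writer)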
See docs/EXAMPLES.md for comprehensive examples.
Or check out the demo scripts in demo/:
- demo/quickstart.py: Minimal example
- demo/composite_mode.py: Composite mode showcase
- demo/demo.py: Comprehensive demo

# From PyPI
pip install ocrrouter
# From source
git clone https://github.com/yourusername/ocrrouter.git
cd ocrrouter
pip install -e .
Contributions are welcome! See CONTRIBUTING.md for guidelines.
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.
We would like to thank the following projects for providing code and models:
Built with ❤️ for document processing needs