OCRRouter

A powerful Python library for converting PDFs and images to Markdown using multiple expert VLM backends

What is OCRRouter?

OCRRouter is a production-ready document processing library that converts PDFs and images to high-quality Markdown. It stands out with:

  • 6 Expert VLM Backends — Choose from MinerU, DeepSeek-OCR, DotsOCR, PaddleOCR, Hunyuan-OCR, or GeneralVLM (GPT/Claude/Gemini)
  • Composite Mode — Mix layout detection from one model with OCR from another for optimal results (unique feature!)
  • Rich Document Support — Tables, formulas, images, code blocks, lists, and complex layouts
  • Flexible APIs — Sync/async, single/batch processing, multiple output formats
  • Production Ready — Built-in observability (Langfuse), retries, error handling, debug mode

Quick Start

Installation

pip install ocrrouter

30-Second Example

from ocrrouter import process_document

# One-liner document conversion
result = process_document(
    "document.pdf",
    "output/",
    backend="deepseek",
    openai_api_key="your-api-key"
)

print(result["markdown"])

Basic Usage

from ocrrouter import DocumentPipeline, Settings

# Configure pipeline
settings = Settings(
    backend="deepseek",
    openai_base_url="https://api.example.com/v1",
    openai_api_key="your-api-key",
    output_mode="all"  # layout + OCR
)

# Process document
pipeline = DocumentPipeline(settings=settings)
result = pipeline.process("document.pdf", "output/")

# Access results
print(f"Markdown: {result['markdown'][:100]}...")
print(f"Output directory: {result['output_dir']}")

Async Processing

# Async processing for better performance
result = await pipeline.aio_process("document.pdf", "output/")

# Batch processing with concurrency control
results = await pipeline.aio_process_batch(
    ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
    "output/",
    session_id="batch-001"
)
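
These coroutines reuse the pipeline configured in Basic Usage and must be awaited inside an event loop. A minimal runner sketch using the standard library's asyncio (the settings values are placeholders carried over from the examples above):

import asyncio

from ocrrouter import DocumentPipeline, Settings

async def main():
    # Pipeline configured as in the Basic Usage example
    settings = Settings(backend="deepseek", openai_api_key="your-api-key")
    pipeline = DocumentPipeline(settings=settings)

    # Single document
    result = await pipeline.aio_process("document.pdf", "output/")
    print(result["output_dir"])

    # Batch with concurrency control
    results = await pipeline.aio_process_batch(
        ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
        "output/",
        session_id="batch-001"
    )

asyncio.run(main())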

Key Features

1. Multiple Expert Backends

Each backend is optimized for different document types:

Backend      Layout   OCR   Best For
MinerU       Yes      Yes   Academic papers, complex layouts, formulas
DeepSeek     Yes      Yes   General documents, efficiency, grounding mode
DotsOCR      Yes      Yes   Flexible extraction (one-step or two-step)
PaddleOCR    No       Yes   Fast OCR, multilingual support
Hunyuan      No       Yes   Markdown-optimized output
GeneralVLM   No       Yes   GPT-4V, Claude, Gemini, custom VLMs
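
Switching backends is a one-line configuration change; a short sketch for comparing a few of them on the same document (the lowercase backend identifiers are assumed to match the names used elsewhere in this README):

from ocrrouter import DocumentPipeline, Settings

# Backend names are assumptions based on the lowercase identifiers used in this README
for backend_name in ["mineru", "deepseek", "paddleocr"]:
    settings = Settings(backend=backend_name, openai_api_key="your-api-key")
    pipeline = DocumentPipeline(settings=settings)
    result = pipeline.process("document.pdf", f"output/{backend_name}/")
    print(backend_name, len(result["markdown"]))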

2. Composite Mode (Mix & Match)

Combine the strengths of different models:

settings = Settings(
    backend="composite",
    layout_model="mineru",      # Best layout detection
    ocr_model="paddleocr",      # Fast OCR extraction
)

Why use composite mode?

  • Optimize for cost vs quality
  • Leverage each model's strengths
  • Example: MinerU's excellent layout + PaddleOCR's speed
  • 2-3x faster than single-model approaches in many cases

3. Three Output Modes

Control processing behavior:

# Full layout + OCR (default)
Settings(output_mode="all")

# Layout detection only
Settings(output_mode="layout_only")

# Direct OCR without layout analysis
Settings(output_mode="ocr_only")

4. Rich Output Formats

Multiple output files for different use cases:

  • Markdown (.md) — Human-readable converted text
  • Layout PDF (_layout.pdf) — Visual layout with bounding boxes
  • Model JSON (_model.json) — Raw model output
  • Middle JSON (_middle.json) — Processed structural data
  • Content List (_content_list.json) — Simplified flat structure
  • Images — Extracted figures, tables, equations
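
After a run, these files sit together in the output directory; a small sketch for listing what was produced, using only the result keys shown earlier:

from pathlib import Path

from ocrrouter import process_document

result = process_document(
    "document.pdf",
    "output/",
    backend="deepseek",
    openai_api_key="your-api-key"
)

# Walk the output directory and print every generated artifact
out_dir = Path(result["output_dir"])
for path in sorted(out_dir.rglob("*")):
    if path.is_file():
        print(path.relative_to(out_dir))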

Use Cases

Academic Research

Extract formulas, citations, and complex layouts from research papers and theses:

settings = Settings(
    backend="mineru",
    formula_enable=True,
    table_merge_enable=True  # Cross-page table merging
)

Business Documents

Parse invoices, contracts, and forms with table extraction:

settings = Settings(
    backend="deepseek",
    table_enable=True,
    output_mode="all"
)

Document Digitization

Batch process archives with multilingual support:

settings = Settings(
    backend="composite",
    layout_model="deepseek",
    ocr_model="paddleocr",  # Strong multilingual support
    max_concurrency=10
)
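
These settings pair with the batch API shown earlier; a sketch, assuming the scanned PDFs live in a local archive/ directory:

import asyncio
from pathlib import Path

from ocrrouter import DocumentPipeline, Settings

async def digitize_archive():
    settings = Settings(
        backend="composite",
        layout_model="deepseek",
        ocr_model="paddleocr",  # Strong multilingual support
        max_concurrency=10
    )
    pipeline = DocumentPipeline(settings=settings)

    # "archive/" is a hypothetical folder of scanned PDFs
    pdfs = sorted(str(p) for p in Path("archive/").glob("*.pdf"))
    return await pipeline.aio_process_batch(pdfs, "output/", session_id="archive-001")

results = asyncio.run(digitize_archive())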

AI/ML Pipelines

Extract structured data for RAG or training:

settings = Settings(
    backend="deepseek",
    dump_content_list=True,  # Simplified JSON for ML
    dump_middle_json=True     # Structured data
)
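
The dumped JSON then feeds the rest of the pipeline; a minimal sketch that assumes only the *_content_list.json suffix listed under Rich Output Formats (the entry schema itself is backend-dependent and not shown here):

import json
from pathlib import Path

from ocrrouter import DocumentPipeline, Settings

settings = Settings(
    backend="deepseek",
    openai_api_key="your-api-key",
    dump_content_list=True,
    dump_middle_json=True
)
pipeline = DocumentPipeline(settings=settings)
result = pipeline.process("document.pdf", "output/")

# Load the simplified content list for downstream chunking or indexing
for path in Path(result["output_dir"]).rglob("*_content_list.json"):
    entries = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, len(entries))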

Backend Selection Guide

How to Choose?

Need layout detection + OCR?

  • Academic/Scientific → MinerU (best formula extraction)
  • General documents → DeepSeek (efficient grounding mode)
  • Flexible extraction → DotsOCR (one-step or two-step)

Need OCR only?

  • Fast processing → PaddleOCR
  • Markdown-focused → Hunyuan
  • Use GPT-4/Claude → GeneralVLM

Want to optimize cost/speed?

  • Use Composite Mode: strong layout + fast OCR

See the Backend Guide for a detailed comparison.
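
If you pick backends programmatically, the decision tree above can be written down directly. choose_backend below is a hypothetical helper, not part of the library, and the lowercase backend names for DotsOCR, Hunyuan, and GeneralVLM are assumptions:

from ocrrouter import Settings

def choose_backend(doc_type: str, need_layout: bool = True) -> Settings:
    """Hypothetical helper mirroring the selection guide; not part of ocrrouter."""
    if need_layout:
        # Layout detection + OCR
        name = {"academic": "mineru", "general": "deepseek", "flexible": "dotsocr"}.get(doc_type, "deepseek")
    else:
        # OCR only
        name = {"fast": "paddleocr", "markdown": "hunyuan", "vlm": "generalvlm"}.get(doc_type, "paddleocr")
    return Settings(backend=name, openai_api_key="your-api-key")

settings = choose_backend("academic")                  # MinerU for papers and formulas
settings = choose_backend("fast", need_layout=False)   # PaddleOCR for quick OCR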

Documentation

Configuration

OCRRouter uses explicit configuration (no automatic .env loading):

from ocrrouter import Settings

# Method 1: Settings object
settings = Settings(
    backend="deepseek",
    openai_api_key="your-key",
    max_concurrency=20,
    http_timeout=120,
    max_retries=3
)

# Method 2: Constructor arguments
pipeline = DocumentPipeline(
    backend="deepseek",
    openai_api_key="your-key"
)

# Method 3: Settings with overrides
pipeline = DocumentPipeline(
    settings=settings,
    max_concurrency=50  # Override
)

See the Configuration Guide for all available settings.

Advanced Features

Observability with Langfuse

from langfuse import Langfuse
from ocrrouter import DocumentPipeline, Settings

langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com"
)

settings = Settings(backend="deepseek", openai_api_key="your-key")
pipeline = DocumentPipeline(settings=settings, langfuse=langfuse)

# Traces appear in Langfuse dashboard
result = await pipeline.aio_process("document.pdf", "output/")

Error Handling & Debug Mode

settings = Settings(
    backend="deepseek",
    max_retries=5,
    debug=True,           # Save failed requests
    debug_dir="./debug"   # Debug output location
)

Direct Backend Access

from ocrrouter import get_backend, Settings

settings = Settings(openai_api_key="your-key")
backend = get_backend("mineru", settings=settings)

# Advanced control: pdf_bytes is the raw PDF content, e.g. Path("document.pdf").read_bytes();
# image_writer is the library's writer object for extracted images (see the docs)
middle_json, model_output = await backend.analyze(pdf_bytes, image_writer)

Examples

See docs/EXAMPLES.md for comprehensive examples, including:

  • Basic document processing
  • Batch processing with concurrency
  • Composite mode configurations
  • FastAPI integration
  • Custom pipelines
  • Use case-specific recipes

Or check out the demo scripts in demo/:

  • demo/quickstart.py — Minimal example
  • demo/composite_mode.py — Composite mode showcase
  • demo/demo.py — Comprehensive demo

Requirements

  • Python 3.10, 3.11, 3.12, or 3.13
  • VLM server access (for backends requiring API calls)
  • See pyproject.toml for full dependency list

Installation

# From PyPI
pip install ocrrouter

# From source
git clone https://github.com/yourusername/ocrrouter.git
cd ocrrouter
pip install -e .

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

This project is licensed under the AGPL-3.0 license; see the LICENSE file for details.

Support

Acknowledgments

We would like to thank the upstream projects that provide the code and models behind OCRRouter's backends.

Built with ❤️ for document processing needs
