
deepseek-ocr-cli
Command-line tool for OCR using DeepSeek vision models. Supports Ollama (local) and vLLM (GPU server) backends.
--dry-run to preview files before processing
This is one of five OCR CLI tools with a shared design: clean Markdown output, batch processing, and figure extraction. Pick based on your constraints:
| Tool | Engine | Runs | Cost | Best for |
|---|---|---|---|---|
| deepseek-ocr-cli (this repo) | DeepSeek vision | Local (Ollama / vLLM) | Free | General-purpose local OCR with multi-backend flexibility |
| gemini-ocr-cli | Google Gemini | Cloud API | Free tier / Pay-per-use | Fast cloud OCR with concurrent processing |
| marker-ocr-cli | Marker (Surya + Texify) | Local | Free | Academic papers with equations, tables, complex layouts |
| mistral-ocr-cli | Mistral OCR API | Cloud API | ~$1/1k pages | Structured extraction (tables, headers, footers) |
| nougat-ocr-cli | Meta Nougat | Local (GPU) | Free | Academic papers, GPU-accelerated batch processing |
deepseek-ocr model pulled in Ollama
# macOS/Linux
brew install ollama
# Or download from https://ollama.ai
ollama pull deepseek-ocr
pip install deepseek-ocr-cli
# Process a single image
deepseek-ocr document.jpg
# Process a PDF
deepseek-ocr paper.pdf
# Process all files in a directory
deepseek-ocr ./documents/ --recursive
# Preview files without processing
deepseek-ocr ./documents/ --dry-run
# Custom output directory
deepseek-ocr doc.pdf -o ./results/
# Use vLLM backend
deepseek-ocr paper.pdf --backend vllm --vllm-url http://gpu-server:8000/v1
# Parallel processing for faster PDF OCR
deepseek-ocr large-document.pdf -w 2
# Extract and analyze embedded figures
deepseek-ocr paper.pdf --analyze-figures
# Quiet mode (paths only, for scripting)
deepseek-ocr paper.pdf -q
deepseek-ocr [OPTIONS] INPUT_PATH
Options:
  -o, --output-dir PATH      Output directory for results
  -r, --recursive            Recursively process directories
  --model TEXT               Model name (default: deepseek-ocr)
  --prompt TEXT              Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                             OCR task type
  --extract-images           Extract and save page images from PDFs
  --no-metadata              Exclude metadata from output
  --dpi INTEGER              PDF rendering DPI (default: 200)
  -w, --workers INTEGER      Parallel workers for PDF pages (default: 1)
  --analyze-figures          Extract and analyze embedded figures with AI
  --max-dim INTEGER          Max image dimension (default: 1920, 0 to disable)
  --backend [ollama|vllm]    Backend to use (default: ollama)
  --vllm-url TEXT            vLLM API URL (default: http://localhost:8000/v1)
  --reprocess                Force reprocessing of already-done files
  --dry-run                  Preview files without processing
  -q, --quiet                Suppress output, print paths only
  --verbose                  Enable verbose output
  --help                     Show this message and exit.
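When driving the CLI from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the package: `build_ocr_command` is a hypothetical helper, and it uses only flags listed in the options above. It places options before the input path, following the `deepseek-ocr [OPTIONS] INPUT_PATH` usage line.

```python
import shlex
from typing import List, Optional


def build_ocr_command(
    input_path: str,
    output_dir: Optional[str] = None,
    backend: str = "ollama",
    vllm_url: Optional[str] = None,
    workers: int = 1,
    quiet: bool = False,
) -> List[str]:
    """Assemble an argv list for deepseek-ocr (hypothetical helper)."""
    if backend not in ("ollama", "vllm"):
        raise ValueError(f"unsupported backend: {backend}")
    cmd = ["deepseek-ocr", "--backend", backend]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    # --vllm-url only makes sense when the vLLM backend is selected.
    if backend == "vllm" and vllm_url is not None:
        cmd += ["--vllm-url", vllm_url]
    if workers > 1:
        cmd += ["-w", str(workers)]
    if quiet:
        cmd.append("-q")
    cmd.append(input_path)
    return cmd


cmd = build_ocr_command(
    "paper.pdf",
    backend="vllm",
    vllm_url="http://gpu-server:8000/v1",
    workers=2,
    quiet=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. With `-q`, the options above say only paths are printed, so the captured output should be straightforward to split by line, though the exact output format is not documented here.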
process (default)
Process documents and images with OCR. The process subcommand is optional:
deepseek-ocr document.pdf
# equivalent to
deepseek-ocr process document.pdf
info
Show system and configuration information.
deepseek-ocr info
Each document gets its own folder:
output/
└── document/
    ├── document.md          # OCR markdown
    └── figures/             # Extracted figures (if --analyze-figures)
        └── page1_fig1.png
The markdown includes metadata:
---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---
## Page 1
[Extracted content...]
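Downstream scripts may want to read that metadata header back. The sketch below assumes the header always has the shape shown above: simple `key: value` lines between two `---` delimiters. `split_front_matter` is a hypothetical helper, not something the package provides, and it keeps every value as a string.

```python
from typing import Dict, Tuple


def split_front_matter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, body); hypothetical helper."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no metadata header present
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so timestamps and paths stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, markdown  # closing delimiter never found


sample = """---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---
## Page 1
[Extracted content...]
"""

meta, body = split_front_matter(sample)
print(meta["pages"], meta["backend"])
```

Files written with `--no-metadata` would have no header, in which case this helper returns an empty dictionary and the text unchanged.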
Batch processing saves metadata.json in the output directory. On re-run, already-processed files are skipped automatically. Use --reprocess to force reprocessing.
Create a .env file or set environment variables with the DEEPSEEK_OCR_ prefix:
DEEPSEEK_OCR_BACKEND=ollama
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_OLLAMA_URL=http://localhost:11434
DEEPSEEK_OCR_VLLM_BASE_URL=http://localhost:8000/v1
DEEPSEEK_OCR_MAX_DIMENSION=1920
DEEPSEEK_OCR_MAX_RETRIES=3
DEEPSEEK_OCR_RETRY_DELAY=1.0
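To illustrate the naming convention, the sketch below collects every variable carrying the `DEEPSEEK_OCR_` prefix and strips the prefix. It only demonstrates the convention: how the CLI itself loads these settings is not described here, and `read_prefixed_settings` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping

PREFIX = "DEEPSEEK_OCR_"


def read_prefixed_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return prefixed variables as lowercase keys without the prefix."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


# A stand-in environment so the example does not depend on the real one.
example_env = {
    "DEEPSEEK_OCR_BACKEND": "vllm",
    "DEEPSEEK_OCR_VLLM_BASE_URL": "http://localhost:8000/v1",
    "PATH": "/usr/bin",
}
settings = read_prefixed_settings(example_env)
print(settings)
```

Calling `read_prefixed_settings()` with no argument inspects the real process environment, which can be a quick way to check which overrides are currently set.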
from pathlib import Path

from deepseek_ocr import create_backend, OCRProcessor

# Create and load the backend
backend = create_backend(backend_type="ollama", model_name="deepseek-ocr")
backend.load_model()

processor = OCRProcessor(
    backend=backend,
    output_dir=Path("./results"),
    workers=2,
)

# Process one file, print the extracted text, then save the result
result = processor.process_file(Path("document.pdf"))
print(result.output_text)
processor.save_result(result)

# Unload the model when finished
backend.unload_model()
ollama serve
ollama pull deepseek-ocr
deepseek-ocr info
MIT License - see LICENSE for details.