
gemini-ocr-cli
Command-line tool for OCR processing using Google Gemini's vision capabilities. Extract text, tables, equations, and figures from PDFs and images with high accuracy.
pip install gemini-ocr-cli
pipx install gemini-ocr-cli
git clone https://github.com/r-uben/gemini-ocr-cli.git
cd gemini-ocr-cli
uv pip install -e .
The CLI automatically picks up your API key from environment variables (no configuration needed if already set):
Priority order:
1. --api-key CLI argument (highest priority)
2. GEMINI_API_KEY environment variable
3. GOOGLE_API_KEY environment variable (fallback)
4. .env file in current directory

# Option 1: Set environment variable (recommended)
export GEMINI_API_KEY="your-api-key"
# Option 2: Use existing GOOGLE_API_KEY (auto-detected)
export GOOGLE_API_KEY="your-api-key"
# Option 3: Create a .env file
echo "GEMINI_API_KEY=your-api-key" > .env
# Option 4: Pass directly (not recommended for security)
gemini-ocr paper.pdf --api-key "your-api-key"
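The priority order above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: `resolve_api_key` and `read_env_file` are hypothetical names, the `.env` parsing is deliberately simplistic, and checking `GOOGLE_API_KEY` inside the `.env` file is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(
    cli_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    env_file: Path = Path(".env"),
) -> Optional[str]:
    """Hypothetical helper mirroring the documented priority order."""
    environ = os.environ if environ is None else environ
    # 1. An explicit --api-key value wins.
    if cli_key:
        return cli_key
    # 2. and 3. GEMINI_API_KEY, then GOOGLE_API_KEY as a fallback.
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        if environ.get(name):
            return environ[name]
    # 4. Finally, a .env file (assumed to use the same variable names).
    file_values = read_env_file(env_file)
    return file_values.get("GEMINI_API_KEY") or file_values.get("GOOGLE_API_KEY")
```

Passing `environ` explicitly keeps the lookup easy to test without touching the real process environment.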
# Single file
gemini-ocr paper.pdf
# Directory
gemini-ocr ./documents/ -o ./results/
# With custom model
gemini-ocr paper.pdf --model gemini-1.5-pro
# Analyze a chart/diagram
gemini-ocr describe chart.png
# Save to file
gemini-ocr describe figure.jpg -o description.md
gemini-ocr process
Process documents and images with OCR.
Usage: gemini-ocr process [OPTIONS] INPUT_PATH
Options:
-o, --output-dir PATH Output directory for results
--api-key TEXT Gemini API key
--model TEXT Model to use (default: gemini-3.0-flash)
--task [convert|extract|table] OCR task type (default: convert)
--prompt TEXT Custom prompt for OCR
--include-images/--no-images Extract embedded images (default: True)
--save-originals/--no-save-originals
Save original input images (default: True)
--add-timestamp/--no-timestamp Add timestamp to output folder
--reprocess Reprocess existing files
--env-file PATH Path to .env file
-v, --verbose Enable verbose output
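If you drive the CLI from a script, the options above map naturally onto an argument list. The sketch below assumes `gemini-ocr` is on your PATH; `build_process_command` and `run_process` are hypothetical helper names, not part of the package, and only flags listed above are used.

```python
import subprocess
from typing import List, Optional

VALID_TASKS = ("convert", "extract", "table")


def build_process_command(
    input_path: str,
    output_dir: Optional[str] = None,
    model: Optional[str] = None,
    task: str = "convert",
    include_images: bool = True,
    reprocess: bool = False,
) -> List[str]:
    """Build an argv list for `gemini-ocr process` (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    cmd = ["gemini-ocr", "process", input_path, "--task", task]
    if output_dir:
        cmd += ["-o", output_dir]
    if model:
        cmd += ["--model", model]
    if not include_images:
        cmd.append("--no-images")
    if reprocess:
        cmd.append("--reprocess")
    return cmd


def run_process(input_path: str, **options) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_process_command(input_path, **options), check=True)
```

For example, `run_process("./documents/", output_dir="./results/", task="table")` would invoke the CLI on a directory. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.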
gemini-ocr describe
Generate detailed descriptions of figures, charts, and diagrams.
Usage: gemini-ocr describe [OPTIONS] IMAGE_PATH
Options:
--api-key TEXT Gemini API key
--model TEXT Model to use
-o, --output PATH Output file (default: stdout)
gemini-ocr info
Show configuration and system information.
Results are saved as Markdown files with:
- metadata.json tracking all processed files

| Model | Speed | Quality | Cost | Recommended For |
|---|---|---|---|---|
| gemini-3.0-flash | Fast | Good | Low | Default, most documents |
| gemini-1.5-flash | Fast | Good | Low | Simple documents |
| gemini-1.5-pro | Slower | Best | Higher | Complex layouts, equations |
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | Required |
| GOOGLE_API_KEY | Fallback API key | - |
| GEMINI_MODEL | Default model | gemini-3.0-flash |
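As a rough illustration of how the `GEMINI_MODEL` variable could interact with the `--model` flag, here is a small sketch. The precedence shown (flag first, then environment variable, then the documented default) is an assumption rather than confirmed behavior, and `resolve_model` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# Default taken from the environment variable table above.
DEFAULT_MODEL = "gemini-3.0-flash"


def resolve_model(
    cli_model: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a model name: --model value, else GEMINI_MODEL, else the default.

    The ordering is assumed for illustration; check `gemini-ocr info`
    to see what the installed CLI actually resolves.
    """
    environ = os.environ if environ is None else environ
    return cli_model or environ.get("GEMINI_MODEL") or DEFAULT_MODEL
```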
MIT