
reducto-cli
A command-line tool for document parsing, structured data extraction, and document editing — powered by Reducto's document intelligence API.
Parse PDFs, images, spreadsheets, and Office documents into clean Markdown. Extract structured JSON using schemas. Edit documents with natural language instructions. Process single files or entire directories.
Documentation | Reducto Studio | API Quickstart | Python SDK | Claude Code Plugin
```bash
pip install reducto-cli
```
Requires Python 3.11 or later.
Authenticate using the built-in device code flow, which opens a browser to Reducto Studio:
```bash
reducto login
```
This saves your API key to ~/.reducto/config.yaml.
Alternatively, set the REDUCTO_API_KEY environment variable directly:
```bash
export REDUCTO_API_KEY="your_api_key_here"
```
Get an API key by signing up at studio.reducto.ai.
```bash
# Parse a PDF into Markdown
reducto parse invoice.pdf

# Parse an entire folder of documents
reducto parse ./contracts/

# Extract structured data using a JSON Schema
reducto extract invoice.pdf -s schema.json

# Edit a document with natural language
reducto edit form.pdf -i "Fill in the client name as 'Acme Corp'"
```
`reducto parse` converts documents into structured Markdown, preserving layout, tables, and figures. It uses Reducto's Parse API, which relies on agentic OCR and vision-language models.
```bash
reducto parse <path> [options]
```
Output is written to <filename>.parse.md with YAML front matter containing the job ID and processing duration.
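Because the job ID and processing duration live in the front matter, you can inspect them without reading the whole file. The sketch below assumes the front matter follows the common convention of opening on line 1 with `---` and closing at the next `---` line; it prints the block verbatim instead of guessing at key names. `parse_front_matter` is a hypothetical helper, not part of the CLI.

```bash
# Hypothetical helper (not part of reducto-cli): print the YAML front matter
# of a .parse.md file. Assumes the block opens with '---' on line 1 and ends
# at the next '---' line; prints nothing if line 1 is not '---'.
parse_front_matter() {
  awk '
    NR == 1 && $0 == "---" { inside = 1; next }  # opening delimiter
    inside && $0 == "---"  { exit }               # closing delimiter: stop reading
    inside                 { print }              # a front matter line
  ' "$1"
}

# Usage: parse_front_matter path/to/output.parse.md
```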
| Flag | Description |
|---|---|
| `--agentic` | Enables agentic processing for tables, text, and figures. Higher accuracy, higher latency. Use for complex layouts or low-quality scans. |
| `--change-tracking` | Returns `<s>`, `<u>`, and `<change>` tags for strikethrough, underlined, and revised text. Useful for contracts and legal redlines. |
| `--highlights` | Includes highlighted text in the output. |
| `--hyperlinks` | Includes embedded hyperlinks in the output. |
| `--comments` | Includes document comments in the output. |
```bash
# Basic parse
reducto parse document.pdf

# High-accuracy parse for complex layouts
reducto parse scanned_report.pdf --agentic

# Parse a contract with revision tracking
reducto parse contract.pdf --change-tracking

# Parse with all metadata preserved
reducto parse document.pdf --hyperlinks --comments --highlights

# Combine flags
reducto parse legal_doc.pdf --agentic --change-tracking --comments
```
`reducto extract` pulls structured data from documents according to a JSON Schema you provide, mapping unstructured content (invoices, receipts, forms, contracts, financial statements) into machine-readable JSON.
```bash
reducto extract <path> --schema <schema>
```
The schema can be a path to a .json file or an inline JSON string. Output is saved as <filename>.extract.json.
The CLI automatically reuses existing parse results: if a .parse.md file exists for a document, its recorded job ID is used via jobid:// references to skip re-parsing.
The schema's top-level type must be `object`; arrays and primitives are not permitted at the top level. An example invoice schema:

```json
{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        },
        "required": ["description", "quantity", "unit_price", "total"]
      }
    },
    "total_amount": { "type": "number" }
  },
  "required": ["vendor_name", "invoice_number", "line_items", "total_amount"]
}
```
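Since `reducto extract` only accepts schemas whose top level is an object, a quick local check can catch a malformed or array-typed schema before anything is uploaded. This is a minimal sketch, assuming `python3` is on your `PATH` (the CLI already requires Python 3.11+). `schema_is_object` is a hypothetical helper; it approximates the rule by testing for `"type": "object"` on a JSON object and may be stricter or looser than the CLI's own validation. Like `-s`, it accepts either a file path or an inline JSON string.

```bash
# Hypothetical pre-flight check (not part of reducto-cli).
# Exit status: 0 if the schema's top-level "type" is "object",
# 1 if it is valid JSON with some other top level, 2 if it is not valid JSON.
schema_is_object() {
  local src="$1"
  # Accept a file path or an inline JSON string, mirroring `reducto extract -s`.
  if [ -f "$src" ]; then
    src="$(cat -- "$src")"
  fi
  printf '%s' "$src" | python3 -c '
import json, sys
try:
    schema = json.load(sys.stdin)
except json.JSONDecodeError:
    sys.exit(2)
ok = isinstance(schema, dict) and schema.get("type") == "object"
sys.exit(0 if ok else 1)
'
}

# Usage:
#   schema_is_object schemas/invoice.json && reducto extract invoice.pdf -s schemas/invoice.json
```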
```bash
# Extract using a schema file
reducto extract invoice.pdf -s schemas/invoice.json

# Extract from a folder of invoices
reducto extract ./invoices/ -s schemas/invoice.json

# Extract with inline JSON schema
reducto extract receipt.pdf -s '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"}},"required":["total","date"]}'
```
`reducto edit` modifies documents using natural-language instructions: it uploads the document, applies the edits through the Reducto Edit API, and downloads the result.
```bash
reducto edit <path> --instructions "<instructions>"
```
Edited files are saved as <filename>.edited.<extension> (e.g., form.pdf becomes form.edited.pdf).
| Parameter | Required | Description |
|---|---|---|
| `path` | Yes | Path to a file or directory. |
| `--instructions`, `-i` | Yes | Natural language instructions for the edits. |
```bash
# Fill out a PDF form
reducto edit application.pdf -i "Fill in: Name: Jane Smith, Date: 2025-03-15, check 'Agree to terms'"

# Update a contract
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the effective date to January 15, 2025"

# Batch edit a folder of forms
reducto edit ./forms/ -i "Set the company name to 'Globex Inc' in all header fields"
```
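If you script around `reducto edit`, it helps to know what each result will be called. The helper below applies only the stated rule `<filename>.edited.<extension>` (so `form.pdf` becomes `form.edited.pdf`). `edited_output_name` is hypothetical, it returns a bare file name without asserting which directory the CLI writes to, and treating only the last dot-suffix as the extension for multi-dot names is an assumption.

```bash
# Hypothetical helper: predict the file name `reducto edit` produces,
# following the documented rule <filename>.edited.<extension>.
# Assumes the input has an extension (every supported type does).
edited_output_name() {
  local base="${1##*/}"     # strip any directory part
  local stem="${base%.*}"   # drop the last extension: report.v2.docx -> report.v2
  local ext="${base##*.}"   # keep the last extension: docx
  printf '%s.edited.%s\n' "$stem" "$ext"
}

# Usage: edited_output_name ./forms/application.pdf   # application.edited.pdf
```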
Supported file types:

| Category | Extensions |
|---|---|
| PDF | .pdf |
| Images | .png, .jpg, .jpeg |
| Office Documents | .doc, .docx, .ppt, .pptx |
| Spreadsheets | .xls, .xlsx, .numbers |
All commands accept a single file or a directory. Directories are scanned recursively and only supported file types are processed. Generated output files (.parse.md, .extract.json) are automatically excluded from processing.
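To preview which files a directory run will pick up, the sketch below approximates the documented scan with `find`: it recurses and keeps only the extensions listed in the supported-types table. The generated `.parse.md` and `.extract.json` files drop out automatically because `.md` and `.json` are not input types. `list_supported_files` is a hypothetical helper; case-insensitive matching (`-iname`) is an assumption, and the CLI's real scanner may differ in details such as hidden files or `.edited.*` outputs.

```bash
# Hypothetical preview of a directory run (not part of reducto-cli):
# list files under $1 whose extension appears in the supported-types table.
list_supported_files() {
  find "$1" -type f \( \
      -iname '*.pdf' \
      -o -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \
      -o -iname '*.doc' -o -iname '*.docx' -o -iname '*.ppt' -o -iname '*.pptx' \
      -o -iname '*.xls' -o -iname '*.xlsx' -o -iname '*.numbers' \
    \) | sort
}

# Usage: list_supported_files ./contracts/
```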
Parse invoices from any vendor format, then extract line items, totals, and payment details into structured JSON for your accounting pipeline.
```bash
reducto parse ./invoices/
reducto extract ./invoices/ -s schemas/invoice.json
```
Parse contracts with change tracking to surface redlines and revisions. Extract key clauses, dates, and party names for contract management systems.
```bash
reducto parse contract.pdf --agentic --change-tracking --comments
reducto extract contract.pdf -s schemas/contract_terms.json
```
Edit PDF and DOCX forms programmatically — fill fields, check boxes, and populate tables without manual data entry.
```bash
reducto edit onboarding_form.pdf -i "Fill in employee name: Alex Chen, start date: 2025-04-01, department: Engineering, select 'Full-time' for employment type"
```
Extract tables and figures from bank statements, earnings reports, and tax documents into structured data for financial modeling.
```bash
reducto extract quarterly_report.pdf -s schemas/financial_statement.json
```
Parse lab reports, claims forms, and patient intake documents. Reducto is HIPAA compliant for healthcare workflows.
```bash
reducto parse lab_results.pdf --agentic
reducto extract claim_form.pdf -s schemas/insurance_claim.json
```
Convert entire folders of scanned documents, presentations, and spreadsheets into searchable Markdown for knowledge bases or RAG pipelines.
```bash
reducto parse ./legacy_docs/ --agentic
```
Parse documents into clean Markdown optimized for LLM consumption, then use the structured output as context for retrieval-augmented generation (RAG) systems.
```bash
# Parse into LLM-ready Markdown
reducto parse ./knowledge_base/

# Or extract specific fields for structured RAG
reducto extract ./knowledge_base/ -s schemas/document_metadata.json
```
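For a RAG pipeline you usually want the Markdown bodies without the per-file job metadata. After a directory parse, a sketch like the following gathers every `.parse.md` under the folder into one corpus file, skipping each file's YAML front matter and prefixing each document with its source path. It assumes the `.parse.md` outputs sit under the scanned folder and that the front matter opens on line 1 with `---`. `build_corpus` and the `<!-- source: ... -->` marker are hypothetical conventions, not CLI features.

```bash
# Hypothetical helper: concatenate the Markdown bodies of all .parse.md files
# under $1 into $2, dropping each file's leading '---' front matter block.
build_corpus() {
  local src_dir="$1" out="$2"
  : > "$out"                                     # start with an empty corpus
  find "$src_dir" -type f -name '*.parse.md' | sort | while IFS= read -r f; do
    printf '<!-- source: %s -->\n' "$f" >> "$out"
    awk '
      NR == 1 && $0 == "---" { fm = 1; next }    # opening delimiter
      fm && $0 == "---"      { fm = 0; next }    # closing delimiter
      !fm                    { print }           # body lines, including later "---" rules
    ' "$f" >> "$out"
    printf '\n' >> "$out"
  done
}

# Usage: build_corpus ./knowledge_base/ corpus.md
```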
Files within a directory are processed concurrently. Parse results are cached locally (.parse.md files with job IDs), so subsequent extract commands skip re-parsing.
| Method | Details |
|---|---|
| Device code login | reducto login — opens browser, saves key to ~/.reducto/config.yaml |
| Environment variable | export REDUCTO_API_KEY="your_key" — takes precedence over saved config |
| Manual entry | The CLI prompts for manual key entry as a fallback |
The config file is stored at ~/.reducto/config.yaml with 0600 permissions.
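When a command authenticates with an unexpected key (for example, a stale one exported in your shell), it helps to know which source wins. The sketch below encodes the precedence from the table: `REDUCTO_API_KEY` overrides the saved config. `reducto_key_source` is a hypothetical helper; it only checks that the config file exists, not that it holds a valid key, and it reports `none` where the CLI would fall back to manual entry.

```bash
# Hypothetical helper: report which credential source applies, following
# the documented precedence (REDUCTO_API_KEY beats ~/.reducto/config.yaml).
reducto_key_source() {
  if [ -n "${REDUCTO_API_KEY:-}" ]; then
    echo "env"
  elif [ -f "${HOME}/.reducto/config.yaml" ]; then
    echo "config"    # file exists; its contents are not validated here
  else
    echo "none"      # the CLI would prompt for a key
  fi
}

# Usage: reducto_key_source
```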
| Project | Description |
|---|---|
| Reducto Python SDK | Full Python client for the Reducto API (pip install reductoai) |
| Reducto Node.js SDK | Node.js client for the Reducto API (npm install reductoai) |
| Reducto Go SDK | Go client for the Reducto API |
| Reducto Claude Code Plugins | Official Reducto plugins for Claude Code |
| Reducto Studio | No-code web interface for document processing |