reducto-cli

CLI for Reducto document processing

Registry: PyPI (pip)
Version: 0.1.2
Maintainers: 1

Reducto CLI


A command-line tool for document parsing, structured data extraction, and document editing — powered by Reducto's document intelligence API.

Parse PDFs, images, spreadsheets, and Office documents into clean Markdown. Extract structured JSON using schemas. Edit documents with natural language instructions. Process single files or entire directories.

Documentation | Reducto Studio | API Quickstart | Python SDK | Claude Code Plugin

Installation

pip install reducto-cli

Requires Python 3.11 or later.

Authentication

Authenticate using the built-in device code flow, which opens a browser to Reducto Studio:

reducto login

This saves your API key to ~/.reducto/config.yaml.

Alternatively, set the REDUCTO_API_KEY environment variable directly:

export REDUCTO_API_KEY="your_api_key_here"

Get an API key by signing up at studio.reducto.ai.

Quick Start

# Parse a PDF into Markdown
reducto parse invoice.pdf

# Parse an entire folder of documents
reducto parse ./contracts/

# Extract structured data using a JSON Schema
reducto extract invoice.pdf -s schema.json

# Edit a document with natural language
reducto edit form.pdf -i "Fill in the client name as 'Acme Corp'"

Commands

Parse Command

Converts documents into structured Markdown, preserving layout, tables, and figures. Uses Reducto's Parse API with agentic OCR and vision-language models.

reducto parse <path> [options]

Output is written to <filename>.parse.md with YAML front matter containing the job ID and processing duration.
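
To use the recorded metadata in a script, the front matter can be separated from the Markdown body. The following is a minimal sketch, assuming the front matter is delimited by `---` lines and holds flat `key: value` pairs. The field names `job_id` and `duration` in the sample are hypothetical, since the exact keys are not listed here.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (front-matter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # opening delimiter was never closed
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value"
            fields[key.strip()] = value.strip().strip("\"'")
    return fields, "\n".join(lines[end + 1:])


# Hypothetical file contents: the field names and values are made up.
sample = "---\njob_id: example-job-id\nduration: 1.5\n---\n# Invoice\n\nTotal: 42\n"
meta, body = split_front_matter(sample)
print(meta)
print(body)
```

A full YAML parser would be the safer choice if the front matter ever contains nested values; this sketch only handles the flat case.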

Options

Flag | Description
--agentic | Enables agentic processing for tables, text, and figures. Higher accuracy, higher latency. Use for complex layouts or low-quality scans.
--change-tracking | Returns <s>, <u>, and <change> tags for strikethrough, underlined, and revised text. Useful for contracts and legal redlines.
--highlights | Include highlighted text in output.
--hyperlinks | Include embedded hyperlinks in output.
--comments | Include document comments in output.

Examples

# Basic parse
reducto parse document.pdf

# High-accuracy parse for complex layouts
reducto parse scanned_report.pdf --agentic

# Parse a contract with revision tracking
reducto parse contract.pdf --change-tracking

# Parse with all metadata preserved
reducto parse document.pdf --hyperlinks --comments --highlights

# Combine flags
reducto parse legal_doc.pdf --agentic --change-tracking --comments

Extract Command

Pulls structured data from documents according to a JSON Schema you provide. Maps unstructured content — invoices, receipts, forms, contracts, financial statements — into machine-readable JSON.

reducto extract <path> --schema <schema>

The schema can be a path to a .json file or an inline JSON string. Output is saved as <filename>.extract.json.

The CLI automatically reuses existing parse results: if a .parse.md file exists for a document, its recorded job ID is used via jobid:// references to skip re-parsing.
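
The reuse logic described above might look roughly like the sketch below. It is not the CLI's actual implementation: the `job_id` key name and the `<stem>.parse.md` naming (for example `invoice.pdf` to `invoice.parse.md`) are assumptions, and `cached_parse_reference` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed front-matter key; the real key name may differ.
JOB_ID_LINE = re.compile(r"^job_id:\s*(\S+)\s*$", re.MULTILINE)


def cached_parse_reference(document: Path) -> Optional[str]:
    """Return a jobid:// reference if a parse result sits next to the document."""
    # Assumed naming: invoice.pdf -> invoice.parse.md
    parse_file = document.with_name(document.stem + ".parse.md")
    if not parse_file.is_file():
        return None  # nothing cached, so the document would need parsing
    match = JOB_ID_LINE.search(parse_file.read_text(encoding="utf-8"))
    return f"jobid://{match.group(1)}" if match else None
```

When the function returns `None`, a caller would fall back to uploading and parsing the document as usual.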

Schema Requirements

  • Must be a valid JSON Schema document.
  • The top-level type must be object — arrays and primitives are not permitted at the top level.
  • Schemas can be provided as file paths or inline JSON strings.
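
These requirements can be checked locally before running the command. The sketch below is one possible pre-flight check, with a hypothetical helper name `load_schema`. It only verifies that the input is well-formed JSON with a top-level `"type": "object"`; it does not validate the document against the full JSON Schema specification. Treating any argument that starts with `{` as inline JSON is an assumption of this example.

```python
import json
from pathlib import Path


def load_schema(schema_arg: str) -> dict:
    """Load a schema from a .json file path or an inline JSON string."""
    text = schema_arg.strip()
    if not text.startswith("{"):
        # Not inline JSON, so treat the argument as a file path.
        text = Path(schema_arg).read_text(encoding="utf-8")
    schema = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError('top-level schema "type" must be "object"')
    return schema


schema = load_schema('{"type": "object", "properties": {"total": {"type": "number"}}}')
print(sorted(schema["properties"]))
```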

Example Schema

{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        },
        "required": ["description", "quantity", "unit_price", "total"]
      }
    },
    "total_amount": { "type": "number" }
  },
  "required": ["vendor_name", "invoice_number", "line_items", "total_amount"]
}

Examples

# Extract using a schema file
reducto extract invoice.pdf -s schemas/invoice.json

# Extract from a folder of invoices
reducto extract ./invoices/ -s schemas/invoice.json

# Extract with inline JSON schema
reducto extract receipt.pdf -s '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"}},"required":["total","date"]}'

Edit Command

Modifies documents using natural language instructions. Uploads the document, applies edits via the Reducto Edit API, and downloads the result.

reducto edit <path> --instructions "<instructions>"

Edited files are saved as <filename>.edited.<extension> (e.g., form.pdf becomes form.edited.pdf).
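
A script that needs to locate the edited file afterwards can derive its name from the input path. This is a small sketch of the naming rule stated above; `edited_path` is a hypothetical helper, not part of the CLI.

```python
from pathlib import Path


def edited_path(source: Path) -> Path:
    """Map form.pdf to form.edited.pdf, keeping the parent directory."""
    return source.with_name(f"{source.stem}.edited{source.suffix}")


print(edited_path(Path("form.pdf")))
```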

Parameter | Required | Description
path | Yes | Path to a file or directory.
--instructions, -i | Yes | Natural language instructions for the edits.

Examples

# Fill out a PDF form
reducto edit application.pdf -i "Fill in: Name: Jane Smith, Date: 2025-03-15, check 'Agree to terms'"

# Update a contract
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the effective date to January 15, 2025"

# Batch edit a folder of forms
reducto edit ./forms/ -i "Set the company name to 'Globex Inc' in all header fields"

Tips for Effective Instructions

  • Be specific about which elements to modify (headers, tables, specific fields).
  • Reference content by name or position when possible.
  • Describe the desired outcome, not the process.
  • For batch operations, write instructions that apply uniformly across all files.

Supported File Types

Category | Extensions
PDF | .pdf
Images | .png, .jpg, .jpeg
Office Documents | .doc, .docx, .ppt, .pptx
Spreadsheets | .xls, .xlsx, .numbers

All commands accept a single file or a directory. Directories are scanned recursively and only supported file types are processed. Generated output files (.parse.md, .extract.json) are automatically excluded from processing.
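
To preview which files a directory run would pick up, the selection rule can be approximated as below. This is a sketch of the documented behavior rather than the CLI's own code. Case-insensitive extension matching and the helper name `collect_inputs` are assumptions.

```python
from pathlib import Path

# Extensions from the table above.
SUPPORTED = {
    ".pdf", ".png", ".jpg", ".jpeg",
    ".doc", ".docx", ".ppt", ".pptx",
    ".xls", ".xlsx", ".numbers",
}
GENERATED_SUFFIXES = (".parse.md", ".extract.json")


def collect_inputs(path: Path) -> list[Path]:
    """Return supported input files under path, skipping generated outputs."""
    if path.is_file():
        candidates = [path]
    else:
        candidates = sorted(p for p in path.rglob("*") if p.is_file())
    return [
        p for p in candidates
        if p.suffix.lower() in SUPPORTED  # assumption: match case-insensitively
        and not p.name.lower().endswith(GENERATED_SUFFIXES)
    ]
```

The explicit check for generated suffixes is redundant with the extension filter in this sketch, but it keeps the exclusion rule visible if the supported set ever grows.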

Use Cases

Invoice and Receipt Processing

Parse invoices from any vendor format, then extract line items, totals, and payment details into structured JSON for your accounting pipeline.

reducto parse ./invoices/
reducto extract ./invoices/ -s schemas/invoice.json

Contract and Legal Document Review

Parse contracts with change tracking to surface redlines and revisions. Extract key clauses, dates, and party names for contract management systems.

reducto parse contract.pdf --agentic --change-tracking --comments
reducto extract contract.pdf -s schemas/contract_terms.json

Form Processing and Auto-Fill

Edit PDF and DOCX forms programmatically — fill fields, check boxes, and populate tables without manual data entry.

reducto edit onboarding_form.pdf -i "Fill in employee name: Alex Chen, start date: 2025-04-01, department: Engineering, select 'Full-time' for employment type"

Financial Statement Analysis

Extract tables and figures from bank statements, earnings reports, and tax documents into structured data for financial modeling.

reducto extract quarterly_report.pdf -s schemas/financial_statement.json

Medical and Insurance Document Processing

Parse lab reports, claims forms, and patient intake documents. Reducto is HIPAA compliant for healthcare workflows.

reducto parse lab_results.pdf --agentic
reducto extract claim_form.pdf -s schemas/insurance_claim.json

Batch Document Digitization

Convert entire folders of scanned documents, presentations, and spreadsheets into searchable Markdown for knowledge bases or RAG pipelines.

reducto parse ./legacy_docs/ --agentic

Feeding Data to LLM Pipelines

Parse documents into clean Markdown optimized for LLM consumption, then use the structured output as context for retrieval-augmented generation (RAG) systems.

# Parse into LLM-ready Markdown
reducto parse ./knowledge_base/

# Or extract specific fields for structured RAG
reducto extract ./knowledge_base/ -s schemas/document_metadata.json

How It Works

  • Upload — The CLI uploads your document to Reducto's API.
  • Process — Reducto applies agentic OCR, layout detection, and vision-language models to understand document structure.
  • Return — Parsed Markdown, extracted JSON, or edited documents are downloaded to your local filesystem.

Files within a directory are processed concurrently. Parse results are cached locally (.parse.md files with job IDs), so subsequent extract commands skip re-parsing.
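
The concurrency model used internally is not described here. As an illustration only, network-bound work like this is commonly fanned out with a thread pool, as in the sketch below. `process_all`, `process_one`, and `max_workers` are hypothetical names, and `process_one` stands in for whatever handles a single file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def process_all(
    paths: Iterable[Path],
    process_one: Callable[[Path], str],
    max_workers: int = 4,
) -> dict[Path, str]:
    """Run process_one over every path concurrently; results are keyed by path."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with path_list.
        results = list(pool.map(process_one, path_list))
    return dict(zip(path_list, results))


# Stand-in worker that just names the output file it would produce.
outputs = process_all(
    [Path("a.pdf"), Path("b.pdf")],
    lambda p: p.stem + ".parse.md",
)
print(outputs)
```

Threads suit this kind of workload because each task spends most of its time waiting on uploads and API responses rather than using the CPU.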

Configuration

Method | Details
Device code login | reducto login (opens a browser and saves the key to ~/.reducto/config.yaml)
Environment variable | export REDUCTO_API_KEY="your_key" (takes precedence over saved config)
Manual entry | The CLI prompts for manual key entry as a fallback

The config file is stored at ~/.reducto/config.yaml with 0600 permissions.
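
The precedence rule and the permission setting can be expressed in a few lines. This is a sketch under stated assumptions: `resolve_api_key` and `is_owner_only` are hypothetical helpers, and the saved key is passed in directly because the layout of config.yaml is not documented here. The permission check is meaningful on POSIX systems only.

```python
import os
import stat
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str], saved_key: Optional[str]) -> Optional[str]:
    """Prefer REDUCTO_API_KEY from the environment, then the saved key."""
    return env.get("REDUCTO_API_KEY") or saved_key


def is_owner_only(path: str) -> bool:
    """True if the file mode is exactly 0600 (owner read/write, nothing else)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


print(resolve_api_key({"REDUCTO_API_KEY": "from-env"}, "from-config"))
print(resolve_api_key({}, "from-config"))
```

A caller would pass `os.environ` as `env`, and could use `is_owner_only` to warn when the config file is readable by other users.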

Related Projects

Project | Description
Reducto Python SDK | Full Python client for the Reducto API (pip install reductoai)
Reducto Node.js SDK | Node.js client for the Reducto API (npm install reductoai)
Reducto Go SDK | Go client for the Reducto API
Reducto Claude Code Plugins | Official Reducto plugins for Claude Code
Reducto Studio | No-code web interface for document processing
