import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
});

const result = await ocr.extract('./invoice.png');

if (result.success) {
  const text = result.content;
  console.log(text);
}

Using OpenAI

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
});

const result = await ocr.extract('./document.pdf');

if (result.success) {
  const text = result.content;
  console.log(text);
}

Custom Model

You can specify a custom model for any provider:

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
  model: 'gemini-2.0-flash', // Use a specific model
});

// Or with OpenAI
const ocrOpenAI = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
  model: 'gpt-4o-mini', // Use a different model
});

From URL

Extract directly from a URL:

const result = await ocr.extract('https://example.com/invoice.png');

if (result.success) {
  console.log(result.content);
}

Custom Instructions

You can provide custom instructions to guide the extraction:

const result = await ocr.extract('./receipt.png', {
  prompt: 'Extract only the total amount and date from this receipt',
});

if (result.success) {
  console.log(result.content);
  // Output: "Total: $154.06, Date: 11/02/2019"
}

Output Format

By default, extraction returns text. You can also extract structured JSON:

// Text output (default)
const textResult = await ocr.extract('./invoice.png', {
  format: 'text',
});

if (textResult.success) {
  console.log(textResult.content); // string
}

// JSON output with schema
const jsonResult = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: {
    invoice_number: 'string',
    date: 'string',
    total: 'number',
    items: [{ name: 'string', quantity: 'number', price: 'number' }],
  },
});

if (jsonResult.success) {
  console.log(jsonResult.data); // { invoice_number: "US-001", date: "11/02/2019", total: 154.06, items: [...] }
}
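
The two result shapes above can be handled in one place with a discriminated union. The sketch below is illustrative only: the `ExtractResultSketch` types, their `format` and `error` fields, and the `summarize` helper are assumptions made for this example and are not types exported by ocr-ai. Only `success`, `content`, and `data` appear in the snippets above.

```typescript
// Hypothetical local types; ocr-ai's real result types may differ.
type TextResultSketch = { success: true; format: 'text'; content: string };
type JsonResultSketch = { success: true; format: 'json'; data: Record<string, unknown> };
type FailureSketch = { success: false; error: string };
type ExtractResultSketch = TextResultSketch | JsonResultSketch | FailureSketch;

// Turn any result into a one-line summary without unchecked property access.
function summarize(result: ExtractResultSketch): string {
  if (!result.success) {
    return `failed: ${result.error}`;
  }
  if (result.format === 'json') {
    return `json with fields: ${Object.keys(result.data).join(', ')}`;
  }
  return `text (${result.content.length} chars)`;
}

const sampleResults: ExtractResultSketch[] = [
  { success: true, format: 'text', content: 'Invoice US-001' },
  { success: true, format: 'json', data: { invoice_number: 'US-001', total: 154.06 } },
  { success: false, error: 'unsupported file type' },
];

for (const sample of sampleResults) {
  console.log(summarize(sample));
}
```

Checking `success` first, then the format, lets the compiler rule out reading `data` from a text result or `content` from a JSON result.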

JSON Schema

The schema defines the structure of the data you want to extract. Use a plain object whose keys are field names and whose values are type names, nested objects, or arrays.

Basic types:

'string' - Text values
'number' - Numeric values
'boolean' - True/false values

Nested objects:

const schema = {
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
  },
  customer: {
    name: 'string',
    email: 'string',
  },
};

Arrays:

const schema = {
  // Array of objects
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  // Simple array
  tags: ['string'],
};

Complete example (invoice):

const invoiceSchema = {
  invoice_number: 'string',
  date: 'string',
  due_date: 'string',
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
    email: 'string',
  },
  bill_to: {
    name: 'string',
    address: 'string',
  },
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  subtotal: 'number',
  tax: 'number',
  total: 'number',
};

const result = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: invoiceSchema,
  prompt: 'Extract all invoice data from this document.',
});
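
Because the model produces the JSON, it can be worth checking the returned data against the schema before using it. The following is a minimal sketch of such a check for the schema format described above. `SchemaNode` and `findSchemaMismatches` are hypothetical names for this example, not part of ocr-ai, and whether the library performs any validation of its own is not stated here.

```typescript
// Schema format as described above: a type name, a nested object, or a one-element array.
type SchemaNode = 'string' | 'number' | 'boolean' | SchemaNode[] | { [key: string]: SchemaNode };

// Return the paths where `data` does not match `schema` (an empty list means it matches).
function findSchemaMismatches(schema: SchemaNode, data: unknown, path: string = '$'): string[] {
  if (typeof schema === 'string') {
    return typeof data === schema ? [] : [`${path}: expected ${schema}`];
  }
  if (Array.isArray(schema)) {
    if (!Array.isArray(data)) return [`${path}: expected array`];
    const itemSchema = schema[0];
    if (itemSchema === undefined) return [];
    const itemMismatches: string[] = [];
    data.forEach((item: unknown, index: number) => {
      itemMismatches.push(...findSchemaMismatches(itemSchema, item, `${path}[${index}]`));
    });
    return itemMismatches;
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return [`${path}: expected object`];
  }
  const record = data as Record<string, unknown>;
  const fieldMismatches: string[] = [];
  for (const key of Object.keys(schema)) {
    const child = schema[key];
    if (child !== undefined) {
      fieldMismatches.push(...findSchemaMismatches(child, record[key], `${path}.${key}`));
    }
  }
  return fieldMismatches;
}

const demoSchema: SchemaNode = {
  invoice_number: 'string',
  total: 'number',
  items: [{ description: 'string', quantity: 'number' }],
};

const demoData = {
  invoice_number: 'US-001',
  total: '154.06', // wrong type on purpose: a string instead of a number
  items: [{ description: 'Widget', quantity: 2 }],
};

console.log(findSchemaMismatches(demoSchema, demoData)); // ["$.total: expected number"]
```

Missing fields are reported the same way as wrong types, since `undefined` never matches a type name.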

Model Configuration

You can pass model-specific parameters like temperature, max tokens, and more:

// Gemini with model config
const result = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0.2,
    maxTokens: 4096,
    topP: 0.8,
    topK: 40,
  },
});

// OpenAI with model config (no topK; see the options table below)
const openaiResult = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0,
    maxTokens: 2048,
    topP: 1,
  },
});

Available options:

Option	Description	Supported Providers
temperature	Controls randomness (0.0-1.0+)	All
maxTokens	Maximum tokens to generate	All
topP	Nucleus sampling	All
topK	Top-k sampling	Gemini, Claude, Vertex
stopSequences	Stop generation at these strings	All
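
Since `topK` is listed only for Gemini, Claude, and Vertex, one option is to drop it before reusing a shared config with another provider. This is a sketch under that assumption. `filterModelConfig` and the local `ModelConfig` interface are hypothetical helpers for this example, and how ocr-ai itself treats an unsupported option is not documented here.

```typescript
type Provider = 'gemini' | 'openai' | 'claude' | 'grok' | 'vertex';

// Local copy of the options from the table above.
interface ModelConfig {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  topK?: number;
  stopSequences?: string[];
}

// Providers listed as supporting topK in the table above.
const TOP_K_PROVIDERS: Provider[] = ['gemini', 'claude', 'vertex'];

// Return a copy of the config without topK when the provider is not listed for it.
function filterModelConfig(provider: Provider, config: ModelConfig): ModelConfig {
  const filtered: ModelConfig = { ...config };
  if (TOP_K_PROVIDERS.indexOf(provider) === -1) {
    delete filtered.topK;
  }
  return filtered;
}

const sharedConfig: ModelConfig = { temperature: 0.2, maxTokens: 4096, topP: 0.8, topK: 40 };

console.log(filterModelConfig('gemini', sharedConfig)); // keeps topK
console.log(filterModelConfig('openai', sharedConfig)); // topK removed
```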

Token Usage

Access token usage information from the metadata:

const result = await ocr.extract('./invoice.png');

if (result.success) {
  console.log(result.content);

  // Access metadata
  console.log(result.metadata.processingTimeMs); // 2351
  console.log(result.metadata.tokens?.inputTokens); // 1855
  console.log(result.metadata.tokens?.outputTokens); // 260
  console.log(result.metadata.tokens?.totalTokens); // 2115
}
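
When processing many files, the per-call counts can be added up to track overall usage. The sketch below assumes the token object has the three fields shown above and that any of them may be missing. `TokenUsage` and `sumTokenUsage` are hypothetical names for this example, and the second set of counts is made up for illustration.

```typescript
// Shape of the token counts shown above; treated as optional in this sketch.
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

// Add up token usage across several results, counting missing values as zero.
function sumTokenUsage(usages: Array<TokenUsage | undefined>): Required<TokenUsage> {
  const total = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  for (const usage of usages) {
    total.inputTokens += usage?.inputTokens ?? 0;
    total.outputTokens += usage?.outputTokens ?? 0;
    total.totalTokens += usage?.totalTokens ?? 0;
  }
  return total;
}

const usageTotals = sumTokenUsage([
  { inputTokens: 1855, outputTokens: 260, totalTokens: 2115 }, // counts from the example above
  { inputTokens: 1200, outputTokens: 180, totalTokens: 1380 }, // illustrative values
  undefined, // a result that reported no token metadata
]);

console.log(usageTotals); // { inputTokens: 3055, outputTokens: 440, totalTokens: 3495 }
```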

Supported Providers

Provider	Default Model	Auth
gemini	gemini-1.5-flash	API Key
openai	gpt-4o	API Key
claude	claude-sonnet-4-20250514	API Key
grok	grok-2-vision-1212	API Key
vertex	gemini-2.0-flash	Google Cloud

Note: For enterprise OCR needs, see the "Advanced: Vertex AI (Google Cloud)" section below.

Supported Inputs

Local files: ./invoice.png, ./document.pdf
URLs: https://example.com/invoice.png

Supported Files

Images: jpg, png, gif, webp
Documents: pdf
Text: txt, md, csv, json, xml, html
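
A pre-flight check on the input can catch unsupported files before an API call is made. The sketch below classifies a source string using only the input kinds and extensions listed above. `describeInput` is a hypothetical helper for this example, and the library may detect file types differently (for instance by MIME type).

```typescript
type FileCategory = 'image' | 'document' | 'text';

// Extensions copied from the "Supported Files" list above.
const IMAGE_EXTENSIONS = ['jpg', 'png', 'gif', 'webp'];
const DOCUMENT_EXTENSIONS = ['pdf'];
const TEXT_EXTENSIONS = ['txt', 'md', 'csv', 'json', 'xml', 'html'];

interface InputInfo {
  kind: 'url' | 'file';
  category: FileCategory | null; // null when the extension is not in the lists above
}

function categorize(extension: string): FileCategory | null {
  if (IMAGE_EXTENSIONS.indexOf(extension) !== -1) return 'image';
  if (DOCUMENT_EXTENSIONS.indexOf(extension) !== -1) return 'document';
  if (TEXT_EXTENSIONS.indexOf(extension) !== -1) return 'text';
  return null;
}

function describeInput(source: string): InputInfo {
  const kind = /^https?:\/\//i.test(source) ? 'url' : 'file';
  // Drop any query string or fragment before reading the extension.
  const pathOnly = source.split(/[?#]/)[0] ?? source;
  const match = /\.([a-z0-9]+)$/i.exec(pathOnly);
  const extension = match && match[1] ? match[1].toLowerCase() : '';
  return { kind, category: categorize(extension) };
}

console.log(describeInput('./invoice.png')); // { kind: 'file', category: 'image' }
console.log(describeInput('https://example.com/invoice.png')); // { kind: 'url', category: 'image' }
console.log(describeInput('./notes.docx')); // { kind: 'file', category: null }
```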

Advanced: Vertex AI (Google Cloud)

The vertex provider enables access to Google Cloud's AI infrastructure, which is useful for enterprise scenarios requiring:

Compliance: Data residency and regulatory requirements
Integration: Native integration with Google Cloud services (BigQuery, Cloud Storage, etc.)
Specialized OCR: Access to Google's Document AI and Vision AI processors

Basic Setup

Vertex AI uses Google Cloud authentication instead of API keys:

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'vertex',
  vertexConfig: {
    project: 'your-gcp-project-id',
    location: 'us-central1',
  },
});

const result = await ocr.extract('./invoice.png');

Requirements:

Install the gcloud CLI
Run gcloud auth application-default login
Enable the Vertex AI API in your GCP project

When to Use Vertex AI vs Gemini API

Scenario	Recommended
Quick prototyping	Gemini (API Key)
Personal projects	Gemini (API Key)
Enterprise/production	Vertex AI
Data residency requirements	Vertex AI
High-volume processing	Vertex AI
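
The choice in the table can be captured in a small config builder. This is only a sketch: `GoogleSettings`, its `needsDataResidency` flag, and `buildGoogleConfig` are hypothetical names for this example, and the fallback to `us-central1` simply reuses the location from the setup snippet above rather than any documented default.

```typescript
// Config shapes copied from the Gemini and Vertex snippets above.
type GeminiConfig = { provider: 'gemini'; apiKey: string };
type VertexConfig = { provider: 'vertex'; vertexConfig: { project: string; location: string } };

// Hypothetical settings object for this sketch (not part of ocr-ai).
interface GoogleSettings {
  apiKey?: string;
  project?: string;
  location?: string;
  needsDataResidency?: boolean;
}

// Use the Gemini API key when one is available and residency is not required; otherwise use Vertex AI.
function buildGoogleConfig(settings: GoogleSettings): GeminiConfig | VertexConfig {
  if (settings.needsDataResidency !== true && settings.apiKey !== undefined) {
    return { provider: 'gemini', apiKey: settings.apiKey };
  }
  if (settings.project === undefined) {
    throw new Error('Vertex AI requires a GCP project id');
  }
  return {
    provider: 'vertex',
    vertexConfig: {
      project: settings.project,
      location: settings.location ?? 'us-central1', // assumed fallback for this sketch
    },
  };
}

console.log(buildGoogleConfig({ apiKey: 'YOUR_GEMINI_API_KEY' }));
console.log(buildGoogleConfig({ project: 'your-gcp-project-id', needsDataResidency: true }));
```

The returned object has the same shape as the constructor arguments shown earlier, so it could be passed to `new OcrAI(...)`.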

Related Google Cloud OCR Services

For specialized document processing beyond what Gemini models offer, Google Cloud provides dedicated OCR services:

Document AI - Optimized for structured documents:

Invoice Parser, Receipt Parser, Form Parser
W2, 1040, Bank Statement processors
Custom extractors for domain-specific documents
Higher accuracy for tables, forms, and handwritten text

Vision API - Optimized for images:

Real-time OCR with low latency
80+ language support
Handwriting detection
Simple integration, ~98% accuracy on clean documents

These services are separate from ocr-ai but can complement it for enterprise document pipelines.

Gemini Model Benchmarks

Performance benchmarks for Gemini models extracting data from an invoice image:

Model	Text Extraction	JSON Extraction	Best For
`gemini-2.0-flash-lite`	2.8s	2.1s	High-volume processing, cost optimization
`gemini-2.5-flash-lite`	2.2s	1.9s	Fastest option, simple documents
`gemini-2.0-flash`	3.9s	2.9s	General purpose, good balance
`gemini-2.5-flash`	5.0s	5.0s	Standard documents, reliable
`gemini-3-flash-preview`	12.3s	10.6s	Complex layouts, newer capabilities
`gemini-3-pro-image-preview`	8.0s	11.9s	Image-heavy documents
`gemini-2.5-pro`	12.6s	5.5s	High accuracy, complex documents
`gemini-3-pro-preview`	24.8s	13.1s	Maximum accuracy, handwritten text
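
One way to read the table is to filter it by a latency budget. The sketch below copies the timings above into an array and lists the models that fit, fastest first. `modelsWithinBudget` is a hypothetical helper for this example. The timings come from a single invoice image, so real latencies will vary.

```typescript
interface Benchmark {
  model: string;
  textSeconds: number;
  jsonSeconds: number;
}

// Timings copied from the Gemini benchmark table above.
const GEMINI_BENCHMARKS: Benchmark[] = [
  { model: 'gemini-2.0-flash-lite', textSeconds: 2.8, jsonSeconds: 2.1 },
  { model: 'gemini-2.5-flash-lite', textSeconds: 2.2, jsonSeconds: 1.9 },
  { model: 'gemini-2.0-flash', textSeconds: 3.9, jsonSeconds: 2.9 },
  { model: 'gemini-2.5-flash', textSeconds: 5.0, jsonSeconds: 5.0 },
  { model: 'gemini-3-flash-preview', textSeconds: 12.3, jsonSeconds: 10.6 },
  { model: 'gemini-3-pro-image-preview', textSeconds: 8.0, jsonSeconds: 11.9 },
  { model: 'gemini-2.5-pro', textSeconds: 12.6, jsonSeconds: 5.5 },
  { model: 'gemini-3-pro-preview', textSeconds: 24.8, jsonSeconds: 13.1 },
];

// List models whose measured time for the given format fits the budget, fastest first.
function modelsWithinBudget(
  benchmarks: readonly Benchmark[],
  format: 'text' | 'json',
  budgetSeconds: number,
): string[] {
  const seconds = (b: Benchmark): number => (format === 'text' ? b.textSeconds : b.jsonSeconds);
  return benchmarks
    .filter((b) => seconds(b) <= budgetSeconds) // filter returns a copy, so sorting leaves the input untouched
    .sort((a, b) => seconds(a) - seconds(b))
    .map((b) => b.model);
}

console.log(modelsWithinBudget(GEMINI_BENCHMARKS, 'json', 3));
// ["gemini-2.5-flash-lite", "gemini-2.0-flash-lite", "gemini-2.0-flash"]
```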

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-flash-lite', // ~2s response time
});

For complex documents or when accuracy is critical:

// Higher accuracy, slower processing
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-pro', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview', // Best for handwriting
});

Quick Reference

Use Case	Recommended Model
High-volume batch processing	`gemini-2.5-flash-lite`
Standard invoices/receipts	`gemini-2.0-flash`
Complex tables and layouts	`gemini-2.5-pro`
Handwritten documents	`gemini-3-pro-preview`
Poor quality scans	`gemini-3-pro-preview`
Real-time applications	`gemini-2.5-flash-lite`

OpenAI Model Benchmarks

Performance benchmarks for OpenAI models extracting data from an invoice image:

Model	Text Extraction	JSON Extraction	Best For
`gpt-4.1-nano`	4.4s	2.4s	Fastest, cost-effective
`gpt-4.1-mini`	4.8s	3.2s	Good balance speed/accuracy
`gpt-4.1`	8.2s	5.4s	High accuracy, reliable
`gpt-4o-mini`	7.2s	5.7s	Budget-friendly
`gpt-4o`	12.3s	10.7s	Standard high accuracy
`gpt-5.2`	6.4s	5.0s	Latest generation
`gpt-5-mini`	12.2s	7.9s	GPT-5 balanced option
`gpt-5-nano`	19.9s	16.1s	GPT-5 economy tier

Note: gpt-5.2-pro and gpt-image-1 use different API endpoints and are not currently supported.
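
The per-document timings also give a rough idea of batch duration. As a back-of-the-envelope sketch, assume documents are processed in waves of a fixed concurrency and each takes the benchmark time: the total is then the number of waves times the per-document time. `estimateBatchSeconds` is a hypothetical helper, and the estimate ignores rate limits, retries, and variation between documents.

```typescript
// Rough wall-clock estimate: documents are processed in waves of `concurrency`.
function estimateBatchSeconds(
  documentCount: number,
  secondsPerDocument: number,
  concurrency: number,
): number {
  if (documentCount < 0 || secondsPerDocument < 0) {
    throw new RangeError('documentCount and secondsPerDocument must not be negative');
  }
  if (concurrency < 1 || Math.floor(concurrency) !== concurrency) {
    throw new RangeError('concurrency must be a positive integer');
  }
  const waves = Math.ceil(documentCount / concurrency);
  return waves * secondsPerDocument;
}

// JSON extraction timings copied from the OpenAI table above.
const nanoJsonSeconds = 2.4; // gpt-4.1-nano
const gpt4oJsonSeconds = 10.7; // gpt-4o

// 100 documents, 5 at a time, is 20 waves.
console.log(estimateBatchSeconds(100, nanoJsonSeconds, 5)); // about 48 seconds
console.log(estimateBatchSeconds(100, gpt4oJsonSeconds, 5)); // about 214 seconds
```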

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1-nano', // ~2.4-4.4s in the benchmark above
});

For complex documents or when accuracy is critical:

// Higher accuracy, reliable extraction
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-5.2', // Latest generation, best accuracy
});

Quick Reference

Use Case	Recommended Model
High-volume batch processing	`gpt-4.1-nano`
Standard invoices/receipts	`gpt-4.1-mini`
Complex tables and layouts	`gpt-4.1`
Handwritten documents	`gpt-5.2`
Poor quality scans	`gpt-5.2`
Real-time applications	`gpt-4.1-nano`
Budget-conscious projects	`gpt-4o-mini`

Promise API

For users who prefer callbacks or need more control over async operations, ocr-ai provides an alternative OcrAIPromise class. It adds callback support, parallel and batch extraction, automatic retries, and access to the underlying OcrAI instance.

Basic Usage with Callbacks

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Using callback style
ocr.extract('./invoice.png', {}, (error, result) => {
  if (error) {
    console.error('Extraction failed:', error.message);
    return;
  }
  console.log('Extracted:', result.content);
});
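
For readers curious how one function can serve both callback and promise callers, here is a generic sketch of the pattern. It is not OcrAIPromise's implementation. `withOptionalCallback`, `NodeCallback`, and the stand-in `fakeExtract` function are all hypothetical names for this example.

```typescript
type NodeCallback<T> = (error: Error | null, result?: T) => void;

// Wrap a promise-returning function so callers may pass an optional error-first callback.
function withOptionalCallback<A, T>(fn: (arg: A) => Promise<T>) {
  return (arg: A, callback?: NodeCallback<T>): Promise<T | void> => {
    const promise = fn(arg);
    if (!callback) {
      return promise; // promise style: the caller handles resolve and reject
    }
    // Callback style: route both outcomes to the callback so the rejection is handled.
    return promise.then(
      (result) => {
        callback(null, result);
      },
      (error: unknown) => {
        callback(error instanceof Error ? error : new Error(String(error)));
      },
    );
  };
}

// Stand-in for an extraction call; resolves with fake content.
const fakeExtract = async (source: string): Promise<{ content: string }> => {
  if (source === '') throw new Error('source is empty');
  return { content: `text from ${source}` };
};

const extractEitherStyle = withOptionalCallback(fakeExtract);

// Callback style
extractEitherStyle('./invoice.png', (error, result) => {
  if (error) {
    console.error('Extraction failed:', error.message);
    return;
  }
  console.log('Extracted:', result?.content);
});

// Promise style
extractEitherStyle('./invoice.png').then((result) => console.log(result));
```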

Using .then()/.catch()

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Promise chain style
ocr.extract('./invoice.png')
  .then((result) => {
    if (result.success) {
      console.log('Content:', result.content);
    }
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Extract Multiple Files in Parallel

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Extract many files at once
const results = await ocr.extractMany([
  './invoice1.png',
  './invoice2.png',
  './invoice3.png',
]);

results.forEach((result, index) => {
  if (result.success) {
    console.log(`File ${index + 1}:`, result.content);
  }
});
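
For large file lists, it may help to cap how many extractions run at once, for example to stay within provider rate limits. The sketch below is a generic concurrency limiter, not part of ocr-ai. `mapWithConcurrency` is a hypothetical helper, and whether `extractMany` applies any limit of its own is not stated here.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results: R[] = new Array<R>(items.length);
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex;
      nextIndex += 1;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.min(Math.floor(limit), items.length);
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i += 1) {
    lanes.push(runLane());
  }
  await Promise.all(lanes); // rejects as soon as any worker call rejects
  return results;
}

// Stand-in worker for the demo; in practice it could wrap a single-file extract call.
const demoFiles = ['./invoice1.png', './invoice2.png', './invoice3.png'];
mapWithConcurrency(demoFiles, 2, async (file) => `processed ${file}`).then((out) => {
  console.log(out);
});
```

With a real instance, the worker could be `(file) => ocr.extract(file)`, which keeps at most two requests open at a time in the demo above.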

Batch Extraction with Individual Options

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Each file with its own options (invoiceSchema is the schema from the JSON Schema section)
const results = await ocr.extractBatch([
  { source: './invoice.png', options: { format: 'json', schema: invoiceSchema } },
  { source: './receipt.png', options: { format: 'text' } },
  { source: './contract.pdf', options: { prompt: 'Extract key dates and amounts' } },
]);
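
After a batch, it is often useful to know which sources failed so they can be retried or logged. The sketch below pairs results with their sources and splits them by the `success` flag. It assumes results come back in the same order as the inputs, which the index-based loop in the parallel example suggests but this document does not state outright. `partitionBatch` and `ResultLike` are hypothetical names for this example.

```typescript
// Minimal result shape for this sketch; only `success` is taken from the snippets above.
interface ResultLike {
  success: boolean;
}

interface BatchReport<R> {
  succeeded: Array<{ source: string; result: R }>;
  failedSources: string[];
}

// Pair each result with its source (same order assumed) and separate successes from failures.
function partitionBatch<R extends ResultLike>(
  sources: readonly string[],
  results: readonly R[],
): BatchReport<R> {
  if (sources.length !== results.length) {
    throw new Error('sources and results must have the same length');
  }
  const report: BatchReport<R> = { succeeded: [], failedSources: [] };
  results.forEach((result, index) => {
    const source = sources[index] as string;
    if (result.success) {
      report.succeeded.push({ source, result });
    } else {
      report.failedSources.push(source);
    }
  });
  return report;
}

const batchReport = partitionBatch(
  ['./invoice.png', './receipt.png', './contract.pdf'],
  [{ success: true }, { success: false }, { success: true }],
);

console.log(batchReport.failedSources); // ["./receipt.png"]
```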

Automatic Retry on Failure

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Retry up to 3 times with 1 second delay between attempts
const result = await ocr.extractWithRetry(
  './invoice.png',
  { format: 'json', schema: invoiceSchema },
  3,    // retries
  1000  // delay in ms
);

if (result.success) {
  console.log('Extracted after retries:', result.data);
}
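
The example above waits a fixed delay between attempts. A common variation is exponential backoff, where the delay doubles after each failure. The sketch below shows that variation as a generic helper. It is not how `extractWithRetry` is implemented, `retryWithBackoff` is a hypothetical name, and it retries only when the operation throws or rejects.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `operation` until it resolves, allowing up to `retries` extra attempts
// and doubling the delay after each failure.
async function retryWithBackoff<T>(
  operation: (attempt: number) => Promise<T>,
  retries: number,
  initialDelayMs: number,
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the last error
      await sleep(delayMs);
      delayMs *= 2;
    }
  }
}

// Demo with a stand-in operation that fails twice before succeeding.
let demoCalls = 0;
retryWithBackoff(
  async () => {
    demoCalls += 1;
    if (demoCalls < 3) throw new Error(`attempt ${demoCalls} failed`);
    return `succeeded on attempt ${demoCalls}`;
  },
  3, // retries
  10, // initial delay in ms
).then((message) => console.log(message)); // logs "succeeded on attempt 3"
```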

Access Underlying OcrAI Instance

const ocrPromise = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Get the underlying OcrAI instance for direct access
const ocr = ocrPromise.getOcrAI();

// Use standard async/await if needed
const result = await ocr.extract('./invoice.png');

License

MIT

Package last updated on 09 Jan 2026
