ocr-ai

Version 1.0.4 (latest), published on npm. Maintainers: 1

Multi-provider AI document extraction for Node.js. Extract text or structured JSON from documents using Gemini, OpenAI, Claude, Grok, or Vertex AI.

Installation

npm install ocr-ai

Quick Start

Using Gemini

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
});

const result = await ocr.extract('./invoice.png');

if (result.success) {
  const text = result.content;
  console.log(text);
}

Using OpenAI

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
});

const result = await ocr.extract('./document.pdf');

if (result.success) {
  const text = result.content;
  console.log(text);
}
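
The examples above inline the API key as a placeholder string. A minimal sketch of loading it from an environment-like map instead is shown here. The helper `requireKey` and the variable name `GEMINI_API_KEY` are hypothetical names chosen for illustration and are not part of ocr-ai.

```typescript
// Hypothetical helper (not part of ocr-ai): look up a required key in an
// environment-like map and fail early with a clear message if it is missing.
type Env = Record<string, string | undefined>;

function requireKey(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an in-memory map; in a Node.js app you would pass process.env.
const demoEnv: Env = { GEMINI_API_KEY: ' abc123 ' };
console.log(requireKey(demoEnv, 'GEMINI_API_KEY')); // abc123
```

The returned string can then be passed as the `apiKey` option shown in the Quick Start examples, which keeps the key out of source control.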

Custom Model

You can specify a custom model for any provider:

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
  model: 'gemini-2.0-flash', // Use a specific model
});

// Or with OpenAI
const ocrOpenAI = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
  model: 'gpt-4o-mini', // Use a different model
});

From URL

Extract directly from a URL:

const result = await ocr.extract('https://example.com/invoice.png');

if (result.success) {
  console.log(result.content);
}

Custom Instructions

You can provide custom instructions to guide the extraction:

const result = await ocr.extract('./receipt.png', {
  prompt: 'Extract only the total amount and date from this receipt',
});

if (result.success) {
  console.log(result.content);
  // Output: "Total: $154.06, Date: 11/02/2019"
}

Output Format

By default, extraction returns text. You can also extract structured JSON:

// Text output (default)
const textResult = await ocr.extract('./invoice.png', {
  format: 'text',
});

if (textResult.success) {
  console.log(textResult.content); // string
}

// JSON output with schema
const jsonResult = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: {
    invoice_number: 'string',
    date: 'string',
    total: 'number',
    items: [{ name: 'string', quantity: 'number', price: 'number' }],
  },
});

if (jsonResult.success) {
  console.log(jsonResult.data); // { invoice_number: "US-001", date: "11/02/2019", total: 154.06, items: [...] }
}
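
The repeated `if (result.success)` checks above can be folded into small helpers. The sketch below assumes only the fields visible in the examples (`success`, `content`, `data`); the `ExtractionLike` type and the `textOf` / `jsonOf` helpers are hypothetical and not exported by ocr-ai.

```typescript
// Structural type covering only the fields used in the examples above.
// Assumption: `content` is set for text output and `data` for JSON output.
interface ExtractionLike {
  success: boolean;
  content?: string;
  data?: unknown;
}

// Return the extracted text, or throw if the extraction failed.
function textOf(result: ExtractionLike): string {
  if (!result.success || typeof result.content !== 'string') {
    throw new Error('Extraction did not produce text content');
  }
  return result.content;
}

// Return the extracted JSON typed as T, or throw if the extraction failed.
function jsonOf<T>(result: ExtractionLike): T {
  if (!result.success || result.data === undefined) {
    throw new Error('Extraction did not produce JSON data');
  }
  return result.data as T; // unchecked cast: validate separately if needed
}

interface Invoice {
  invoice_number: string;
  total: number;
}

const sample: ExtractionLike = {
  success: true,
  data: { invoice_number: 'US-001', total: 154.06 },
};
console.log(jsonOf<Invoice>(sample).total); // 154.06
```

The cast in `jsonOf` trusts the model output, so a runtime check against the schema is still advisable for anything that feeds downstream systems.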

JSON Schema

The schema defines the structure of the data you want to extract. Use a simple object where keys are field names and values are types:

Basic types:

  • 'string' - Text values
  • 'number' - Numeric values
  • 'boolean' - True/false values

Nested objects:

const schema = {
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
  },
  customer: {
    name: 'string',
    email: 'string',
  },
};

Arrays:

const schema = {
  // Array of objects
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  // Simple array
  tags: ['string'],
};

Complete example (invoice):

const invoiceSchema = {
  invoice_number: 'string',
  date: 'string',
  due_date: 'string',
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
    email: 'string',
  },
  bill_to: {
    name: 'string',
    address: 'string',
  },
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  subtotal: 'number',
  tax: 'number',
  total: 'number',
};

const result = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: invoiceSchema,
  prompt: 'Extract all invoice data from this document.',
});
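
Because the schema is a plain object of type names, nested objects, and single-element arrays, the same object can be used to check what comes back. The following is a sketch of such a check, not something ocr-ai is documented to provide; the `Schema` type and `conforms` function are hypothetical names. It is strict: a missing or `null` field counts as non-conforming.

```typescript
// Schema shape as described above: type names, nested objects, and
// single-element arrays whose element describes every item.
type Schema = 'string' | 'number' | 'boolean' | Schema[] | { [key: string]: Schema };

// Return true when `value` has the structure and primitive types of `schema`.
function conforms(schema: Schema, value: unknown): boolean {
  if (typeof schema === 'string') {
    return typeof value === schema;
  }
  if (Array.isArray(schema)) {
    const [itemSchema] = schema;
    return (
      Array.isArray(value) &&
      itemSchema !== undefined &&
      value.every((item) => conforms(itemSchema, item))
    );
  }
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return Object.entries(schema).every(([key, sub]) => conforms(sub, record[key]));
}

const demoSchema: Schema = {
  total: 'number',
  tags: ['string'],
  company: { name: 'string' },
};

console.log(conforms(demoSchema, { total: 154.06, tags: ['paid'], company: { name: 'Acme' } })); // true
console.log(conforms(demoSchema, { total: '154.06', tags: [], company: { name: 'Acme' } })); // false
```

Extra keys in the value are ignored, so the check passes as long as every field named in the schema is present with the right type.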

Model Configuration

You can pass model-specific parameters like temperature, max tokens, and more:

// Gemini with model config
const geminiResult = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0.2,
    maxTokens: 4096,
    topP: 0.8,
    topK: 40,
  },
});

// OpenAI with model config (uses an instance created with provider: 'openai',
// such as ocrOpenAI from the Custom Model section)
const openaiResult = await ocrOpenAI.extract('./invoice.png', {
  modelConfig: {
    temperature: 0,
    maxTokens: 2048,
    topP: 1,
  },
});

Available options:

Option          Description                        Supported Providers
temperature     Controls randomness (0.0-1.0+)     All
maxTokens       Maximum tokens to generate         All
topP            Nucleus sampling                   All
topK            Top-k sampling                     Gemini, Claude, Vertex
stopSequences   Stop generation at these strings   All
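
Since `topK` is listed only for Gemini, Claude, and Vertex, a shared configuration object may need trimming before it is reused across providers. The sketch below shows one way to do that. Whether ocr-ai itself ignores or rejects unsupported options is not stated here, and `configFor` is a hypothetical helper.

```typescript
type Provider = 'gemini' | 'openai' | 'claude' | 'grok' | 'vertex';

// Option names taken from the table above.
interface ModelConfig {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  topK?: number;
  stopSequences?: string[];
}

// Providers listed as supporting topK in the table above.
const TOP_K_PROVIDERS: ReadonlySet<Provider> = new Set<Provider>([
  'gemini',
  'claude',
  'vertex',
]);

// Return a copy of the config without options the provider is not listed for.
function configFor(provider: Provider, config: ModelConfig): ModelConfig {
  const copy: ModelConfig = { ...config };
  if (!TOP_K_PROVIDERS.has(provider)) {
    delete copy.topK;
  }
  return copy;
}

console.log(configFor('openai', { temperature: 0, topK: 40 })); // { temperature: 0 }
```

The result can be passed as the `modelConfig` option; the input object is left unmodified.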

Token Usage

Access token usage information from the metadata:

const result = await ocr.extract('./invoice.png');

if (result.success) {
  console.log(result.content);

  // Access metadata
  console.log(result.metadata.processingTimeMs); // 2351
  console.log(result.metadata.tokens?.inputTokens); // 1855
  console.log(result.metadata.tokens?.outputTokens); // 260
  console.log(result.metadata.tokens?.totalTokens); // 2115
}
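
When processing several documents, the per-result token counts can be summed to track overall usage. This sketch assumes the metadata fields shown above and that each count is a number when `tokens` is present; the `sumTokens` helper and the two interfaces are hypothetical.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Only the metadata fields used in the example above.
interface WithMetadata {
  metadata: { processingTimeMs: number; tokens?: TokenUsage };
}

// Add up token counts, skipping results that carry no token information.
function sumTokens(results: readonly WithMetadata[]): TokenUsage {
  const total: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  for (const { metadata } of results) {
    if (!metadata.tokens) continue; // tokens is optional in the example above
    total.inputTokens += metadata.tokens.inputTokens;
    total.outputTokens += metadata.tokens.outputTokens;
    total.totalTokens += metadata.tokens.totalTokens;
  }
  return total;
}

const usage = sumTokens([
  { metadata: { processingTimeMs: 2351, tokens: { inputTokens: 1855, outputTokens: 260, totalTokens: 2115 } } },
  { metadata: { processingTimeMs: 1200 } }, // no token data reported
]);
console.log(usage.totalTokens); // 2115
```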

Supported Providers

Provider   Default Model              Auth
gemini     gemini-1.5-flash           API Key
openai     gpt-4o                     API Key
claude     claude-sonnet-4-20250514   API Key
grok       grok-2-vision-1212         API Key
vertex     gemini-2.0-flash           Google Cloud

Note: For enterprise OCR needs, see the "Advanced: Vertex AI (Google Cloud)" section below.

Supported Inputs

  • Local files: ./invoice.png, ./document.pdf
  • URLs: https://example.com/invoice.png

Supported Files

  • Images: jpg, png, gif, webp
  • Documents: pdf
  • Text: txt, md, csv, json, xml, html
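
A pre-flight check can reject unsupported inputs before any API call is made. The sketch below classifies a source by the extensions listed above and detects URLs by their scheme. How ocr-ai itself detects file types (by extension or by content) is not documented here, so treat `classifySource` as a hypothetical, extension-only approximation.

```typescript
type FileKind = 'image' | 'document' | 'text';

// Extensions copied from the Supported Files list above.
const KIND_BY_EXTENSION = new Map<string, FileKind>([
  ['jpg', 'image'], ['png', 'image'], ['gif', 'image'], ['webp', 'image'],
  ['pdf', 'document'],
  ['txt', 'text'], ['md', 'text'], ['csv', 'text'],
  ['json', 'text'], ['xml', 'text'], ['html', 'text'],
]);

// Report whether the source is a URL and which supported kind it looks like.
function classifySource(source: string): { isUrl: boolean; kind: FileKind | undefined } {
  const isUrl = /^https?:\/\//i.test(source);
  // Drop any query string or fragment before looking at the extension.
  const path = source.split(/[?#]/)[0] ?? source;
  const lastSegment = path.split('/').pop() ?? '';
  const dot = lastSegment.lastIndexOf('.');
  const extension = dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
  return { isUrl, kind: KIND_BY_EXTENSION.get(extension) };
}

console.log(classifySource('https://example.com/invoice.png')); // { isUrl: true, kind: 'image' }
console.log(classifySource('./notes.docx')); // { isUrl: false, kind: undefined }
```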

Advanced: Vertex AI (Google Cloud)

The vertex provider enables access to Google Cloud's AI infrastructure, which is useful for enterprise scenarios requiring:

  • Compliance: Data residency and regulatory requirements
  • Integration: Native integration with Google Cloud services (BigQuery, Cloud Storage, etc.)
  • Specialized OCR: Access to Google's Document AI and Vision AI processors

Basic Setup

Vertex AI uses Google Cloud authentication instead of API keys:

import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'vertex',
  vertexConfig: {
    project: 'your-gcp-project-id',
    location: 'us-central1',
  },
});

const result = await ocr.extract('./invoice.png');

Requirements:

  • Install the gcloud CLI
  • Run gcloud auth application-default login
  • Enable the Vertex AI API in your GCP project

When to Use Vertex AI vs Gemini API

Scenario                      Recommended
Quick prototyping             Gemini (API Key)
Personal projects             Gemini (API Key)
Enterprise/production         Vertex AI
Data residency requirements   Vertex AI
High-volume processing        Vertex AI

For specialized document processing beyond what Gemini models offer, Google Cloud provides dedicated OCR services:

Document AI - Optimized for structured documents:

  • Invoice Parser, Receipt Parser, Form Parser
  • W2, 1040, Bank Statement processors
  • Custom extractors for domain-specific documents
  • Higher accuracy for tables, forms, and handwritten text

Vision API - Optimized for images:

  • Real-time OCR with low latency
  • 80+ language support
  • Handwriting detection
  • Simple integration, ~98% accuracy on clean documents

These services are separate from ocr-ai but can complement it for enterprise document pipelines.

Gemini Model Benchmarks

Performance benchmarks for Gemini models extracting data from an invoice image:

Model                        Text Extraction   JSON Extraction   Best For
gemini-2.0-flash-lite        2.8s              2.1s              High-volume processing, cost optimization
gemini-2.5-flash-lite        2.2s              1.9s              Fastest option, simple documents
gemini-2.0-flash             3.9s              2.9s              General purpose, good balance
gemini-2.5-flash             5.0s              5.0s              Standard documents, reliable
gemini-3-flash-preview       12.3s             10.6s             Complex layouts, newer capabilities
gemini-3-pro-image-preview   8.0s              11.9s             Image-heavy documents
gemini-2.5-pro               12.6s             5.5s              High accuracy, complex documents
gemini-3-pro-preview         24.8s             13.1s             Maximum accuracy, handwritten text

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-flash-lite', // ~2s response time
});

For complex documents or when accuracy is critical:

// Higher accuracy, slower processing
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-pro', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview', // Best for handwriting
});

Quick Reference

Use Case                      Recommended Model
High-volume batch processing  gemini-2.5-flash-lite
Standard invoices/receipts    gemini-2.0-flash
Complex tables and layouts    gemini-2.5-pro
Handwritten documents         gemini-3-pro-preview
Poor quality scans            gemini-3-pro-preview
Real-time applications        gemini-2.5-flash-lite
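
The Gemini benchmark timings can also drive model selection programmatically, for example by listing the models that fit a latency budget. The sketch below copies the timings from the Gemini benchmark table; the `modelsWithin` helper is hypothetical. The numbers come from a single invoice image, so real latency will vary with document size and load.

```typescript
interface Benchmark {
  model: string;
  textSec: number;
  jsonSec: number;
}

// Timings copied from the Gemini benchmark table (single invoice image).
const GEMINI_BENCHMARKS: readonly Benchmark[] = [
  { model: 'gemini-2.0-flash-lite', textSec: 2.8, jsonSec: 2.1 },
  { model: 'gemini-2.5-flash-lite', textSec: 2.2, jsonSec: 1.9 },
  { model: 'gemini-2.0-flash', textSec: 3.9, jsonSec: 2.9 },
  { model: 'gemini-2.5-flash', textSec: 5.0, jsonSec: 5.0 },
  { model: 'gemini-3-flash-preview', textSec: 12.3, jsonSec: 10.6 },
  { model: 'gemini-3-pro-image-preview', textSec: 8.0, jsonSec: 11.9 },
  { model: 'gemini-2.5-pro', textSec: 12.6, jsonSec: 5.5 },
  { model: 'gemini-3-pro-preview', textSec: 24.8, jsonSec: 13.1 },
];

// List models whose measured latency fits the budget, fastest first.
function modelsWithin(budgetSec: number, format: 'text' | 'json'): string[] {
  const latency = (b: Benchmark): number => (format === 'json' ? b.jsonSec : b.textSec);
  return GEMINI_BENCHMARKS
    .filter((b) => latency(b) <= budgetSec)
    .sort((a, b) => latency(a) - latency(b))
    .map((b) => b.model);
}

console.log(modelsWithin(3, 'json').join(', '));
// gemini-2.5-flash-lite, gemini-2.0-flash-lite, gemini-2.0-flash
```

Latency is only one axis: the "Best For" column still matters when accuracy on handwriting or complex layouts outweighs speed.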

OpenAI Model Benchmarks

Performance benchmarks for OpenAI models extracting data from an invoice image:

Model          Text Extraction   JSON Extraction   Best For
gpt-4.1-nano   4.4s              2.4s              Fastest, cost-effective
gpt-4.1-mini   4.8s              3.2s              Good balance speed/accuracy
gpt-4.1        8.2s              5.4s              High accuracy, reliable
gpt-4o-mini    7.2s              5.7s              Budget-friendly
gpt-4o         12.3s             10.7s             Standard high accuracy
gpt-5.2        6.4s              5.0s              Latest generation
gpt-5-mini     12.2s             7.9s              GPT-5 balanced option
gpt-5-nano     19.9s             16.1s             GPT-5 economy tier

Note: gpt-5.2-pro and gpt-image-1 use different API endpoints and are not currently supported.

Model Recommendations

For digital documents (invoices, receipts, forms):

// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1-nano', // ~2-4s response time
});

For complex documents or when accuracy is critical:

// Higher accuracy, reliable extraction
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1', // Best accuracy/speed ratio
});

For handwritten documents or poor quality scans:

// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-5.2', // Latest generation, best accuracy
});

Quick Reference

Use Case                      Recommended Model
High-volume batch processing  gpt-4.1-nano
Standard invoices/receipts    gpt-4.1-mini
Complex tables and layouts    gpt-4.1
Handwritten documents         gpt-5.2
Poor quality scans            gpt-5.2
Real-time applications        gpt-4.1-nano
Budget-conscious projects     gpt-4o-mini
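
The two Quick Reference tables can be encoded as a lookup so that application code asks for a use case rather than a model name. This is a sketch only: the `UseCase` labels and the `recommendedConfig` helper are hypothetical, and the OpenAI-only "budget-conscious" row is left out to keep both providers on the same set of keys.

```typescript
type BenchmarkedProvider = 'gemini' | 'openai';
type UseCase = 'batch' | 'standard' | 'complex' | 'handwritten' | 'poor-scan' | 'real-time';

// Model names copied from the Gemini and OpenAI Quick Reference tables.
const RECOMMENDED: Record<BenchmarkedProvider, Record<UseCase, string>> = {
  gemini: {
    batch: 'gemini-2.5-flash-lite',
    standard: 'gemini-2.0-flash',
    complex: 'gemini-2.5-pro',
    handwritten: 'gemini-3-pro-preview',
    'poor-scan': 'gemini-3-pro-preview',
    'real-time': 'gemini-2.5-flash-lite',
  },
  openai: {
    batch: 'gpt-4.1-nano',
    standard: 'gpt-4.1-mini',
    complex: 'gpt-4.1',
    handwritten: 'gpt-5.2',
    'poor-scan': 'gpt-5.2',
    'real-time': 'gpt-4.1-nano',
  },
};

// Build constructor options (provider, apiKey, model) for a given use case.
function recommendedConfig(provider: BenchmarkedProvider, useCase: UseCase, apiKey: string) {
  return { provider, apiKey, model: RECOMMENDED[provider][useCase] };
}

console.log(recommendedConfig('openai', 'complex', 'YOUR_API_KEY').model); // gpt-4.1
```

The returned object has the same three fields used in the constructor examples throughout this document.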

Promise API

For users who prefer callbacks or need more control over async operations, ocr-ai provides an alternative OcrAIPromise class with additional features.

Basic Usage with Callbacks

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Using callback style
ocr.extract('./invoice.png', {}, (error, result) => {
  if (error) {
    console.error('Extraction failed:', error.message);
    return;
  }
  console.log('Extracted:', result.content);
});

Using .then()/.catch()

import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Promise chain style
ocr.extract('./invoice.png')
  .then((result) => {
    if (result.success) {
      console.log('Content:', result.content);
    }
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Extract Multiple Files in Parallel

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Extract many files at once
const results = await ocr.extractMany([
  './invoice1.png',
  './invoice2.png',
  './invoice3.png',
]);

results.forEach((result, index) => {
  if (result.success) {
    console.log(`File ${index + 1}:`, result.content);
  }
});
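
For long file lists it can help to feed `extractMany` a few files at a time so that provider rate limits are less likely to be hit. The sketch below only splits the list; it assumes, without confirmation from this document, that `extractMany` starts all of its inputs at once. `chunk` is a hypothetical helper.

```typescript
// Hypothetical helper: split a list of sources into batches of at most `size`.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const files = ['./a.png', './b.png', './c.png', './d.png', './e.png'];
console.log(chunk(files, 2).map((batch) => batch.length)); // [ 2, 2, 1 ]
```

Each batch can then be awaited in turn inside a `for...of` loop, passing it to `ocr.extractMany(batch)` and collecting the results.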

Batch Extraction with Individual Options

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Each file with its own options
const results = await ocr.extractBatch([
  { source: './invoice.png', options: { format: 'json', schema: invoiceSchema } },
  { source: './receipt.png', options: { format: 'text' } },
  { source: './contract.pdf', options: { prompt: 'Extract key dates and amounts' } },
]);
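
After a batch run it is usually necessary to know which inputs failed. The sketch below pairs sources with results by position. It assumes results come back in input order, as the index-based loop in the `extractMany` example suggests but this document does not state outright; `partitionBySuccess` is a hypothetical helper.

```typescript
// Only the field needed to tell success from failure.
interface ResultLike {
  success: boolean;
}

// Pair each source with its result by position and split by outcome.
function partitionBySuccess<R extends ResultLike>(
  sources: readonly string[],
  results: readonly R[],
): { succeeded: Array<[string, R]>; failed: string[] } {
  if (sources.length !== results.length) {
    throw new Error('sources and results must have the same length');
  }
  const succeeded: Array<[string, R]> = [];
  const failed: string[] = [];
  sources.forEach((source, i) => {
    const result = results[i];
    if (result !== undefined && result.success) {
      succeeded.push([source, result]);
    } else {
      failed.push(source);
    }
  });
  return { succeeded, failed };
}

const summary = partitionBySuccess(
  ['./invoice.png', './receipt.png'],
  [{ success: true }, { success: false }],
);
console.log(summary.failed); // [ './receipt.png' ]
```

The `failed` list is a natural input for a second pass, for example with `extractWithRetry`.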

Automatic Retry on Failure

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Retry up to 3 times with 1 second delay between attempts
const result = await ocr.extractWithRetry(
  './invoice.png',
  { format: 'json', schema: invoiceSchema },
  3,    // retries
  1000  // delay in ms
);

if (result.success) {
  console.log('Extracted after retries:', result.data);
}

Access Underlying OcrAI Instance

const ocrPromise = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Get the underlying OcrAI instance for direct access
const ocr = ocrPromise.getOcrAI();

// Use standard async/await if needed
const result = await ocr.extract('./invoice.png');

License

MIT

Keywords

ai

Package last updated on 09 Jan 2026