Multi-provider AI document extraction library - Extract text or structured JSON from documents using Gemini, OpenAI, Grok, or Claude
Multi-provider AI document extraction for Node.js. Extract text or structured JSON from documents using Gemini, OpenAI, Claude, Grok, or Vertex AI.
```bash
npm install ocr-ai
```
```typescript
import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
});

const result = await ocr.extract('./invoice.png');

if (result.success) {
  const text = result.content;
  console.log(text);
}
```
The same call works with OpenAI:

```typescript
import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
});

const result = await ocr.extract('./document.pdf');

if (result.success) {
  const text = result.content;
  console.log(text);
}
```
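The examples above hardcode API keys for brevity. A minimal sketch for keeping keys out of source code, assuming you store them in environment variables: the variable names (`GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`) and the `buildOcrConfig` helper are hypothetical, and the returned object only mirrors the `provider`/`apiKey`/`model` options shown in this README.

```typescript
// Providers that authenticate with an API key, per the provider table in this README.
type ApiKeyProvider = 'gemini' | 'openai' | 'claude' | 'grok';

// Hypothetical environment variable names; adjust to your own conventions.
const ENV_VARS: Record<ApiKeyProvider, string> = {
  gemini: 'GEMINI_API_KEY',
  openai: 'OPENAI_API_KEY',
  claude: 'ANTHROPIC_API_KEY',
  grok: 'XAI_API_KEY',
};

interface OcrConfig {
  provider: ApiKeyProvider;
  apiKey: string;
  model?: string;
}

// Builds an options object shaped like the OcrAI examples above,
// failing fast when the key is missing instead of sending an empty key.
function buildOcrConfig(
  provider: ApiKeyProvider,
  env: Record<string, string | undefined>,
  model?: string,
): OcrConfig {
  const name = ENV_VARS[provider];
  const apiKey = env[name];
  if (!apiKey) {
    throw new Error(`Missing ${name} for provider "${provider}"`);
  }
  return model ? { provider, apiKey, model } : { provider, apiKey };
}
```

In Node.js you would pass `process.env`, e.g. `new OcrAI(buildOcrConfig('gemini', process.env))`.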
You can specify a custom model for any provider:
```typescript
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY',
  model: 'gemini-2.0-flash', // Use a specific model
});

// Or with OpenAI
const ocrOpenAI = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_OPENAI_API_KEY',
  model: 'gpt-4o-mini', // Use a different model
});
```
Extract directly from a URL:
```typescript
const result = await ocr.extract('https://example.com/invoice.png');

if (result.success) {
  console.log(result.content);
}
```
You can provide custom instructions to guide the extraction:
```typescript
const result = await ocr.extract('./receipt.png', {
  prompt: 'Extract only the total amount and date from this receipt',
});

if (result.success) {
  console.log(result.content);
  // Output: "Total: $154.06, Date: 11/02/2019"
}
```
By default, extraction returns text. You can also extract structured JSON:
```typescript
// Text output (default)
const textResult = await ocr.extract('./invoice.png', {
  format: 'text',
});

if (textResult.success) {
  console.log(textResult.content); // string
}

// JSON output with schema
const jsonResult = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: {
    invoice_number: 'string',
    date: 'string',
    total: 'number',
    items: [{ name: 'string', quantity: 'number', price: 'number' }],
  },
});

if (jsonResult.success) {
  console.log(jsonResult.data); // { invoice_number: "US-001", date: "11/02/2019", total: 154.06, items: [...] }
}
```
The schema defines the structure of the data you want to extract. Use a simple object where keys are field names and values are types:
Basic types:

- `'string'` - Text values
- `'number'` - Numeric values
- `'boolean'` - True/false values

Nested objects:
```typescript
const schema = {
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
  },
  customer: {
    name: 'string',
    email: 'string',
  },
};
```
Arrays:
```typescript
const schema = {
  // Array of objects
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  // Simple array
  tags: ['string'],
};
```
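If you use TypeScript, you could derive a static type for the extracted data from a schema written in this format. This is an assumption-based illustration: `Infer` is a hypothetical type helper, not part of ocr-ai, and the README does not say how (or whether) the library types `result.data`. Declaring the schema `as const` keeps the `'string'`/`'number'`/`'boolean'` literals the helper relies on.

```typescript
// Hypothetical helper: maps a schema literal (format described above) to a TypeScript type.
type Infer<S> =
  S extends 'string' ? string :
  S extends 'number' ? number :
  S extends 'boolean' ? boolean :
  // A single-element array describes a list of that element type.
  S extends readonly [infer Item] ? Infer<Item>[] :
  // Any other object is a nested object: map each field recursively.
  S extends Record<string, unknown> ? { [K in keyof S]: Infer<S[K]> } :
  never;

// `as const` preserves the literal type names instead of widening them to `string`.
const lineItemsSchema = {
  invoice_number: 'string',
  paid: 'boolean',
  items: [{ description: 'string', quantity: 'number' }],
  tags: ['string'],
} as const;

type LineItems = Infer<typeof lineItemsSchema>;

// Type-checks because the value matches the inferred shape.
const example: LineItems = {
  invoice_number: 'US-001',
  paid: false,
  items: [{ description: 'Widget', quantity: 2 }],
  tags: ['urgent'],
};
```

Without `as const`, every field type widens to `string` and the helper cannot recover the field types.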
Complete example (invoice):
```typescript
const invoiceSchema = {
  invoice_number: 'string',
  date: 'string',
  due_date: 'string',
  company: {
    name: 'string',
    address: 'string',
    phone: 'string',
    email: 'string',
  },
  bill_to: {
    name: 'string',
    address: 'string',
  },
  items: [
    {
      description: 'string',
      quantity: 'number',
      unit_price: 'number',
      total: 'number',
    },
  ],
  subtotal: 'number',
  tax: 'number',
  total: 'number',
};

const result = await ocr.extract('./invoice.png', {
  format: 'json',
  schema: invoiceSchema,
  prompt: 'Extract all invoice data from this document.',
});
```
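Model output is not guaranteed to follow a schema exactly, and the README does not say whether ocr-ai validates `result.data` against it. A defensive check might look like the sketch below; `checkSchema` and `SchemaNode` are hypothetical names written for the schema format described above (type names, nested objects, single-element arrays), and missing fields are reported as mismatches, which you may want to relax.

```typescript
// The schema format described above: type names, nested objects, or a one-element array.
type SchemaNode =
  | 'string'
  | 'number'
  | 'boolean'
  | SchemaNode[]
  | { [key: string]: SchemaNode };

// Hypothetical helper: returns a list of mismatches; an empty list means `value` fits `schema`.
function checkSchema(value: unknown, schema: SchemaNode, path = '$'): string[] {
  if (typeof schema === 'string') {
    return typeof value === schema ? [] : [`${path}: expected ${schema}, got ${typeName(value)}`];
  }
  if (Array.isArray(schema)) {
    if (!Array.isArray(value)) return [`${path}: expected array, got ${typeName(value)}`];
    const itemSchema = schema[0];
    if (itemSchema === undefined) return [];
    const itemProblems: string[] = [];
    value.forEach((item, i) => {
      itemProblems.push(...checkSchema(item, itemSchema, `${path}[${i}]`));
    });
    return itemProblems;
  }
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return [`${path}: expected object, got ${typeName(value)}`];
  }
  // Missing fields show up as "got undefined".
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of Object.keys(schema)) {
    problems.push(...checkSchema(record[key], schema[key], `${path}.${key}`));
  }
  return problems;
}

function typeName(value: unknown): string {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
}
```

To use it with the invoice example, annotate the schema as `const invoiceSchema: SchemaNode = { ... }` and check `checkSchema(result.data, invoiceSchema).length === 0` before trusting the data.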
You can pass model-specific parameters like temperature, max tokens, and more:
```typescript
// Gemini with model config (`ocr` was created with provider: 'gemini')
const geminiResult = await ocr.extract('./invoice.png', {
  modelConfig: {
    temperature: 0.2,
    maxTokens: 4096,
    topP: 0.8,
    topK: 40,
  },
});

// OpenAI with model config (`ocrOpenAI` was created with provider: 'openai')
const openaiResult = await ocrOpenAI.extract('./invoice.png', {
  modelConfig: {
    temperature: 0,
    maxTokens: 2048,
    topP: 1,
  },
});
```
Available options:
| Option | Description | Supported Providers |
|---|---|---|
| temperature | Controls randomness; lower values are more deterministic (typically 0.0-1.0, some providers accept higher values) | All |
| maxTokens | Maximum tokens to generate | All |
| topP | Nucleus sampling | All |
| topK | Top-k sampling | Gemini, Claude, Vertex |
| stopSequences | Stop generation at these strings | All |
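The table above lists `topK` for Gemini, Claude, and Vertex only. If you share one config across providers, a small helper could drop unsupported keys before calling `extract`. This is a hypothetical sketch driven only by the table (`sanitizeModelConfig` is not part of ocr-ai), and the README does not say what the library itself does with an unsupported option.

```typescript
type Provider = 'gemini' | 'openai' | 'claude' | 'grok' | 'vertex';

// Option names from the "Available options" table.
interface ModelConfig {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  topK?: number;
  stopSequences?: string[];
}

// Providers listed as supporting topK in the table above.
const TOP_K_PROVIDERS: Provider[] = ['gemini', 'claude', 'vertex'];

// Returns a copy of `config` without options the provider is not listed as supporting.
function sanitizeModelConfig(provider: Provider, config: ModelConfig): ModelConfig {
  const result: ModelConfig = { ...config };
  if (TOP_K_PROVIDERS.indexOf(provider) === -1) {
    delete result.topK;
  }
  return result;
}
```

The input object is left untouched, so one shared config can be reused for several providers.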
Access processing time and token usage from `result.metadata`:
```typescript
const result = await ocr.extract('./invoice.png');

if (result.success) {
  console.log(result.content);

  // Access metadata
  console.log(result.metadata.processingTimeMs); // 2351
  console.log(result.metadata.tokens?.inputTokens); // 1855
  console.log(result.metadata.tokens?.outputTokens); // 260
  console.log(result.metadata.tokens?.totalTokens); // 2115
}
```
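When processing many documents, you may want totals across results. A minimal sketch, assuming each result exposes `metadata.processingTimeMs` and an optional `metadata.tokens` object with the fields shown above; `summarizeUsage` and its types are hypothetical, and it counts every result that carries metadata, whether or not extraction succeeded.

```typescript
// Shapes assumed from the metadata example above; fields are optional to be defensive.
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

interface ResultWithMetadata {
  metadata?: {
    processingTimeMs?: number;
    tokens?: TokenUsage;
  };
}

interface UsageSummary {
  documents: number;
  processingTimeMs: number;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical helper: sums time and token counts over every result that has metadata.
function summarizeUsage(results: ResultWithMetadata[]): UsageSummary {
  const summary: UsageSummary = {
    documents: 0,
    processingTimeMs: 0,
    inputTokens: 0,
    outputTokens: 0,
    totalTokens: 0,
  };
  for (const { metadata } of results) {
    if (!metadata) continue;
    summary.documents += 1;
    summary.processingTimeMs += metadata.processingTimeMs ?? 0;
    summary.inputTokens += metadata.tokens?.inputTokens ?? 0;
    summary.outputTokens += metadata.tokens?.outputTokens ?? 0;
    summary.totalTokens += metadata.tokens?.totalTokens ?? 0;
  }
  return summary;
}
```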
Supported providers and their defaults:

| Provider | Default Model | Auth |
|---|---|---|
| gemini | gemini-1.5-flash | API Key |
| openai | gpt-4o | API Key |
| claude | claude-sonnet-4-20250514 | API Key |
| grok | grok-2-vision-1212 | API Key |
| vertex | gemini-2.0-flash | Google Cloud |
Note: For enterprise OCR needs, see the Vertex AI section below.
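Inputs can be local file paths (e.g. `./invoice.png`, `./document.pdf`) or URLs (e.g. `https://example.com/invoice.png`).

The `vertex` provider gives access to Google Cloud's AI infrastructure, which is useful for enterprise scenarios such as production workloads, data residency requirements, and high-volume processing.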
Vertex AI uses Google Cloud authentication instead of API keys:
```typescript
import { OcrAI } from 'ocr-ai';

const ocr = new OcrAI({
  provider: 'vertex',
  vertexConfig: {
    project: 'your-gcp-project-id',
    location: 'us-central1',
  },
});

const result = await ocr.extract('./invoice.png');
```
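Requirements: authenticate with Google Cloud Application Default Credentials, for example locally with:

```bash
gcloud auth application-default login
```

When to use which:

| Scenario | Recommended |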
|---|---|
| Quick prototyping | Gemini (API Key) |
| Personal projects | Gemini (API Key) |
| Enterprise/production | Vertex AI |
| Data residency requirements | Vertex AI |
| High-volume processing | Vertex AI |
For specialized document processing beyond what Gemini models offer, Google Cloud provides dedicated OCR services:

- Document AI - Optimized for structured documents
- Vision API - Optimized for images

These services are separate from ocr-ai but can complement it in enterprise document pipelines.
Performance benchmarks for Gemini models extracting data from an invoice image:
| Model | Text Extraction | JSON Extraction | Best For |
|---|---|---|---|
| gemini-2.0-flash-lite | 2.8s | 2.1s | High-volume processing, cost optimization |
| gemini-2.5-flash-lite | 2.2s | 1.9s | Fastest option, simple documents |
| gemini-2.0-flash | 3.9s | 2.9s | General purpose, good balance |
| gemini-2.5-flash | 5.0s | 5.0s | Standard documents, reliable |
| gemini-3-flash-preview | 12.3s | 10.6s | Complex layouts, newer capabilities |
| gemini-3-pro-image-preview | 8.0s | 11.9s | Image-heavy documents |
| gemini-2.5-pro | 12.6s | 5.5s | High accuracy, complex documents |
| gemini-3-pro-preview | 24.8s | 13.1s | Maximum accuracy, handwritten text |
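To turn the table into a decision, you could filter models by a latency budget. A sketch that uses only the numbers above (measured on one invoice image, so treat them as rough); `modelsWithin` and `GEMINI_BENCHMARKS` are hypothetical names, not part of ocr-ai.

```typescript
interface Benchmark {
  model: string;
  textSeconds: number;
  jsonSeconds: number;
}

// Gemini timings copied from the benchmark table above.
const GEMINI_BENCHMARKS: Benchmark[] = [
  { model: 'gemini-2.0-flash-lite', textSeconds: 2.8, jsonSeconds: 2.1 },
  { model: 'gemini-2.5-flash-lite', textSeconds: 2.2, jsonSeconds: 1.9 },
  { model: 'gemini-2.0-flash', textSeconds: 3.9, jsonSeconds: 2.9 },
  { model: 'gemini-2.5-flash', textSeconds: 5.0, jsonSeconds: 5.0 },
  { model: 'gemini-3-flash-preview', textSeconds: 12.3, jsonSeconds: 10.6 },
  { model: 'gemini-3-pro-image-preview', textSeconds: 8.0, jsonSeconds: 11.9 },
  { model: 'gemini-2.5-pro', textSeconds: 12.6, jsonSeconds: 5.5 },
  { model: 'gemini-3-pro-preview', textSeconds: 24.8, jsonSeconds: 13.1 },
];

// Returns the models that met the latency budget for the given format, fastest first.
function modelsWithin(
  benchmarks: Benchmark[],
  format: 'text' | 'json',
  budgetSeconds: number,
): string[] {
  const key = format === 'text' ? 'textSeconds' : 'jsonSeconds';
  return benchmarks
    .filter((b) => b[key] <= budgetSeconds)
    .sort((a, b) => a[key] - b[key])
    .map((b) => b.model);
}
```

The OpenAI table below has the same columns, so the same helper works with those numbers.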
For digital documents (invoices, receipts, forms):
```typescript
// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-flash-lite', // ~2s response time
});
```
For complex documents or when accuracy is critical:
```typescript
// Higher accuracy, slower processing
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-2.5-pro', // Best accuracy/speed ratio
});
```
For handwritten documents or poor quality scans:
```typescript
// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview', // Best for handwriting
});
```
| Use Case | Recommended Model |
|---|---|
| High-volume batch processing | gemini-2.5-flash-lite |
| Standard invoices/receipts | gemini-2.0-flash |
| Complex tables and layouts | gemini-2.5-pro |
| Handwritten documents | gemini-3-pro-preview |
| Poor quality scans | gemini-3-pro-preview |
| Real-time applications | gemini-2.5-flash-lite |
Performance benchmarks for OpenAI models extracting data from an invoice image:
| Model | Text Extraction | JSON Extraction | Best For |
|---|---|---|---|
| gpt-4.1-nano | 4.4s | 2.4s | Fastest, cost-effective |
| gpt-4.1-mini | 4.8s | 3.2s | Good balance speed/accuracy |
| gpt-4.1 | 8.2s | 5.4s | High accuracy, reliable |
| gpt-4o-mini | 7.2s | 5.7s | Budget-friendly |
| gpt-4o | 12.3s | 10.7s | Standard high accuracy |
| gpt-5.2 | 6.4s | 5.0s | Latest generation |
| gpt-5-mini | 12.2s | 7.9s | GPT-5 balanced option |
| gpt-5-nano | 19.9s | 16.1s | GPT-5 economy tier |
Note: `gpt-5.2-pro` and `gpt-image-1` use different API endpoints and are not currently supported.
For digital documents (invoices, receipts, forms):
```typescript
// Fast and cost-effective
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1-nano', // ~2-4s response time
});
```
For complex documents or when accuracy is critical:
```typescript
// Higher accuracy, reliable extraction
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-4.1', // Best accuracy/speed ratio
});
```
For handwritten documents or poor quality scans:
```typescript
// Maximum accuracy for difficult documents
const ocr = new OcrAI({
  provider: 'openai',
  apiKey: 'YOUR_API_KEY',
  model: 'gpt-5.2', // Latest generation, best accuracy
});
```
| Use Case | Recommended Model |
|---|---|
| High-volume batch processing | gpt-4.1-nano |
| Standard invoices/receipts | gpt-4.1-mini |
| Complex tables and layouts | gpt-4.1 |
| Handwritten documents | gpt-5.2 |
| Poor quality scans | gpt-5.2 |
| Real-time applications | gpt-4.1-nano |
| Budget-conscious projects | gpt-4o-mini |
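The two use-case tables can also live in code, for example to pick a model from configuration. A minimal sketch: the `UseCase` keys and the `modelOptionsFor` helper are hypothetical, the model names are copied from the tables, and the OpenAI-only "Budget-conscious projects" row is left out so both providers share the same keys.

```typescript
type UseCase = 'high-volume' | 'standard' | 'complex-layout' | 'handwritten' | 'poor-scan' | 'real-time';
type Provider = 'gemini' | 'openai';

// Recommendations copied from the Gemini and OpenAI use-case tables above.
const RECOMMENDED: Record<Provider, Record<UseCase, string>> = {
  gemini: {
    'high-volume': 'gemini-2.5-flash-lite',
    standard: 'gemini-2.0-flash',
    'complex-layout': 'gemini-2.5-pro',
    handwritten: 'gemini-3-pro-preview',
    'poor-scan': 'gemini-3-pro-preview',
    'real-time': 'gemini-2.5-flash-lite',
  },
  openai: {
    'high-volume': 'gpt-4.1-nano',
    standard: 'gpt-4.1-mini',
    'complex-layout': 'gpt-4.1',
    handwritten: 'gpt-5.2',
    'poor-scan': 'gpt-5.2',
    'real-time': 'gpt-4.1-nano',
  },
};

// Returns the provider/model part of the OcrAI options; add your apiKey separately.
function modelOptionsFor(provider: Provider, useCase: UseCase): { provider: Provider; model: string } {
  return { provider, model: RECOMMENDED[provider][useCase] };
}
```

For example: `new OcrAI({ ...modelOptionsFor('openai', 'handwritten'), apiKey: 'YOUR_API_KEY' })`.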
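For users who prefer callbacks or need more control over async operations, ocr-ai provides an alternative `OcrAIPromise` class. Besides callback and promise-chain styles, it adds batch extraction (`extractMany`, `extractBatch`), retries (`extractWithRetry`), and access to the underlying `OcrAI` instance (`getOcrAI`).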
```typescript
import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Using callback style
ocr.extract('./invoice.png', {}, (error, result) => {
  if (error) {
    console.error('Extraction failed:', error.message);
    return;
  }
  console.log('Extracted:', result.content);
});
```
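The README does not show how `OcrAIPromise` supports both styles. A common Node.js pattern is to accept an optional error-first callback and otherwise return the promise; the sketch below illustrates that pattern with a hypothetical `withOptionalCallback` helper and is not the library's actual implementation.

```typescript
// Error-first callback, the usual Node.js convention.
type Callback<T> = (error: Error | null, result?: T) => void;

// Hypothetical helper: starts `task`, then either reports through `callback`
// or, when no callback is given, returns the promise to the caller.
function withOptionalCallback<T>(
  task: () => Promise<T>,
  callback?: Callback<T>,
): Promise<T> | undefined {
  const promise = task();
  if (!callback) return promise;
  promise.then(
    (result) => callback(null, result),
    // Non-Error rejections are wrapped so callers can always read `.message`.
    (error: unknown) => callback(error instanceof Error ? error : new Error(String(error))),
  );
  return undefined;
}
```

With this pattern, a call that passes a callback returns nothing, and a call without one returns a promise you can chain with `.then()`/`.catch()`.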
```typescript
import { OcrAIPromise } from 'ocr-ai';

const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Promise chain style
ocr.extract('./invoice.png')
  .then((result) => {
    if (result.success) {
      console.log('Content:', result.content);
    }
  })
  .catch((error) => {
    console.error('Error:', error);
  });
```
```typescript
const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Extract many files at once
const results = await ocr.extractMany([
  './invoice1.png',
  './invoice2.png',
  './invoice3.png',
]);

results.forEach((result, index) => {
  if (result.success) {
    console.log(`File ${index + 1}:`, result.content);
  }
});
```
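The README does not say whether `extractMany` processes files sequentially or in parallel, or how it handles provider rate limits. If you need explicit control, a bounded-concurrency helper like the hypothetical `mapWithConcurrency` below can wrap any per-file call such as `ocr.extract`; results keep the input order, and the first rejection rejects the whole call.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
// Output order matches input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error('limit must be at least 1');
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

For example, `mapWithConcurrency(files, 2, (file) => ocr.extract(file))` keeps at most two requests in flight.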
```typescript
const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Each file with its own options
// (invoiceSchema is the schema from the complete invoice example above)
const results = await ocr.extractBatch([
  { source: './invoice.png', options: { format: 'json', schema: invoiceSchema } },
  { source: './receipt.png', options: { format: 'text' } },
  { source: './contract.pdf', options: { prompt: 'Extract key dates and amounts' } },
]);
```
```typescript
const ocr = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Retry up to 3 times with 1 second delay between attempts
const result = await ocr.extractWithRetry(
  './invoice.png',
  { format: 'json', schema: invoiceSchema },
  3, // retries
  1000 // delay in ms
);

if (result.success) {
  console.log('Extracted:', result.data);
}
```
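The README does not specify whether `extractWithRetry` counts the first attempt among the 3, or which failures it retries. As an illustration only, a generic fixed-delay retry might look like the hypothetical `retryWithDelay` below: it makes one initial attempt plus up to `retries` more, and retries both thrown errors and results whose `success` flag is false.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: one initial attempt plus up to `retries` retries,
// waiting `delayMs` between attempts.
async function retryWithDelay<T extends { success: boolean }>(
  attempt: () => Promise<T>,
  retries: number,
  delayMs: number,
): Promise<T> {
  let lastResult: T | undefined;
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    if (i > 0) await sleep(delayMs);
    try {
      const result = await attempt();
      if (result.success) return result;
      // Unsuccessful result: remember it and try again.
      lastResult = result;
      lastError = undefined;
    } catch (error) {
      // Thrown error: remember it and try again.
      lastError = error;
      lastResult = undefined;
    }
  }
  // Report the outcome of the final attempt.
  if (lastResult !== undefined) return lastResult;
  throw lastError;
}
```

Returning the last unsuccessful result (instead of throwing) keeps the `if (result.success)` check from the example above working unchanged.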
```typescript
const ocrPromise = new OcrAIPromise({
  provider: 'gemini',
  apiKey: 'YOUR_API_KEY',
});

// Get the underlying OcrAI instance for direct access
const ocr = ocrPromise.getOcrAI();

// Use standard async/await if needed
const result = await ocr.extract('./invoice.png');
```
License: MIT