llm-document-ocr
Sponsored by Mercoa, the API for BillPay and Invoicing. Everything you need to launch accounts payable in your product with a single API!
LLM Based OCR and Document Parsing for Node.js. Uses GPT4 and Claude3 for OCR and data extraction.
- Converts PDFs (including multi page PDFs) into PNGs for use with GPT4
- Automatically crops white-space to create smaller inputs
- Cleans up JSON string returned by the LLM and converts it to an JSON object
- Custom prompt support for capturing any data you need
Supports:
- ✅ PNG
- ✅ WEBP
- ✅ JPEG / JPG
- ✅ GIF
- ✅ PDF
- ✅ Multi-page PDF
- ❌ DOC
- ❌ DOCX
Installation
npm i --save llm-document-ocr
yarn add llm-document-ocr
Note: If you are deploying via Docker, see the Dockerfile for an example Alpine base image.
Usage
import { DocumentOcr, prompts } from "llm-document-ocr";
const documentOcr = new DocumentOcr({
apiKey: 'YOUR-OpenAi/Anthropic-API-KEY'
model: "gpt-4o",
standardFontDataUrl: "https://unpkg.com/pdfjs-dist@3.2.146/standard_fonts/",
});
const documentData = await documentOcr.process({
model: "gpt-4o",
document: 'JVBERi0xLjMNCiXi48/TDQoNCjEgMCBvYmoNCjw8DQ...',
mimeType: 'application/pdf',
prompt: 'invoiceStartDate, invoiceEndDate, amount',
pageOptions: 'FIRST_AND_LAST'
})
Prompts
Prompts will be automatically prefixed to tell the LLM to return JSON. You will need to specify the data you wish to extract, and the LLM will return a JSON object with those keys.
For example, the prompt we use at Mercoa for invoice processing is the following:
`invoice number, invoice amount, currency (as ISO 4217 code), dueDate, invoiceDate, serviceStartDate, serviceEndDate,
vendor's [name, email with @, website],
line items [amnt, price, qty, des, name, cur (as ISO 4217 code)]`;
And this returns a JSON object that looks like:
{
invoiceNumber?: string | number
invoiceAmount?: string | number
currency?: string
dueDate?: string
invoiceDate?: string
serviceStartDate?: string
serviceEndDate?: string
vendor: {
name?: string
email?: string
website?: string
}
lineItems: Array<{
des?: string
qty?: string | number
price?: string | number
amnt?: string | number
name?: string
cur?: string
}>
}
Issues and Contributing
If you encounter a bug or want to see something added/changed, please go ahead
and
open an issue
If you wish to contribute to the library, thanks! Please see the CONTRIBUTING guide for more details.
License
MIT © Mercoa, Inc