Ollama Starter Document OCR
Simple CLI for extracting text from images and PDFs using Ollama.
Prerequisites
Usage
npx ollama-starter-document-ocr ./receipts/scan1.png ./receipts/scan2.jpg
npx ollama-starter-document-ocr ./receipts
npx ollama-starter-document-ocr ./receipts/statement.pdf
npx ollama-starter-document-ocr ./receipts --out-dir ./output
npx ollama-starter-document-ocr ./receipts --model deepseek-ocr
Output
- Each image has a corresponding .txt file with the extracted text.
- For PDFs, each page is rendered to an image and then processed.
- A JSON file with the full results for every image and page is written to the output directory.
- Some models will detect text bounding boxes and annotate the images with them.
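To inspect the results after a run, a shell sketch like the one below prints each extracted text file with its line count. It assumes the run used --out-dir ./output, as in the usage examples, and that the .txt files are written directly into that directory. summarize_ocr_text is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: print "<line count> <path>" for each .txt file in a directory.
# Assumes the extracted .txt files sit directly in the output directory.
summarize_ocr_text() {
  dir="${1:-./output}"
  for f in "$dir"/*.txt; do
    # Skip the unexpanded glob pattern when no .txt files exist.
    [ -e "$f" ] || continue
    # tr strips the padding some wc implementations add around the count.
    printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done
}
```

Calling summarize_ocr_text ./output after a run gives a quick way to spot pages where little or no text was extracted.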
Environment
Set the OLLAMA_HOST environment variable if your Ollama server is not running at the default address, http://localhost:11434.
OLLAMA_HOST=http://localhost:11444 npx ollama-starter-document-ocr ./receipts
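Before a long batch, it can help to confirm that the server is reachable at the address the CLI will use. The sketch below resolves the address as described above and queries /api/tags, the Ollama endpoint that lists locally available models. The function names are hypothetical, and treating an empty OLLAMA_HOST the same as an unset one is an assumption about the CLI's behavior. It also assumes OLLAMA_HOST holds a full URL, as in the example above.

```shell
# Resolve the base URL: OLLAMA_HOST if set and non-empty, otherwise the default.
# (Treating an empty value as unset is an assumption.)
ollama_base_url() {
  printf '%s\n' "${OLLAMA_HOST:-http://localhost:11434}"
}

# Hypothetical pre-flight check: succeeds only if the server answers within 5 seconds.
# /api/tags lists the models available on the Ollama server.
check_ollama() {
  curl --silent --fail --max-time 5 "$(ollama_base_url)/api/tags" > /dev/null
}
```

With these defined, check_ollama && npx ollama-starter-document-ocr ./receipts starts the OCR run only when the server responds.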