tesseract.js-node
A focused, Node.js-only version of tesseract.js.
Why?
tesseract.js is developed for both Node.js and the browser, and includes functionality that is (in my opinion) bloat, such as automatically downloading traineddata files in the background.
At the time of writing, it also has no tests for the Node.js environment (only for the browser). Example issue where this matters: https://github.com/naptha/tesseract.js/issues/339.
I just wanted a way to use Tesseract 4.0 in a Node.js project without the extra functionality and without background downloads from third-party servers.
Usage
Download the traineddata files for the languages you need, e.g. from the official tessdata_fast repository:
mkdir tessdata
cd tessdata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/fin.traineddata
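Then use the library in a Node.js project:
const getWorker = require('tesseract.js-node');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
(async () => {
  const worker = await getWorker({
    tessdata: '/path/to/tessdata',
    languages: ['eng', 'fin']
  });
  const text = await worker.recognize('/path/to/image', 'eng');
  console.log(text);
})();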
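Before creating the worker, it can help to confirm that a traineddata file exists for every language you plan to load. The sketch below is not part of this library; `missingTraineddata` is a hypothetical helper, and it assumes the files are named `<lang>.traineddata` and sit directly in the tessdata directory, as they do after the curl commands above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of tesseract.js-node).
// Returns the language codes whose `<lang>.traineddata` file is missing
// from `tessdataDir`. The file naming is an assumption based on the
// download commands in this README.
function missingTraineddata(tessdataDir, languages) {
  return languages.filter(
    (lang) => !fs.existsSync(path.join(tessdataDir, `${lang}.traineddata`))
  );
}

// Example: warn about missing files before calling getWorker().
const missing = missingTraineddata('/path/to/tessdata', ['eng', 'fin']);
if (missing.length > 0) {
  console.error(`Missing traineddata for: ${missing.join(', ')}`);
}
```

Checking up front gives a clear message listing the missing languages, rather than whatever error surfaces later during initialization.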
You can supply the input image in various ways:
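// From a file path:
const fromPath = await worker.recognize('/path/to/image', 'eng');
// From a Buffer (requires `const fs = require('fs')`):
const fromBuffer = await worker.recognize(fs.readFileSync('/path/to/image'), 'eng');
// From a canvas rendered to a PNG Buffer (`canvas` is an existing canvas object, e.g. from node-canvas):
const fromCanvas = await worker.recognize(canvas.toBuffer('image/png'), 'eng');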
See tesseract.test.js for other examples.
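If you have several images, one simple approach is to reuse a single worker and process the inputs one at a time. The sketch below is an illustration, not part of this library: `recognizeAll` is a hypothetical helper that only assumes the worker exposes the async `recognize(input, language)` method shown above. Whether a worker tolerates concurrent calls is not documented here, so the loop stays sequential as the conservative choice.

```javascript
// Hypothetical helper (not part of tesseract.js-node).
// Runs OCR on each input in order and collects the recognized text.
// `worker` is any object with an async `recognize(input, language)` method,
// such as the worker returned by getWorker().
async function recognizeAll(worker, inputs, language) {
  const results = [];
  for (const input of inputs) {
    // Awaiting inside the loop keeps the calls strictly sequential.
    results.push(await worker.recognize(input, language));
  }
  return results;
}
```

Inside an async function, call it as `const texts = await recognizeAll(worker, ['/path/to/a.png', '/path/to/b.png'], 'eng');`. The inputs can be file paths or Buffers, in any mix.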
Development
npm test
Useful resources:
Credits
Thanks to tesseract.js-core contributors for the groundwork!
License
Apache License 2.0