tesseract.js-node

A focused node-only version of tesseract.js.

Version: 0.1.0 (latest)
Source: npm
Maintainers: 1

Why?

tesseract.js targets both Node and the browser, and includes (in my opinion) bloated functionality such as automatically downloading traineddata files in the background.

At the time of writing, it also has no tests for the Node environment (only for the browser). Example issue where this matters: https://github.com/naptha/tesseract.js/issues/339.

I just wanted a way to use Tesseract 4.0 in a Node project without all this extra functionality and without background downloads from third-party servers.

Usage

Download the traineddata files for the languages you need, e.g. from the official tessdata_fast repository:

mkdir tessdata
cd tessdata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/fin.traineddata
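
Since nothing is downloaded in the background, it can help to confirm that the files are in place before creating a worker. The following is a minimal sketch of such a check, not part of tesseract.js-node: the helper name `missingLanguages` is hypothetical, and it assumes each language is stored as `<lang>.traineddata` directly inside the tessdata directory, as in the download commands above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not provided by tesseract.js-node): returns the
// language codes whose <lang>.traineddata file is missing from tessdataDir.
function missingLanguages(tessdataDir, languages) {
  return languages.filter(
    (lang) => !fs.existsSync(path.join(tessdataDir, `${lang}.traineddata`))
  );
}

// Example usage:
// const missing = missingLanguages('/path/to/tessdata', ['eng', 'fin']);
// if (missing.length > 0) {
//   throw new Error(`Missing traineddata for: ${missing.join(', ')}`);
// }
```

An empty result means every requested language file was found.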

Then use the library in a node project:

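const getWorker = require('tesseract.js-node');

// await is not available at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
(async () => {
  const worker = await getWorker({
    tessdata: '/path/to/tessdata',    // where .traineddata-files are located
    languages: ['eng', 'fin']         // languages to load
  });
  const text = await worker.recognize('/path/to/image', 'eng');
  console.log(text);
})();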

You can supply the input image in various ways:

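const fs = require('fs');

// Run these inside an async function, with `worker` created as shown earlier.
// path to image
const fromPath = await worker.recognize('/path/to/image', 'eng');
// Buffer
const fromBuffer = await worker.recognize(fs.readFileSync('/path/to/image'), 'eng');
// Buffer (from node-canvas; `canvas` is an existing node-canvas Canvas instance)
const fromCanvas = await worker.recognize(canvas.toBuffer('image/png'), 'eng');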

See tesseract.test.js for other examples.
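
To run several images through one worker, a small loop is enough. The sketch below is an illustration rather than part of the library: `recognizeAll` is a hypothetical helper, and it assumes only what the examples above show, namely that `worker.recognize(input, language)` returns a promise of the recognized text.

```javascript
// Hypothetical helper (not provided by tesseract.js-node): runs recognize()
// on each input in turn, reusing a single worker.
async function recognizeAll(worker, inputs, language) {
  const results = [];
  for (const input of inputs) {
    // Sequential on purpose: this README does not say whether one worker
    // supports concurrent recognize() calls, so the sketch does not assume it.
    results.push(await worker.recognize(input, language));
  }
  return results;
}

// Example usage (inside an async function):
// const texts = await recognizeAll(worker, ['/path/a.png', '/path/b.png'], 'eng');
```

Results come back in the same order as the inputs. Each input can be a path or a Buffer, as in the examples above.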

Development

npm test

Useful resources:

Credits

Thanks to tesseract.js-core contributors for the groundwork!

License

Apache License 2.0

Package last updated on 25 Sep 2019
