Security News
The Unpaid Backbone of Open Source: Solo Maintainers Face Increasing Security Demands
Solo open source maintainers face burnout and security challenges, with 60% unpaid and 60% considering quitting.
tesseract.js-core
tesseract.js-core is the core of tesseract.js: the Tesseract optical character recognition (OCR) engine compiled to WebAssembly. Used through tesseract.js, it lets developers extract text from images directly in the browser or in a Node.js environment.
Basic OCR
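This code demonstrates basic OCR with tesseract.js, which uses tesseract.js-core as its engine. It creates a worker with English language data, recognizes the text in an image, and terminates the worker. The example assumes tesseract.js v5 or later, where createWorker is asynchronous and loads and initializes the requested languages itself; older versions used separate load, loadLanguage and initialize calls.
// tesseract.js loads tesseract.js-core from node_modules itself, so the core
// does not need to be required or passed in explicitly.
const { createWorker } = require('tesseract.js');

(async () => {
  // Create a worker with English language data loaded and initialized.
  const worker = await createWorker('eng');
  const { data: { text } } = await worker.recognize('path/to/image.png');
  console.log(text);
  // Release the worker and its WebAssembly instance.
  await worker.terminate();
})();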
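Creating a worker loads the WebAssembly core and language data, so it is usually worth reusing one worker for several images instead of creating a new one per image. The following is a minimal sketch of that pattern, assuming a worker created as in the example above. The helper name recognizeAll is hypothetical and not part of the tesseract.js API; it relies only on the worker's recognize and terminate methods.

```javascript
// Hypothetical helper (not part of tesseract.js): recognize several images
// with one worker, then terminate the worker even if a recognition fails.
async function recognizeAll(worker, images) {
  const texts = [];
  try {
    for (const image of images) {
      // Jobs run one after another on the same worker.
      const { data: { text } } = await worker.recognize(image);
      texts.push(text);
    }
  } finally {
    // Always release the worker, whether or not every job succeeded.
    await worker.terminate();
  }
  return texts;
}
```

For example, pass it a worker created as shown above together with an array of image paths; the returned array holds one text string per image, in input order.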
OCR with Progress Updates
This example adds progress updates: the logger option receives every progress message from the worker and, here, prints it to the console. As above, it assumes tesseract.js v5 or later.
const { createWorker } = require('tesseract.js');

(async () => {
  // Arguments: language, engine mode (1 = LSTM only), worker options.
  // logger is called with each progress message the worker emits.
  const worker = await createWorker('eng', 1, {
    logger: m => console.log(m)
  });
  const { data: { text } } = await worker.recognize('path/to/image.png');
  console.log(text);
  await worker.terminate();
})();
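Logging raw message objects is verbose. Below is a minimal sketch of a more readable logger, assuming each message carries a status string and a numeric progress between 0 and 1 (the shape tesseract.js progress messages usually have). formatProgress is a hypothetical helper, not a tesseract.js function; a message without a numeric progress is printed by its status alone.

```javascript
// Hypothetical helper: turn a progress message such as
// { status: 'recognizing text', progress: 0.42 } into "recognizing text: 42%".
// Assumes progress, when present, is a number between 0 and 1.
function formatProgress(m) {
  if (typeof m.progress !== 'number') {
    // No usable progress value: report the status only.
    return String(m.status);
  }
  const percent = Math.round(m.progress * 100);
  return `${m.status}: ${percent}%`;
}
```

To use it, pass `logger: m => console.log(formatProgress(m))` in the worker options instead of logging `m` directly.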
OCR with Multiple Languages
This code demonstrates OCR on an image that contains text in more than one language (English and Spanish here). The language codes are joined with + in a single string. As above, it assumes tesseract.js v5 or later.
const { createWorker } = require('tesseract.js');

(async () => {
  // 'eng+spa' loads both English and Spanish language data into the worker.
  const worker = await createWorker('eng+spa');
  const { data: { text } } = await worker.recognize('path/to/image.png');
  console.log(text);
  await worker.terminate();
})();
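When the languages come from configuration or user input rather than a literal like 'eng+spa', a small helper can build the +-joined string. The sketch below is hypothetical (toLangString is not a tesseract.js function): it trims each code, drops blanks and duplicates, keeps the caller's order, and rejects an empty list.

```javascript
// Hypothetical helper: build a Tesseract language string such as 'eng+spa'
// from a list of codes, removing blanks and duplicates while keeping order.
function toLangString(codes) {
  const seen = new Set(); // a Set keeps insertion order and ignores repeats
  for (const code of codes) {
    const trimmed = code.trim();
    if (trimmed !== '') {
      seen.add(trimmed);
    }
  }
  if (seen.size === 0) {
    throw new Error('at least one language code is required');
  }
  return [...seen].join('+');
}
```

Its result can be passed wherever the example above uses 'eng+spa'.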
ocrad.js is a JavaScript port of the OCRAD OCR engine. It is a pure JavaScript library that can be used in the browser or in Node.js. Compared to tesseract.js-core, ocrad.js is simpler and may be easier to integrate for basic OCR tasks, but it may not be as powerful or accurate as Tesseract.
node-tesseract-ocr is a Node.js wrapper around the native Tesseract command-line program, which must be installed separately. It provides a simple interface for performing OCR on images and offers similar functionality to tesseract.js-core, but because it runs the native binary it works only in Node.js, not in the browser.
Core part of tesseract.js, which compiles the original Tesseract engine from C++ to WebAssembly.
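Because the engine ships as WebAssembly, it can only run where the JavaScript runtime supports WebAssembly (current browsers and Node.js do). The sketch below shows one way to check this before creating a worker; supportsWasm is a hypothetical helper, not part of tesseract.js or tesseract.js-core. It validates the smallest possible module: the 4-byte "\0asm" magic number followed by the 4-byte binary format version 1.

```javascript
// Hypothetical helper: report whether this runtime can run WebAssembly.
// The 8 bytes below form the smallest valid module: the "\0asm" magic
// number followed by binary format version 1.
function supportsWasm() {
  try {
    if (typeof WebAssembly !== 'object' || typeof WebAssembly.validate !== 'function') {
      return false;
    }
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return WebAssembly.validate(emptyModule);
  } catch (err) {
    // Defensive: treat any unexpected error as "no WebAssembly support".
    return false;
  }
}

console.log(supportsWasm() ? 'WebAssembly available' : 'WebAssembly unavailable');
```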
Because dependencies are managed as git submodules, add the --recursive flag when cloning the repository:
$ git clone --recursive https://github.com/naptha/tesseract.js-core
To build tesseract-core.js yourself, install Docker and run:
$ sh build.sh
The generated files are written to the repository root.
FAQs
Tesseract C++ API in pure JavaScript
The npm package tesseract.js-core receives a total of 87,928 weekly downloads, so its popularity is classified as popular.
We found that tesseract.js-core demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 4 open source maintainers collaborating on the project.