tesseract.js

Comparing version 4.1.1 to 4.1.2

docs/performance.md

6

docs/api.md

@@ -66,3 +66,3 @@ # API

  - FS functions // optional
- - loadLanguauge
+ - loadLanguage
  - initialize

@@ -126,3 +126,3 @@ - setParameters // optional

- Worker.readFile() remove a file in MEMFS, it is useful when you want to free the memory.
+ Worker.removeFile() remove a file in MEMFS, it is useful when you want to free the memory.

@@ -159,3 +159,3 @@ **Arguments:**

  // equal to:
- // await worker.readText('tmp.txt', 'Hi\nTesseract.js\n');
+ // await worker.writeText('tmp.txt', 'Hi\nTesseract.js\n');
  })();

@@ -162,0 +162,0 @@ ```

@@ -10,2 +10,11 @@ FAQ

+ # Recognizing Text
+ ## Are PDF files supported?
+ Tesseract.js does not support .pdf directly—a separate library must be used to convert the .pdf files to images before Tesseract can recognize them. If you are an end user and want to use Tesseract.js to OCR a .pdf file, consider using [scribeocr.com](https://scribeocr.com/), a project that uses Tesseract.js and supports .pdf files. If you are a developer who wants to use Tesseract.js with .pdf files, you can use either of the libraries below to convert from .pdf to images.
+ 1. [PDF.js](https://github.com/mozilla/pdf.js/) (Apache-2.0 license)
+ 2. [muPDF](https://github.com/ArtifexSoftware/mupdf) (AGPL-3.0 license)
+ ## What configuration settings should I use?
+ Default settings should provide optimal results for most users. If you do want to experiment with configuration settings, Tesseract does include many settings to change—the vast majority are documented in the [main Tesseract project](https://github.com/tesseract-ocr/tesseract) and not here. As noted above (“what is the scope of this project”), the core recognition engine is inherited from the main Tesseract project—all of the configuration settings in Tesseract work identically in Tesseract.js. Therefore, for specific questions about configuring recognition settings (e.g. “how can I make noise removal more/less aggressive” or “what settings work best for license plates”) you are more likely to find an answer in the Tesseract documentation/discussion versus only looking in this repo.
  # Trained Data

@@ -12,0 +21,0 @@ ## How does tesseract.js download and keep \*.traineddata?

@@ -36,6 +36,6 @@ ## Local Installation

  ### corePath
- A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3'.
+ A string specifying the location of the [tesseract.js-core](https://github.com/naptha/tesseract.js-core) files, with default value 'https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3'.
- When `corePath` is a directory rather than specific `.js` file (e.g. `https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3`), Tesseract.js loads either `tesseract-core-simd.wasm.js` or `tesseract-core.wasm.js` depending on whether the users' device supports SIMD (see [https://webassembly.org/roadmap/](https://webassembly.org/roadmap/)). Therefore, if self-hosting it is important that both these files are in the location you specify for `corePath`. Having multiple files is necessary as the SIMD-enabled version is *significantly* faster (for the LSTM model [the default]), however is not yet supported on all devices.
+ `corePath` should be set to a directory containing both `tesseract-core-simd.wasm.js` and `tesseract-core.wasm.js`. Tesseract.js will load either `tesseract-core-simd.wasm.js` or `tesseract-core.wasm.js` from the directory depending on whether the users' device supports SIMD (see [https://webassembly.org/roadmap/](https://webassembly.org/roadmap/)).
- When `corePath` is set to a specific `.js` file (e.g. `https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3/tesseract-core.wasm.js`), it will load that file regardless of whether the users' device supports SIMD or not. This behavior exists to preserve backwards compatibility--specifying a directory that contains both files is strongly recommended. Specifying a single file will either result in much slower performance (if `tesseract-core.wasm.js` is specified) or failure to run on certain devices (if `tesseract-core-simd.wasm.js` is specified).
+ To avoid breaking old code, when `corePath` is set to a specific `.js` file (e.g. `https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3/tesseract-core.wasm.js`), it will load that file regardless of whether the users' device supports SIMD or not. This behavior only exists to preserve backwards compatibility—setting `corePath` to a specific `.js` file is strongly discouraged. Doing so will either result in much slower performance (if `tesseract-core.wasm.js` is specified) or failure to run on certain devices (if `tesseract-core-simd.wasm.js` is specified).
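
As a rough illustration of the selection rule described in the hunk above, the sketch below picks a core file from a `corePath` value. The helper name `resolveCoreFile` and its `simdSupported` argument are hypothetical and exist only for this example; this is not the code Tesseract.js itself runs, and it assumes that a path ending in `.js` is what counts as "a specific file".

```javascript
// Hypothetical helper illustrating the documented corePath behavior.
// It is NOT the library's implementation; all names here are assumptions.
function resolveCoreFile(corePath, simdSupported) {
  // A path ending in ".js" names a specific file and is used as-is,
  // regardless of SIMD support (the backwards-compatible behavior).
  if (corePath.endsWith('.js')) return corePath;

  // Otherwise corePath is treated as a directory holding both builds.
  const file = simdSupported
    ? 'tesseract-core-simd.wasm.js'
    : 'tesseract-core.wasm.js';
  // Strip any trailing slashes before joining.
  return `${corePath.replace(/\/+$/, '')}/${file}`;
}

const dir = 'https://cdn.jsdelivr.net/npm/tesseract.js-core@v4.0.3';
console.log(resolveCoreFile(dir, true));
console.log(resolveCoreFile(dir, false));
console.log(resolveCoreFile(`${dir}/tesseract-core.wasm.js`, true));
```

The third call shows why pinning a single file is discouraged: the non-SIMD build is returned even when SIMD is available.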

@@ -7,4 +7,3 @@ const { createWorker, createScheduler } = require('../../');

  const workerGen = async () => {
- const worker = createWorker({cachePath: "."});
- await worker.load();
+ const worker = await createWorker({cachePath: "."});
  await worker.loadLanguage('eng');

@@ -19,3 +18,3 @@ await worker.initialize('eng');

  for (let i=0; i<workerN; i++) {
- resArr[i] = workerGen();
+ resArr[i] = await workerGen();
  }

@@ -22,0 +21,0 @@ await Promise.all(resArr);

  {
  "name": "tesseract.js",
- "version": "4.1.1",
+ "version": "4.1.2",
  "description": "Pure Javascript Multilingual OCR",

@@ -5,0 +5,0 @@ "main": "src/index.js",

@@ -12,7 +12,15 @@ const webpack = require('webpack');

  app.use(cors());
- app.use('/', express.static(path.resolve(__dirname, '..')));
  app.use(middleware(compiler, { publicPath: '/dist', writeToDisk: true }));
+ // These headers are required to measure memory within the benchmark code.
+ // If they are problematic within other contexts they can be removed.
+ app.use(express.static(path.resolve(__dirname, '..'), {
+ setHeaders: (res) => {
+ res.set('Cross-Origin-Opener-Policy', 'same-origin');
+ res.set('Cross-Origin-Embedder-Policy', 'require-corp');
+ }
+ }));
  module.exports = app.listen(3000, () => {
  console.log('Server is running on the port no. 3000');
  });

@@ -68,3 +68,2 @@ declare namespace Tesseract {

  interface WorkerParams {
- tessedit_ocr_engine_mode: OEM
  tessedit_pageseg_mode: PSM

@@ -71,0 +70,0 @@ tessedit_char_whitelist: string

@@ -61,3 +61,3 @@ /**

  const FS = async ({ workerId, payload: { method, args } }, res) => {
- log(`[${workerId}]: FS.${method} with args ${args}`);
+ log(`[${workerId}]: FS.${method}`);
  res.resolve(TessModule.FS[method](...args));

@@ -124,3 +124,3 @@ };

  }
- data = await resp.arrayBuffer();
+ data = new Uint8Array(await resp.arrayBuffer());

@@ -137,4 +137,2 @@ // langPath is a local file, read .traineddata from local filesystem

- data = new Uint8Array(data);
  // Check for gzip magic numbers (1F and 8B in hex)
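
The magic-number check referenced in the comment above can be sketched on its own. `looksGzipped` is a hypothetical name chosen for this example; the comparison mirrors the `isGzip` expression visible in the following hunk header, including its acceptance of the two bytes in swapped order. The type guard is an addition for the example: an `ArrayBuffer` cannot be indexed directly, so the bytes must be wrapped in a `Uint8Array` before the check.

```javascript
// Sketch of the gzip magic-number check; `looksGzipped` is a hypothetical name.
function looksGzipped(data) {
  // Indexing an ArrayBuffer yields undefined, so require a Uint8Array view.
  if (!(data instanceof Uint8Array) || data.length < 2) return false;
  // 31 === 0x1F and 139 === 0x8B. The swapped order is also accepted,
  // mirroring the `isGzip` expression in the source.
  return (data[0] === 31 && data[1] === 139)
    || (data[1] === 31 && data[0] === 139);
}

const gz = new Uint8Array([0x1f, 0x8b, 0x08]);
console.log(looksGzipped(gz));                                   // true
console.log(looksGzipped(new Uint8Array([0x50, 0x4b, 0x03])));   // false
console.log(looksGzipped(gz.buffer));                            // false (raw ArrayBuffer)
```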

@@ -166,4 +164,3 @@ const isGzip = (data[0] === 31 && data[1] === 139) || (data[1] === 31 && data[0] === 139);

  }
- return Promise.resolve(data);
+ return Promise.resolve();
  };

@@ -182,2 +179,18 @@

  const setParameters = async ({ payload: { params: _params } }, res) => {
+ // A small number of parameters can only be set at initialization.
+ // These can only be set using (1) the `oem` argument of `initialize` (for setting the oem)
+ // or (2) the `config` argument of `initialize` (for all other settings).
+ // Attempting to set these using this function will have no impact so a warning is printed.
+ // This list is generated by searching the Tesseract codebase for parameters
+ // defined with `[type]_INIT_MEMBER` rather than `[type]_MEMBER`.
+ const initParamNames = ['ambigs_debug_level', 'user_words_suffix', 'user_patterns_suffix', 'user_patterns_suffix',
+ 'load_system_dawg', 'load_freq_dawg', 'load_unambig_dawg', 'load_punc_dawg', 'load_number_dawg', 'load_bigram_dawg',
+ 'tessedit_ocr_engine_mode', 'tessedit_init_config_only', 'language_model_ngram_on', 'language_model_use_sigmoidal_certainty'];
+ const initParamStr = Object.keys(_params)
+ .filter((k) => initParamNames.includes(k))
+ .join(', ');
+ if (initParamStr.length > 0) console.log(`Attempted to set parameters that can only be set during initialization: ${initParamStr}`);
  Object.keys(_params)
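
The warning logic added in the hunk above can be tried in isolation. The sketch below uses a shortened list of init-only names taken from that hunk, and a hypothetical helper `classifyParams` that only sorts parameter names into two groups and prints the same warning text. It does not apply any parameters, and it is not the worker's actual code.

```javascript
// Shortened subset of the init-only names from the hunk above (illustration only).
const INIT_ONLY = ['tessedit_ocr_engine_mode', 'load_system_dawg', 'user_words_suffix'];

// Hypothetical helper: classifies parameter names and warns about init-only ones.
function classifyParams(params) {
  const keys = Object.keys(params);
  const initOnly = keys.filter((k) => INIT_ONLY.includes(k));
  const runtime = keys.filter((k) => !INIT_ONLY.includes(k));
  if (initOnly.length > 0) {
    console.log(`Attempted to set parameters that can only be set during initialization: ${initOnly.join(', ')}`);
  }
  return { initOnly, runtime };
}

const result = classifyParams({
  tessedit_char_whitelist: '0123456789',
  tessedit_ocr_engine_mode: 1,
});
console.log(result.runtime);  // [ 'tessedit_char_whitelist' ]
console.log(result.initOnly); // [ 'tessedit_ocr_engine_mode' ]
```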

@@ -274,3 +287,3 @@ .filter((k) => !k.startsWith('tessjs_'))

- const nonRecOutputs = ['imageColor', 'imageGrey', 'imageBinary', 'layoutBlocks'];
+ const nonRecOutputs = ['imageColor', 'imageGrey', 'imageBinary', 'layoutBlocks', 'debug'];
  let recOutputCount = 0;

@@ -454,4 +467,10 @@ for (const prop of Object.keys(output)) {

  const res = (status, data) => {
+ // Return only the necessary info to avoid sending unnecessarily large messages
+ const packetRes = {
+ jobId: packet.jobId,
+ workerId: packet.workerId,
+ action: packet.action,
+ };
  send({
- ...packet,
+ ...packetRes,
  status,

@@ -458,0 +477,0 @@ data,
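
To show the effect of the `packetRes` change in the preceding hunk, here is a minimal sketch with a hypothetical `buildResponse` function and a made-up packet shape. Only `jobId`, `workerId`, and `action` come from the diff; the `payload` field and its contents are assumptions for illustration.

```javascript
// Hypothetical stand-in for the worker's response builder.
function buildResponse(packet, status, data) {
  // Copy only the routing fields; the incoming payload is not echoed back.
  const packetRes = {
    jobId: packet.jobId,
    workerId: packet.workerId,
    action: packet.action,
  };
  return { ...packetRes, status, data };
}

// Assumed packet shape: a large payload that should not be sent back.
const packet = {
  jobId: 'Job-1',
  workerId: 'Worker-0',
  action: 'recognize',
  payload: { image: new Uint8Array(1024) },
};
const msg = buildResponse(packet, 'resolve', { text: 'Hi' });
console.log(Object.keys(msg)); // [ 'jobId', 'workerId', 'action', 'status', 'data' ]
console.log('payload' in msg); // false
```

Spreading the whole `packet`, as the old code did, would copy `payload` into every response message.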

Sorry, the diff of this file is too big to display

Sorry, the diff of this file is not supported yet

