ocr-space-api-alt2

Fork of Dennnisk's original ocr-space-api with added support for PDF uploading. Provides an easy way to send images to the ocr.space API and get the OCR result.

Version 2.0.1 (latest)

Provides access to the OCR.SPACE API to send images and get back the embedded text.

More Details: https://ocr.space/ocrapi.

IMPORTANT:

The OCR itself is provided by OCR.SPACE. I am not affiliated with them; I just want to help by reworking and sharing this library.

Main changes

  1. The original library used request. Since request is deprecated, this fork migrates away from it and now uses axios to perform the HTTP request.

  2. Since axios does not send a plain object as URL-encoded form data by default, query-string is used to encode the request body.
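
The form encoding mentioned in item 2 can be sketched as follows. This is an illustration, not the package's actual code: it uses Node's built-in `URLSearchParams` in place of query-string so the snippet has no dependencies, and `buildFormBody` is a hypothetical helper name. The field names are taken from the Options section of this README.

```javascript
// Hypothetical helper (not part of ocr-space-api-alt2): encodes a flat
// options object as an application/x-www-form-urlencoded string.
function buildFormBody(fields) {
  const params = new URLSearchParams()

  for (const [key, value] of Object.entries(fields)) {
    // Skip unset options so they are not sent as the string "undefined".
    if (value === undefined || value === null) continue
    params.append(key, String(value))
  }

  return params.toString()
}

const body = buildFormBody({
  apikey: '<YOUR API KEY HERE>',
  language: 'eng',
  isOverlayRequired: false,
  scale: undefined // left out of the encoded body
})

console.log(body)
```

A string built this way could then be sent as the request body together with a `Content-Type: application/x-www-form-urlencoded` header.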

Installation

First - Register and Get your API key

Get your API key at this link. Just follow their steps.

Second - Install npm package

  npm i ocr-space-api-alt2
  yarn add ocr-space-api-alt2

Usage example

You can see an example in the example folder.

const ocrSpaceApi = require('ocr-space-api-alt2')

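const options = {
  apikey: '<YOUR API KEY HERE>',
  // Overrides automatic file type detection; should match the file below.
  filetype: 'jpg',
  // true returns the full OCR API response instead of just the text.
  verbose: true,
  // Can be an http URL, a base64 image, or a local file path.
  url: `${__dirname}/loveText.jpg`
}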

const getText = async () => {
  try {
    const result = await ocrSpaceApi(options)

    console.log({ result })
  } catch (error) {
    console.error(error)
  }
}

getText()
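
With `verbose: true` the result is described as the full response from the OCR API. The sketch below assumes that this response follows the JSON layout documented by OCR.SPACE (a `ParsedResults` array whose entries carry `ParsedText`, plus an `IsErroredOnProcessing` flag). Neither that layout nor the helper name `extractText` is confirmed by this package's documentation, so inspect a real response before relying on it.

```javascript
// Hypothetical helper (not part of ocr-space-api-alt2): joins the text of
// every parsed page into one string.
// ASSUMPTION: `response` has the shape documented by OCR.SPACE, roughly
//   { IsErroredOnProcessing, ErrorMessage, ParsedResults: [{ ParsedText }] }
function extractText(response) {
  if (!response || response.IsErroredOnProcessing) {
    const reason = JSON.stringify(response && response.ErrorMessage)
    throw new Error(`OCR failed: ${reason}`)
  }

  const pages = response.ParsedResults || []

  return pages.map(page => page.ParsedText || '').join('\n')
}

// Stand-in object for illustration only; this is not real API output.
const sample = {
  IsErroredOnProcessing: false,
  ParsedResults: [{ ParsedText: 'first page' }, { ParsedText: 'second page' }]
}

console.log(extractText(sample))
```

Keeping this step in a separate function means the calling code does not need to change if you later switch `verbose` off and receive only the text.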

Options

The available options are adapted from the docs.

Each option is listed with its key, value, and description.

apikey
  Value: [Required] String
  Description: API key from the OCR.space API. Get your API key at this link.

url
  Value: [Required] String
  Description: Location of the file you want to extract text from. It can be a URL (starting with http), a base64 image, or a local file.

language
  Value: [Optional] String
  Description: Language used for OCR. If no language is specified, English (eng) is used as the default. IMPORTANT: The language code always has 3 letters (not 2), so it is "eng" and not "en". Engine 2 has automatic Western language detection, so this value is ignored when Engine 2 is used.
  Available values:
    Arabic = ara
    Bulgarian = bul
    Chinese (Simplified) = chs
    Chinese (Traditional) = cht
    Croatian = hrv
    Czech = cze
    Danish = dan
    Dutch = dut
    English = eng
    Finnish = fin
    French = fre
    German = ger
    Greek = gre
    Hungarian = hun
    Korean = kor
    Italian = ita
    Japanese = jpn
    Polish = pol
    Portuguese = por
    Russian = rus
    Slovenian = slv
    Spanish = spa
    Swedish = swe
    Turkish = tur

isOverlayRequired
  Value: [Optional] Boolean, Default = False
  Description: If true, returns the coordinates of the bounding boxes for each word. If false, the OCR'ed text is returned only as a text block (this makes the JSON response smaller). Overlay data can be used, for example, to show text over the image.

filetype
  Value: [Optional] String
  Available values: PDF, GIF, PNG, JPG, TIF, BMP
  Description: Overwrites the automatic file type detection based on content-type. Supported image file formats are png, jpg (jpeg), gif, tif (tiff) and bmp. For document OCR, the API supports the Adobe PDF format. Multi-page TIFF files are supported.

detectOrientation
  Value: [Optional] Boolean
  Description: If set to true, the API autorotates the image correctly and sets the TextOrientation parameter in the JSON response. If the image is not rotated, then TextOrientation=0; otherwise it is the degree of the rotation, e.g. "270".

isCreateSearchablePdf
  Value: [Optional] Boolean, Default = False
  Description: If true, the API generates a searchable PDF. This parameter automatically sets isOverlayRequired = true.

isSearchablePdfHideTextLayer
  Value: [Optional] Boolean, Default = False
  Description: If true, the text layer is hidden (not visible).

scale
  Value: [Optional] Boolean, Default = False
  Description: If set to true, the API does some internal upscaling. This can improve the OCR result significantly, especially for low-resolution PDF scans. Note that the front page demo uses scale=true, but the API uses scale=false by default. See also this OCR forum post.

isTable
  Value: [Optional] Boolean
  Description: If set to true, the OCR logic makes sure that the parsed text result is always returned line by line. This switch is recommended for table OCR, receipt OCR, invoice processing and all other types of input documents that have a table-like structure.

OCREngine
  Value: [Optional] Number
  Available values: 1, 2
  Description: Engine 1 is the default. See OCR Engines.

verbose
  Value: [Optional] Boolean
  Description: Whether you want the full response from the OCR API or just the extracted text.
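
The constraints listed in the options above (required keys, 3-letter language codes, allowed file types and engine numbers) can be checked before a request is sent. The following is a minimal sketch under stated assumptions: `validateOptions` is a hypothetical helper that is not part of this package, and treating `filetype` as case-insensitive is a guess based on the usage example passing a lowercase value while the option lists uppercase ones.

```javascript
// Hypothetical pre-flight check (not part of ocr-space-api-alt2).
// Returns a list of problems; an empty list means the options look valid.
const LANGUAGES = [
  'ara', 'bul', 'chs', 'cht', 'hrv', 'cze', 'dan', 'dut', 'eng', 'fin', 'fre', 'ger',
  'gre', 'hun', 'kor', 'ita', 'jpn', 'pol', 'por', 'rus', 'slv', 'spa', 'swe', 'tur'
]
const FILETYPES = ['PDF', 'GIF', 'PNG', 'JPG', 'TIF', 'BMP']

function validateOptions(options) {
  const problems = []

  // apikey and url are the only required options.
  for (const key of ['apikey', 'url']) {
    if (typeof options[key] !== 'string' || options[key] === '') {
      problems.push(`${key} is required and must be a non-empty string`)
    }
  }

  if (options.language !== undefined && !LANGUAGES.includes(options.language)) {
    problems.push(`language must be a listed 3-letter code such as "eng", got "${options.language}"`)
  }

  // ASSUMPTION: filetype is compared case-insensitively.
  if (
    options.filetype !== undefined &&
    !FILETYPES.includes(String(options.filetype).toUpperCase())
  ) {
    problems.push(`filetype must be one of ${FILETYPES.join(', ')}`)
  }

  if (options.OCREngine !== undefined && ![1, 2].includes(options.OCREngine)) {
    problems.push('OCREngine must be 1 or 2')
  }

  return problems
}

// A 2-letter language code is reported as a problem.
console.log(validateOptions({ apikey: 'key', url: 'https://example.com/a.png', language: 'en' }))
```

Returning a list instead of throwing on the first problem lets the caller report every invalid option at once.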

Authors

  • Denis - Initial Work - Initial Documentation - dennnisk.
  • Anthony Luzquiños - Rework - AnthonyLzq.

Important

This package has not been fully tested, and every contribution is appreciated.

Package last updated on 25 Dec 2021
