ocraccuracyreporter

OCR Accuracy Reporter

0.0.5

========
Overview
========

An OCR pipeline may have several stages and use several tools. You need a simple way to run samples through it, as a whole or stage by stage, and to state the resulting OCR accuracy as a number, for example 98%.

=====
Usage
=====

Install the package::

    pip install ocraccuracyreporter

Import the reporter::

    from ocraccuracyreporter.oar import oar

.. topic:: Initialising the reporter

    Create a report and print it::

        oreport = oar(expected='john', given='joh', label='name')
        print(oreport)

    Output::

        name,john,joh,86,100,86,86,94,1

Alternatively, you may have several OCR results for the same item. In that case, initialise the reporter with the expected value alone, with or without a label, and set ``given`` afterwards::

    oreport = oar(expected='john', label='name')
    oreport.given = 'joh'
    repr(oreport)

If you are creating a CSV report with header information, the columns are::

    label,expected,given,ratio,partial_ratio,token_sort_ratio,token_set_ratio,jaro_winkler,distance
    name,john,joh,86,100,86,86,94,1
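The CSV report described above can be assembled with a few lines of standard-library Python. This is only a sketch: ``build_csv_report`` and ``read_csv_report`` are hypothetical helper names, not part of the package, and the sample line is copied from the documented output rather than produced by calling ``repr(oreport)``.

```python
import csv
import io

# Header line as listed in the usage notes above.
HEADER = ("label,expected,given,ratio,partial_ratio,"
          "token_sort_ratio,token_set_ratio,jaro_winkler,distance")


def build_csv_report(report_lines):
    """Join the header and already-formatted report lines into CSV text.

    `report_lines` is any iterable of strings; the assumption here is that
    each one is the comma-separated line shown in the usage notes.
    """
    return "\n".join([HEADER, *report_lines]) + "\n"


def read_csv_report(text):
    """Parse the CSV text back into a list of dicts keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


# Sample line copied from the documented output (stand-in for repr(oreport)).
lines = ["name,john,joh,86,100,86,86,94,1"]
report_text = build_csv_report(lines)
rows = read_csv_report(report_text)
print(rows[0]["ratio"], rows[0]["distance"])
```

Because the lines are joined as plain text, an expected or given value that itself contains a comma would shift the columns; quoting such values is left out of this sketch.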

.. topic:: Items in the report

    ratio - uses pure Levenshtein distance based matching (100 means a perfect match)

    partial_ratio - matches based on the best-matching substrings

    token_sort_ratio - tokenizes the strings and sorts the tokens alphabetically before comparing

    token_set_ratio - tokenizes the strings and compares the intersection of the tokens

    jaro_winkler - gives more weight to a common prefix (for example, when some parts are good and others are missing)

    distance - shows how many characters in given actually differ from expected
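To make three of these items concrete, here is a minimal standard-library sketch of ``ratio``, ``distance`` and ``jaro_winkler``. It is not the package's implementation: ``difflib.SequenceMatcher`` is used as a stand-in for the similarity ratio, the edit distance and Jaro-Winkler are written out by hand, and the function names are chosen only to mirror the report columns. Scores from the package may differ for other inputs.

```python
from difflib import SequenceMatcher


def ratio(expected, given):
    """Similarity on a 0-100 scale (stand-in for the report's `ratio`)."""
    return round(100 * SequenceMatcher(None, expected, given).ratio())


def distance(expected, given):
    """Levenshtein edit distance, computed row by row."""
    previous = list(range(len(given) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, g_char in enumerate(given, start=1):
            cost = 0 if e_char == g_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaro(s1, s2):
    """Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(expected, given, prefix_weight=0.1):
    """Jaro-Winkler on a 0-100 scale; boosts strings sharing a prefix (max 4)."""
    base = jaro(expected, given)
    prefix = 0
    for a, b in zip(expected[:4], given[:4]):
        if a != b:
            break
        prefix += 1
    return round(100 * (base + prefix * prefix_weight * (1 - base)))


print(ratio("john", "joh"), jaro_winkler("john", "joh"), distance("john", "joh"))
```

For the ``john`` / ``joh`` pair this prints ``86 94 1``, which agrees with the corresponding columns in the sample report line.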

===============
Class variables
===============

label - a meaningful name for the OCR string

expected - the expected result

given - the result you got out of the OCR pipeline

total_expected_char_count - calculated expected character count

total_expected_word_count - calculated expected word count

total_given_char_count - calculated given character count

total_given_word_count - calculated given word count
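The four calculated counts can be pictured with a short sketch. How the package actually counts characters and words is not documented here, so this assumes the simplest reading: characters are counted with ``len()`` (whitespace included) and words are split on whitespace. ``count_totals`` is a hypothetical helper, not part of the package.

```python
def count_totals(expected, given):
    """Return the four counts under the assumptions stated above."""
    return {
        "total_expected_char_count": len(expected),          # includes spaces
        "total_expected_word_count": len(expected.split()),  # whitespace split
        "total_given_char_count": len(given),
        "total_given_word_count": len(given.split()),
    }


counts = count_totals("john smith", "joh smith")
for name, value in counts.items():
    print(name, value)
```

Comparing the expected and given counts gives a quick first signal, before any fuzzy matching, that the OCR output dropped or added characters or words.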
