lanyocr 0.1.12 (PyPI): An OCR library for Python
Maintainers: 1

LanyOCR

A general-purpose OCR tool that detects and recognizes English text in images, built on a combination of EasyOCR and PaddleOCR.

LanyOCR automatically merges detected text boxes into text lines, including rotated text.
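As a rough illustration of box merging, the sketch below greedily groups axis-aligned word boxes into lines. Two boxes join the same line when they overlap vertically and sit close together horizontally. This is a simplified assumption, not LanyOCR's actual algorithm: it ignores rotation, which LanyOCR handles. The `Box` type, the `merge_into_lines` helper, and the `min_overlap`/`max_gap` thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical axis-aligned word box: (x_min, y_min) top-left, (x_max, y_max) bottom-right."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    inter = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    shorter = min(a.y_max - a.y_min, b.y_max - b.y_min)
    return max(0.0, inter) / shorter if shorter > 0 else 0.0


def merge_into_lines(boxes: List[Box], min_overlap: float = 0.5, max_gap: float = 1.0) -> List[Box]:
    """Greedily merge word boxes into line boxes (hypothetical helper, axis-aligned only).

    A box joins an existing line when it overlaps the line vertically by at least
    `min_overlap` and the horizontal gap is at most `max_gap` times the line height.
    """
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: b.x_min):  # scan left to right
        for i, line in enumerate(lines):
            height = line.y_max - line.y_min
            gap = box.x_min - line.x_max
            if vertical_overlap(line, box) >= min_overlap and gap <= max_gap * height:
                # Grow the line box to cover the new word box.
                lines[i] = Box(min(line.x_min, box.x_min), min(line.y_min, box.y_min),
                               max(line.x_max, box.x_max), max(line.y_max, box.y_max))
                break
        else:
            lines.append(box)  # no matching line: start a new one
    return lines
```

For example, two words on one row and two words on a lower row produce two line boxes. A real pipeline would also need to handle rotated boxes and sort the merged lines in reading order.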


Getting Started

Install dependencies

pip install lanyocr

Run example

PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --image_path images/example1.jpg

Faster, slightly less accurate version

PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --merge_boxes_inference true --image_path images/example1.jpg

Switch to a different recognizer

PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --recognizer_name paddleocr_en_mobile --image_path images/example1.jpg

Recognize other languages

PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --recognizer_name paddleocr_french_mobile --image_path images/french_example1.jpg

The output image is written to outputs/output.jpg.

Supported Languages

English: paddleocr_en_server, paddleocr_en_mobile
French: paddleocr_french_mobile
Latin: paddleocr_latin_mobile

Note: OpenCV cannot render some Unicode characters correctly in the output image. The recognized text lines are also printed to the console log.

Validate accuracy

Download the ICDAR 2015 dataset

bash datasets/download_icdar2015.sh

Run the benchmark

python benchmark.py

Online API

You can try LanyOCR for free on RapidAPI.

To Do

[x] Abstract class/interface for each component
    [x] LanyOcrDetector: outputs the locations of text boxes
    [x] LanyOcrMerger: merges text boxes into text lines
    [x] LanyOcrRecognizer: converts text boxes/lines into text
    [x] LanyOcrAngleClassifier: estimates the angle of a text box/line

[ ] Multi-language support
    [x] French
    [x] Latin
    [ ] German

[ ] Inference using multiple models to improve accuracy
    [ ] Add an interface to support a voting policy

[ ] Expose flags to configure each component in the OCR pipeline
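The To Do list describes a pipeline of swappable components and a planned voting policy for multi-model inference. The sketch below shows one way a recognizer interface with majority voting could look. Everything in it is an assumption for illustration, not LanyOCR's actual API: the class names `Recognizer`, `VotingRecognizer`, and `FixedRecognizer`, the `recognize(crop) -> (text, confidence)` signature, and the tie-breaking rule.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class Recognizer(ABC):
    """Hypothetical recognizer interface: an image crop in, (text, confidence) out."""

    @abstractmethod
    def recognize(self, crop: Any) -> Tuple[str, float]:
        ...


class VotingRecognizer(Recognizer):
    """Hypothetical voting policy: run several recognizers and keep the majority text.

    Ties between equally frequent texts are broken by their summed confidence.
    """

    def __init__(self, recognizers: List[Recognizer]):
        if not recognizers:
            raise ValueError("at least one recognizer is required")
        self.recognizers = recognizers

    def recognize(self, crop: Any) -> Tuple[str, float]:
        votes: Dict[str, int] = defaultdict(int)
        scores: Dict[str, float] = defaultdict(float)
        for rec in self.recognizers:
            text, conf = rec.recognize(crop)
            votes[text] += 1
            scores[text] += conf
        best = max(votes, key=lambda t: (votes[t], scores[t]))
        # Report the mean confidence of the models that agreed on the winner.
        return best, scores[best] / votes[best]


class FixedRecognizer(Recognizer):
    """Stand-in model for demonstration: always returns the same result."""

    def __init__(self, text: str, conf: float):
        self.text, self.conf = text, conf

    def recognize(self, crop: Any) -> Tuple[str, float]:
        return self.text, self.conf
```

With three stand-in models returning "Hello" (0.9), "Hel1o" (0.95), and "Hello" (0.8), the voter returns "Hello" with a mean confidence of about 0.85. Two models outvote a single, more confident one. Wrapping the vote in the same interface as a single recognizer means the rest of a pipeline would not need to know whether one model or several are used.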

Known issues

[ ] Visualization step: some small text is drawn in the wrong direction

License

This project is licensed under the MIT License.

Credits

Special thanks to the authors and developers of the EasyOCR and PaddleOCR projects.
