
A general-purpose OCR tool that detects and recognizes English text in images, built on a combination of EasyOCR and PaddleOCR.
LanyOCR automatically merges text boxes into lines, even for rotated text.
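To give a feel for what merging text boxes into lines involves, here is a minimal sketch for the simplest case of axis-aligned boxes. It is not LanyOCR's actual merger: the function names, the `min_overlap` and `max_gap_ratio` thresholds, and the greedy left-to-right strategy are all assumptions made for illustration, and rotated text is not handled.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max). Hypothetical representation.
Box = Tuple[float, float, float, float]


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(overlap, 0.0) / shorter if shorter > 0 else 0.0


def merge_boxes_into_lines(
    boxes: List[Box],
    min_overlap: float = 0.5,    # assumed threshold, not a LanyOCR setting
    max_gap_ratio: float = 1.0,  # max horizontal gap, in units of line height
) -> List[Box]:
    """Greedily attach each box to the first line it aligns with."""
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[0]):  # left to right
        for i, line in enumerate(lines):
            height = line[3] - line[1]
            gap = box[0] - line[2]
            if vertical_overlap(line, box) >= min_overlap and gap <= max_gap_ratio * height:
                # Grow the line's bounding box to include the new box.
                lines[i] = (
                    min(line[0], box[0]),
                    min(line[1], box[1]),
                    max(line[2], box[2]),
                    max(line[3], box[3]),
                )
                break
        else:
            lines.append(box)  # no compatible line found, start a new one
    return lines


words = [(0, 0, 40, 20), (45, 2, 90, 22), (0, 50, 30, 70), (200, 0, 240, 20)]
lines = merge_boxes_into_lines(words)
print(lines)
```

In this example the first two boxes share a row and sit close together, so they merge. The box at x=200 is on the same row but too far away, and the box at y=50 is on a different row, so both stay separate.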
pip install lanyocr
PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --image_path images/example1.jpg
Faster version, a bit less accurate
PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --merge_boxes_inference true --image_path images/example1.jpg
Switch to a different recognizer
PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --recognizer_name paddleocr_en_mobile --image_path images/example1.jpg
Recognize other languages
PYTHONPATH=. python detect.py --merge_rotated_boxes true --merge_vertical true --recognizer_name paddleocr_french_mobile --image_path images/french_example1.jpg
The output image is written to outputs/output.jpg.
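If you want to drive these commands from a script, the flag combinations above can be assembled programmatically. The helper below is a hypothetical convenience wrapper and not part of LanyOCR. It only uses the flags shown in the commands above, and it builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_detect_command(
    image_path: str,
    recognizer_name: Optional[str] = None,
    fast: bool = False,
) -> List[str]:
    """Build the argv list for detect.py (hypothetical helper)."""
    cmd = [
        "python", "detect.py",
        "--merge_rotated_boxes", "true",
        "--merge_vertical", "true",
    ]
    if fast:
        # Corresponds to the "faster, a bit less accurate" variant above.
        cmd += ["--merge_boxes_inference", "true"]
    if recognizer_name is not None:
        cmd += ["--recognizer_name", recognizer_name]
    cmd += ["--image_path", image_path]
    return cmd


french_cmd = build_detect_command(
    "images/french_example1.jpg",
    recognizer_name="paddleocr_french_mobile",
)
print(shlex.join(french_cmd))
```

To actually run the command, one option is to pass the list to `subprocess.run` from the repository root with `PYTHONPATH` set to `.` in the environment, mirroring the shell commands above.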
Supported Languages
Note: Some Unicode characters cannot be rendered correctly by OpenCV. In that case, read the recognized text lines from the console log.
Download ICDAR 2015 dataset
bash datasets/download_icdar2015.sh
Validate accuracy
python benchmark.py
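This README does not say which metrics benchmark.py reports. As background, text detection on ICDAR 2015 is commonly scored by matching predicted boxes to ground-truth boxes at an IoU threshold of 0.5 and reporting precision, recall, and their harmonic mean. The sketch below illustrates that idea with axis-aligned boxes for simplicity (ICDAR 2015 annotations are quadrilaterals). All names here are hypothetical and it is not the project's benchmark code.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max). A simplification.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def detection_scores(
    predictions: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) using greedy one-to-one matching."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box may be matched only once
            score = iou(pred, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


gt_boxes = [(0, 0, 100, 20), (0, 40, 100, 60)]
pred_boxes = [(0, 0, 90, 20), (0, 100, 50, 120)]
precision, recall, f1 = detection_scores(pred_boxes, gt_boxes)
print(precision, recall, f1)
```

Here one of the two predictions overlaps a ground-truth box with IoU 0.9 and the other matches nothing, so precision and recall are both 0.5.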
You can try LanyOCR free on RapidAPI
[x] Abstract class/interface for each component
  [x] LanyOcrDetector: outputs the locations of text boxes
  [x] LanyOcrMerger: merges text boxes into text lines
  [x] LanyOcrRecognizer: converts text boxes/lines into text
  [x] LanyOcrAngleClassifier: estimates the angle of a text box/line
[ ] Multi-language support
  [x] French
  [x] Latin
  [ ] German
[ ] Inference using multiple models to improve accuracy
  [ ] Add an interface to support a voting policy
[ ] Expose flags to configure each component in the OCR pipeline
[ ] Visualization step: some small texts are drawn in incorrect directions
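The component split listed above (detector, merger, recognizer) can be pictured as a set of abstract base classes wired into a pipeline. The sketch below is illustrative only: the class names are deliberately different from the LanyOcr* classes, and the method names and signatures are assumptions rather than LanyOCR's real interfaces. The angle classifier is left out for brevity.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max). Hypothetical representation.
Box = Tuple[int, int, int, int]


class Detector(ABC):
    @abstractmethod
    def detect(self, image: Any) -> List[Box]:
        """Return the locations of text boxes in the image."""


class Merger(ABC):
    @abstractmethod
    def merge(self, boxes: List[Box]) -> List[Box]:
        """Merge text boxes into text lines."""


class Recognizer(ABC):
    @abstractmethod
    def recognize(self, image: Any, box: Box) -> str:
        """Convert the content of a text box/line into text."""


def run_pipeline(
    image: Any, detector: Detector, merger: Merger, recognizer: Recognizer
) -> List[Tuple[Box, str]]:
    """Detect boxes, merge them into lines, then recognize each line."""
    lines = merger.merge(detector.detect(image))
    return [(line, recognizer.recognize(image, line)) for line in lines]


# Dummy implementations so the sketch runs without any OCR model.
class FixedDetector(Detector):
    def detect(self, image: Any) -> List[Box]:
        return [(0, 0, 40, 20), (45, 0, 90, 20)]


class UnionMerger(Merger):
    def merge(self, boxes: List[Box]) -> List[Box]:
        if not boxes:
            return []
        # Collapse everything into one bounding box.
        return [(
            min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes),
        )]


class WidthRecognizer(Recognizer):
    def recognize(self, image: Any, box: Box) -> str:
        return f"line of width {box[2] - box[0]}"


result = run_pipeline(None, FixedDetector(), UnionMerger(), WidthRecognizer())
print(result)
```

Keeping each stage behind its own interface means a different recognizer or merger can be swapped in without touching the rest of the pipeline.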
This project is licensed under the MIT License.
Special thanks to the authors and developers of the EasyOCR and PaddleOCR projects.