
easyocr-unstructured
EasyOCR Unstructured is a library for Optical Character Recognition (OCR) that extracts text from PDFs and then groups the extracted text based on proximity.
It is intended for PDF files whose text does not follow the standard left-to-right, top-to-bottom reading order.
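To make "group the text based on proximity" concrete, here is a minimal sketch of one way such grouping could work. It is not taken from the library's source: the function name `group_by_proximity`, the `(text, x, y)` input shape, and the `max_distance` threshold are all assumptions made for illustration.

```python
from math import hypot


def group_by_proximity(items, max_distance):
    """Group (text, x, y) items whose centers lie within max_distance.

    Hypothetical helper: grouping is transitive, so if A is near B and
    B is near C, all three land in the same group.
    """
    parent = list(range(len(items)))

    def find(i):
        # Walk up to the root of i's group, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            _, xi, yi = items[i]
            _, xj, yj = items[j]
            if hypot(xi - xj, yi - yj) <= max_distance:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so group order is stable.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i, (text, _, _) in enumerate(items):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


boxes = [
    ("A", 0, 0),
    ("B", 100, 100),
    ("C", 105, 100),
    ("D", 300, 0),
]
print(group_by_proximity(boxes, max_distance=10))
```

The pairwise comparison is quadratic in the number of text boxes, which is usually acceptable for a single page; the actual package may use a different distance measure or threshold.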
pip install easyocr-unstructured
from pprint import pprint

# Assumes EasyocrUnstructured is exported at the package top level
from easyocr_unstructured import EasyocrUnstructured

# Initialize the EasyOCR Unstructured object
easyocr = EasyocrUnstructured()
# Invoke the OCR process on your PDF file
result = easyocr.invoke('/path/to/your_pdf_file.pdf')
# result is a list of lists of strings, one inner list per group of nearby text
pprint(result)
The output will look something like this:
[
["This is the piece of text. Nothing near it"],
["This is the second piece of text.", "This is the third piece of text that was close to the second"],
["This is the fourth piece of text. Nothing near it"],
...
]
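Since the result is a list of groups, a common next step is to collapse each group into a single string. The sketch below assumes only the list-of-lists shape shown above; `groups_to_paragraphs` is a hypothetical helper, not part of the package.

```python
def groups_to_paragraphs(result, separator=" "):
    """Join each proximity group into one string, skipping empty groups."""
    return [separator.join(group) for group in result if group]


# Sample data in the same shape as the output shown above.
sample = [
    ["This is the piece of text. Nothing near it"],
    ["This is the second piece of text.",
     "This is the third piece of text that was close to the second"],
    ["This is the fourth piece of text. Nothing near it"],
]

for paragraph in groups_to_paragraphs(sample):
    print(paragraph)
```

Joining with a space keeps nearby fragments readable as one passage; pass a different `separator` (for example a newline) if the fragments should stay visually distinct.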
Tests: none yet.
Contributing: contributions are welcome; any sensible and safe change will be added.
Author: Kevin Fink
License: MIT