
language-processing-tool
Language Processing Tool is a Python package that processes PDF files and detects the language of each document. It supports both text-based PDFs and scanned PDFs; scanned pages are read with OCR.
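One common way to support both kinds of PDF is to read the embedded text layer first and fall back to OCR only when a page yields little or no text. The sketch below shows that decision in isolation. It is an assumption about how such a tool can work, not the logic in `process_pdfs.py`. The helper names `choose_page_text`, `extract_text`, and `run_ocr` and the 20-character threshold are hypothetical. In a real pipeline, `extract_text` might wrap PyMuPDF's `page.get_text()` and `run_ocr` might render the page to an image and pass it to `pytesseract.image_to_string`.

```python
from typing import Callable

# Hypothetical threshold: pages with fewer visible characters than this
# are treated as scanned images and sent to OCR instead.
MIN_TEXT_CHARS = 20


def choose_page_text(
    page: object,
    extract_text: Callable[[object], str],
    run_ocr: Callable[[object], str],
    min_chars: int = MIN_TEXT_CHARS,
) -> tuple[str, str]:
    """Return (text, method) for one page, preferring the embedded text layer.

    The extractor and OCR function are injected (hypothetical helpers) so the
    decision logic runs without PyMuPDF or Tesseract installed.
    """
    text = extract_text(page)
    # Ignore whitespace so pages containing only line breaks count as empty.
    if len("".join(text.split())) >= min_chars:
        return text, "text"
    return run_ocr(page), "ocr"


if __name__ == "__main__":
    digital = {"text": "This page has a real text layer with plenty of words."}
    scanned = {"text": "   \n  "}

    def fake_extract(page: dict) -> str:
        return page["text"]

    def fake_ocr(page: dict) -> str:
        return "text recovered by OCR"

    print(choose_page_text(digital, fake_extract, fake_ocr)[1])  # text
    print(choose_page_text(scanned, fake_extract, fake_ocr)[1])  # ocr
```

Injecting the two functions keeps the "text layer or OCR?" choice separate from the heavy dependencies, so the threshold can be tuned and tested on its own.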
You can install the package from PyPI using:
pip install language-processing-tool
To use this package in your Python script:
from language_processing_tool.process_pdfs import process_pdfs, process_single_file
# Process a single PDF file
print(process_single_file("/path/to/file.pdf"))
# Process multiple PDFs from a directory (using CSV input)
process_pdfs("/path/to/input_folder/", "pdf_files.csv", "/path/to/output_folder/")
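After installation, the same two operations are available from the process-pdfs command-line tool, either for a single file or for a folder of PDFs listed in a CSV file: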
process-pdfs /path/to/file.pdf
process-pdfs /path/to/input_folder/ /path/to/csv_file.csv /path/to/output_folder/
The CSV file must contain a column named filename that lists the PDF filenames without the .pdf extension. Example:
filename
document1
document2
document3
Make sure the corresponding PDFs (document1.pdf, document2.pdf, etc.) are in the specified input folder.
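The CSV input contract above can be sketched with the standard library alone: read the filename column, append .pdf, and split the entries into files that exist in the input folder and names that are missing. The helper `resolve_pdf_paths` is hypothetical and is not the package's own implementation; it only illustrates how the CSV rows map to PDF paths.

```python
import csv
import tempfile
from pathlib import Path


def resolve_pdf_paths(input_folder: str, csv_path: str) -> tuple[list[Path], list[str]]:
    """Map each CSV `filename` entry to <input_folder>/<filename>.pdf.

    Returns (found_paths, missing_names). Hypothetical helper for illustration.
    """
    folder = Path(input_folder)
    found: list[Path] = []
    missing: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or "filename" not in reader.fieldnames:
            raise ValueError("CSV must contain a 'filename' column")
        for row in reader:
            name = (row["filename"] or "").strip()
            if not name:
                continue  # skip rows with an empty filename cell
            # Names are listed without extensions, so append ".pdf" here.
            pdf_path = folder / f"{name}.pdf"
            if pdf_path.is_file():
                found.append(pdf_path)
            else:
                missing.append(name)
    return found, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "document1.pdf").write_bytes(b"%PDF-1.4\n")
        csv_file = Path(tmp, "pdf_files.csv")
        csv_file.write_text("filename\ndocument1\ndocument2\n", encoding="utf-8")
        found, missing = resolve_pdf_paths(tmp, str(csv_file))
        print([p.name for p in found], missing)  # ['document1.pdf'] ['document2']
```

Reporting missing names instead of failing on the first one makes it easier to fix a CSV that is out of sync with the input folder.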
The project is organized as follows:
language_processing_tool
├── LICENSE
├── README.md
├── setup.py
├── pyproject.toml
├── requirements.txt
├── language_processing_tool
│   ├── __init__.py
│   ├── process_pdfs.py
│   └── sourcecode.py
└── tests
    └── test.py
The package includes a CLI entry point:
process-pdfs <arguments>
which maps to the main() function in process_pdfs.py.
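The same command accepts either one argument (a single PDF) or three (input folder, CSV file, output folder). A `main()` that handles both forms might dispatch on the number of positional arguments, as in the sketch below. This is an assumption about how the dispatch could work, not a copy of `process_pdfs.py`; to stay runnable on its own, it returns a tuple describing the chosen mode where the real tool would call `process_single_file` or `process_pdfs`.

```python
import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> tuple:
    """Dispatch on the number of positional arguments (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="process-pdfs",
        description="Detect the language of PDF documents (sketch).",
    )
    parser.add_argument(
        "paths",
        nargs="+",
        help="either FILE.pdf, or INPUT_FOLDER CSV_FILE OUTPUT_FOLDER",
    )
    args = parser.parse_args(argv)

    if len(args.paths) == 1:
        # The real tool would call process_single_file(args.paths[0]) here.
        return ("single", args.paths[0])
    if len(args.paths) == 3:
        input_folder, csv_file, output_folder = args.paths
        # The real tool would call process_pdfs(input_folder, csv_file, output_folder).
        return ("batch", input_folder, csv_file, output_folder)
    # Any other count is a usage error; argparse prints usage and exits with status 2.
    parser.error("expected 1 argument (a PDF) or 3 arguments (folder, CSV, output)")


if __name__ == "__main__":
    print(main())
```

Using a single `nargs="+"` positional keeps both invocations shown above valid under one command name, at the cost of checking the argument count by hand.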
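This package requires pytesseract, langdetect, pandas, PyMuPDF, icecream, Pillow, and argparse (argparse ships with the Python standard library). pytesseract is a wrapper around the Tesseract OCR engine, which must be installed separately for OCR on scanned PDFs.
This project is licensed under the MIT License.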
A PDF language detection and OCR tool
We found that language-processing-tool demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.