
A simple and efficient OCR-based data extraction tool for Indian PAN and
Aadhaar cards using Tesseract OCR.
🆕 What's New in v0.1.3
- Corrected example usage in README:
print(pan_data.get_pan())
print(aadhaar_data.get_aadhaar())
- Includes all features from v0.1.2:
  - Added tesseract_cmd parameter to ExtractAadhaarData and ExtractPanData for custom Tesseract paths.
  - Fixed issue with preprocessing argument (preprocess) in child classes not being passed correctly.
(For full version history, see CHANGELOG.md)
✨ Features
- Extract PAN card data with a single function call
- Extract Aadhaar card data with a single function call
- Built-in preprocessing option for better OCR accuracy
- Cross-platform support (Windows, Linux, macOS) with configurable
Tesseract path
📦 Installation
pip install ocr-pro
🚀 Usage
# Extract data from a PAN card image
from ocr import ExtractPanData
pan_data = ExtractPanData("pan_image.jpg", tesseract_cmd="/usr/bin/tesseract")
print(pan_data.get_pan())

# Extract data from an Aadhaar card image, with preprocessing enabled
from ocr import ExtractAadhaarData
aadhaar_data = ExtractAadhaarData("aadhaar_image.jpg", tesseract_cmd="/usr/bin/tesseract", preprocess=True)
print(aadhaar_data.get_aadhaar())
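OCR output can contain misread characters, so it may be worth checking an extracted value before relying on it. The sketch below is not part of ocr-pro. It assumes the extracted number is available as a plain string (this README does not document the return type of get_pan() or get_aadhaar()), and the helper names looks_like_pan and looks_like_aadhaar are hypothetical. It checks only the general shape of each number: a PAN is five letters, four digits, and one letter; an Aadhaar number is twelve digits. It does not verify checksums or that the number was actually issued.

```python
import re

# Shape checks only: these patterns do not prove a number is genuine.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
AADHAAR_PATTERN = re.compile(r"[0-9]{12}")


def looks_like_pan(text: str) -> bool:
    """Hypothetical helper: True if text has the 10-character PAN shape."""
    cleaned = re.sub(r"\s+", "", text).upper()
    return PAN_PATTERN.fullmatch(cleaned) is not None


def looks_like_aadhaar(text: str) -> bool:
    """Hypothetical helper: True if text is 12 digits, ignoring spaces and hyphens."""
    cleaned = re.sub(r"[\s-]+", "", text)
    return AADHAAR_PATTERN.fullmatch(cleaned) is not None


if __name__ == "__main__":
    print(looks_like_pan("ABCDE1234F"))
    print(looks_like_aadhaar("1234 5678 9012"))
```

A value that fails these checks is a reasonable signal to retry with preprocess=True or to ask for a clearer image.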
Arguments
- filepath (str) → Path to the image file
- tesseract_cmd (str, optional) → Path to the Tesseract
executable (default: system auto-detection or
"C:\Program Files\Tesseract-OCR\tesseract.exe" on Windows)
- preprocess (bool, default=False) → Whether to apply
preprocessing for better OCR results
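When the same script has to run on several platforms, the value for tesseract_cmd can be resolved at runtime instead of hard-coded. The following is a minimal sketch and not part of ocr-pro: resolve_tesseract_cmd is a hypothetical helper, and the lookup order (explicit value, then PATH, then the Windows default quoted above) is an assumption about what is convenient, not a description of the library's own auto-detection.

```python
import platform
import shutil
from typing import Optional

# Windows install location quoted in the Arguments list above.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: choose a path to pass as tesseract_cmd.

    Returns None when no candidate is found, so the caller can decide
    whether to fall back to the library's default behaviour.
    """
    if explicit:
        # A caller-supplied path always wins.
        return explicit
    found = shutil.which("tesseract")  # search the directories on PATH
    if found:
        return found
    if platform.system() == "Windows":
        return WINDOWS_DEFAULT
    return None


if __name__ == "__main__":
    print(resolve_tesseract_cmd())
```

If the helper returns a path, it can be passed as tesseract_cmd to ExtractPanData or ExtractAadhaarData. If it returns None, omitting the argument leaves the choice to the default described above.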
⚙️ Requirements
📜 License
MIT License