LitePali
Lightweight Document Retrieval with Vision Language Models
🚀 Features • 🧠 Model • 💻 Installation • 📘 Usage • ❓ Why LitePali • 🤝 Contributing • 🏗 TODO
LitePali is a lightweight document retrieval system I created, inspired by the ColPali model and optimized for cloud
deployment. It's designed to efficiently process and search through document images using state-of-the-art
vision-language models.
🚀 Features
📦 Minimal dependencies
🖼️ Direct image processing without complex PDF parsing
🔄 Deterministic file processing
⚡ Batch processing for multiple files
☁️ Optimized for cloud environments
🧠 Model
LitePali is built on the ColPali architecture, which uses Vision Language Models (VLMs) for efficient document
retrieval.
Key features include:
- Late Interaction Mechanism: Enables efficient query matching while maintaining context.
- Multi-Vector Representations: Generates fine-grained representations of text and images.
- Visual and Textual Understanding: Processes document images directly, understanding both content and layout.
- Efficient Indexing: Faster corpus indexing compared to traditional PDF parsing methods.
![ColPali Architecture](assets/colpali-architecture.png)
This approach allows LitePali to perform efficient retrieval while capturing complex document structures and content.
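To make the late interaction idea concrete, here is a minimal sketch of MaxSim-style scoring over multi-vector representations, in plain Python. The function names and the toy 2-dimensional embeddings are illustrative assumptions, not LitePali's or colpali-engine's actual API. Each query token vector is matched to its most similar page patch vector, and those maxima are summed to give the page score.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Sum, over query token vectors, of the best-matching page patch similarity."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy 2-dimensional embeddings (assumed values); real models use far more dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5], [0.4, 0.4]]

scores = {
    "page_a": late_interaction_score(query, page_a),
    "page_b": late_interaction_score(query, page_b),
}
best = max(scores, key=scores.get)
print(best, scores)
```

Because every query token picks its own best patch, a page can score well when different regions of the image answer different parts of the query.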
Inspiration
This library is inspired by byaldi, but with several key differences:
- Focus on images: LitePali works exclusively with images, so PDF processing can be handled separately in
CPU-only environments.
- Simplified dependencies: No need for Poppler or other PDF-related dependencies.
- Updated engine: Utilizes colpali-engine >=0.3.0 for improved performance.
- Deterministic processing: Implements deterministic file processing for consistent results.
- Efficient batch processing: Employs batch processing when adding multiple files, enhancing performance.
- Customized functionality: Tailored for specific needs while building upon the excellent foundation laid by
byaldi.
These differences make LitePali a more streamlined and focused tool for image-based document retrieval, offering
flexibility in deployment and integration with existing PDF processing pipelines.
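As a rough illustration of what deterministic, batched processing can look like, the sketch below sorts queued pages by a stable key and then splits them into fixed-size batches. This is an assumption about the general technique, not a description of LitePali's internals. `PageRef`, `deterministic_order`, and `batched` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class PageRef:
    """Hypothetical stand-in for a queued page image (not LitePali's ImageFile)."""
    path: str
    document_id: int
    page_id: int


def deterministic_order(pages: List[PageRef]) -> List[PageRef]:
    """Sort by (document_id, page_id, path) so insertion order does not matter."""
    return sorted(pages, key=lambda p: (p.document_id, p.page_id, p.path))


def batched(pages: List[PageRef], batch_size: int) -> Iterator[List[PageRef]]:
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]


# Pages queued in arbitrary order.
queue = [
    PageRef("path/to/image3.jpg", 2, 1),
    PageRef("path/to/image2.png", 1, 2),
    PageRef("path/to/image1.jpg", 1, 1),
]
batches = list(batched(deterministic_order(queue), batch_size=2))
for batch in batches:
    print([page.path for page in batch])
```

Sorting before batching means the same set of files always produces the same batches, regardless of the order in which they were added.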
Installation
Install LitePali using pip:
pip install litepali
Usage
Here's a simple example of how to use LitePali:
from litepali import LitePali, ImageFile

# Create an empty index
litepali = LitePali()

# Add page images with their document/page identifiers and metadata
litepali.add(ImageFile(
    path="path/to/image1.jpg",
    document_id=1,
    page_id=1,
    metadata={"title": "Introduction", "author": "John Doe"}
))
litepali.add(ImageFile(
    path="path/to/image2.png",
    document_id=1,
    page_id=2,
    metadata={"title": "Results", "author": "John Doe"}
))
litepali.add(ImageFile(
    path="path/to/image3.jpg",
    document_id=2,
    page_id=1,
    metadata={"title": "Abstract", "author": "Jane Smith"}
))

# Process the added images
litepali.process()

# Search with a text query and print the top 5 results
results = litepali.search("Your query here", k=5)
for result in results:
    print(f"Image: {result['image'].path}, Score: {result['score']}")

# Save the index, then load it into a new instance
litepali.save_index("path/to/save/index")
new_litepali = LitePali()
new_litepali.load_index("path/to/save/index")
This example demonstrates how to add images, process them, perform a search, and save/load the index.
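Since search returns page-level results, a common follow-up is to keep only the best page per document. The sketch below assumes each result is a dict with `image` and `score` keys, as in the example above. It further assumes that the object under `image` exposes the `document_id` it was created with, which is not confirmed here. `PageImage` and `best_page_per_document` are hypothetical names, and the scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PageImage:
    """Hypothetical stand-in for the object stored under result['image']."""
    path: str
    document_id: int
    page_id: int


def best_page_per_document(results: List[dict]) -> Dict[int, dict]:
    """Keep only the highest-scoring page result for each document_id."""
    best: Dict[int, dict] = {}
    for result in results:
        doc_id = result["image"].document_id
        if doc_id not in best or result["score"] > best[doc_id]["score"]:
            best[doc_id] = result
    return best


# Made-up results in the shape used by the usage example.
sample_results = [
    {"image": PageImage("path/to/image1.jpg", 1, 1), "score": 0.42},
    {"image": PageImage("path/to/image2.png", 1, 2), "score": 0.87},
    {"image": PageImage("path/to/image3.jpg", 2, 1), "score": 0.55},
]
top = best_page_per_document(sample_results)
for doc_id, result in sorted(top.items()):
    print(doc_id, result["image"].path, result["score"])
```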
Why LitePali?
I created LitePali to address the need for a lightweight, efficient document retrieval system that could work directly
with images. By leveraging the power of vision-language models like ColPali, LitePali can understand both textual and
visual elements in documents, making it ideal for complex document retrieval tasks.
LitePali is designed to be easy to use and deploy in cloud environments, making it a great choice for researchers and
developers working on document retrieval systems.
Contributing
Contributions are welcome! Feel free to submit issues or pull requests if you have any improvements or bug fixes.
TODO
Future improvements and features planned for LitePali: