
Stream, Parse, and Chat with Compressed Datasets Using LLMs
zipstream-ai is a Python package for working with .zip and .tar.gz archives directly, without extracting them first. It combines archive streaming, format detection, data parsing (CSV, JSON, and similar formats), and natural language querying with LLMs such as Gemini behind a single interface.
Installation
Option 1: Install from PyPI (Recommended)
pip install zipstream-ai
Option 2: Install from Conda
conda install -c pranav_motarwar zipstream-ai
pip install openai typer python-dotenv google-generativeai
Note: The conda package includes core dependencies, but you'll need to install PyPI-only dependencies (openai, typer, python-dotenv, google-generativeai) separately via pip.
Features
| Feature | Description |
|---|---|
| Archive Streaming | Stream .zip and .tar.gz files without extraction |
| Format Auto-Detection | Automatically detects file types (CSV, JSON, TXT, etc.) |
| DataFrame Integration | Parses tabular data directly into pandas DataFrames |
| LLM Querying | Ask questions about your data using Gemini (Google's LLM) |
| Modular Design | Easily extensible for new formats or models |
| Python + CLI Support | Use via command line or as a Python package |
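To make "streaming without extraction" and "format auto-detection" concrete, here is a minimal sketch that uses only the Python standard library. It is not zipstream-ai's implementation. The helper `load_member` and its extension-based dispatch are assumptions made for illustration.

```python
import csv
import io
import json
import zipfile

# Build a small archive in memory so the sketch runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "name,score\nada,91\nlin,78\n")
    zf.writestr("meta.json", '{"source": "demo"}')
    zf.writestr("notes.txt", "plain text")


def load_member(zf, name):
    """Hypothetical helper: read one member and parse it by file extension."""
    with zf.open(name) as raw:  # reads from the archive; nothing is unpacked to disk
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        if name.endswith(".csv"):
            return list(csv.DictReader(text))
        if name.endswith(".json"):
            return json.load(text)
        return text.read()  # fall back to plain text


buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    rows = load_member(zf, "data.csv")
    meta = load_member(zf, "meta.json")
    notes = load_member(zf, "notes.txt")
```

Dispatching on the extension is the simplest detection strategy. A real detector might also inspect the file contents.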
Use Case Examples
1. Load & Explore ZIP
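from zipstream_ai import ZipStreamReader
# Open the archive for streaming; no manual extraction is needed
reader = ZipStreamReader("dataset.zip")
# List the files contained in the archive
print(reader.list_files())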
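For comparison, the same list-and-read pattern for a .tar.gz archive can be sketched with the standard library's `tarfile` module. This illustrates the underlying idea only. It does not show how zipstream-ai handles .tar.gz files, and the archive contents here are made up.

```python
import io
import tarfile

# Create a tiny .tar.gz in memory so the example is self-contained.
payload = b"name,score\nada,91\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="data.csv")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    members = tar.getnames()
    # extractfile() returns a file-like object; it does not unpack to disk.
    first_line = tar.extractfile("data.csv").readline().decode("utf-8").strip()
```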
2. Parse CSV from ZIP
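from zipstream_ai import FileParser
# reader is the ZipStreamReader created in example 1
parser = FileParser(reader)
# Load data.csv from the archive as a pandas DataFrame
df = parser.load("data.csv")
print(df.head())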
3. Ask Questions with Gemini
from zipstream_ai import ask
# df is the DataFrame loaded in example 2
response = ask(df, "Which 3 rows have the highest 'score'?")
print(response)
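A question like the one above has a deterministic answer, so the model's response can be compared against a direct computation. The sketch below uses hypothetical rows in place of the DataFrame. With pandas, they could come from `df.to_dict("records")`, and the same result is available as `df.nlargest(3, "score")`.

```python
import heapq

# Hypothetical rows standing in for the DataFrame from example 2.
rows = [
    {"name": "ada", "score": 91},
    {"name": "lin", "score": 78},
    {"name": "kai", "score": 85},
    {"name": "noa", "score": 96},
]

# The three rows with the highest score, in descending order.
top_three = heapq.nlargest(3, rows, key=lambda row: row["score"])
top_names = [row["name"] for row in top_three]
```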
Why zipstream-ai?
| Without zipstream-ai | With zipstream-ai |
|---|---|
| Manually unzip files | Stream directly from archive |
| Write boilerplate code to parse | Built-in file parsers (CSV, JSON, etc.) |
| Switch between tools for LLMs | One-liner ask(df, question) integration |
Architecture Diagram
   ┌───────────────┐
   │   .zip/.tar   │
   └───────┬───────┘
           │
┌──────────▼──────────┐
│   ZipStreamReader   │
└──────────┬──────────┘
           │
  ┌────────▼────────┐
  │   FileParser    │────> pd.DataFrame
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │      ask()      │────> Gemini LLM Output
  └─────────────────┘
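The three stages in the diagram can be sketched as plain functions. Everything below is a standard-library stand-in written under assumptions. The names `read_member`, `parse_csv`, `ask_rows`, and `echo_model` are hypothetical and are not part of zipstream-ai. The prompt format is invented, and the model is a stub rather than a Gemini client.

```python
import csv
import io
import zipfile


def read_member(archive_bytes, name):
    """Stage 1: read one member's bytes from a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


def parse_csv(data):
    """Stage 2: turn CSV bytes into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def ask_rows(rows, question, model):
    """Stage 3: build a prompt from the rows and pass it to a model callable."""
    header = ", ".join(rows[0].keys())
    body = "\n".join(", ".join(row.values()) for row in rows)
    return model(f"Columns: {header}\n{body}\n\nQuestion: {question}")


def echo_model(prompt):
    """Stand-in for a real LLM client; reports only the prompt length."""
    return f"received {len(prompt)} characters"


# Build a demo archive in memory, then run the three stages in order.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "name,score\nada,91\nlin,78\n")

rows = parse_csv(read_member(buffer.getvalue(), "data.csv"))
answer = ask_rows(rows, "Who scored highest?", echo_model)
```

Passing the model in as a callable keeps the last stage swappable, which is one way to get the extensibility listed under Modular Design.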