
Installation | Documentation | API Reference | Code Samples | Computer Vision | LLM
Pixeltable is a declarative data infrastructure for building multimodal AI applications, enabling incremental storage, transformation, indexing, and orchestration of your data.
All of this works with your custom functions (UDFs) and comes with built-in caching, versioning, lineage tracking, and incremental computation.
pip install pixeltable
Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.
Learn how to create tables, populate them with data, and enhance them with built-in or user-defined transformations.
import pixeltable as pxt
v = pxt.create_table('external_data.videos', {'video': pxt.Video})
prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
]
v.insert({'video': prefix + p} for p in paths)
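Because Pixeltable is persistent, the table above outlives the Python process; in a later session it can be reopened by name. A minimal sketch:

import pixeltable as pxt

# Reopen the table created earlier; schema, data, and computed results are stored by Pixeltable
v = pxt.get_table('external_data.videos')
v.select(v.video).head()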
Learn how to work with data in Pixeltable.
import pixeltable as pxt
from pixeltable.functions import huggingface
# Create a table to store data persistently
t = pxt.create_table('image', {'image': pxt.Image})
# Insert some images
prefix = 'https://upload.wikimedia.org/wikipedia/commons'
paths = [
    '/1/15/Cat_August_2010-4.jpg',
    '/e/e1/Example_of_a_Dog.jpg',
    '/thumb/b/bf/Bird_Diversity_2013.png/300px-Bird_Diversity_2013.png'
]
t.insert({'image': prefix + p} for p in paths)
# Add a computed column for image classification
t.add_computed_column(classification=huggingface.detr_for_object_detection(
    t.image,
    model_id='facebook/detr-resnet-50'
))
# Retrieve the rows where cats have been identified
t.select(animal=t.image,
         classification=t.classification.label_text[0]) \
    .where(t.classification.label_text[0] == 'cat').head()
Learn about computed columns and object detection: Comparing object detection models.
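Pixeltable user-defined functions (UDFs) let you bring arbitrary Python into tables and queries. For example, a UDF that draws detection bounding boxes onto an image: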
import PIL.Image
import PIL.ImageDraw

@pxt.udf
def draw_boxes(img: PIL.Image.Image, boxes: list[list[float]]) -> PIL.Image.Image:
    result = img.copy()  # Create a copy of `img`
    d = PIL.ImageDraw.Draw(result)
    for box in boxes:
        d.rectangle(box, width=3)  # Draw bounding box rectangles on the copied image
    return result
Learn more about user-defined functions: UDFs in Pixeltable.
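A UDF like this can be used directly in a computed column. The sketch below applies it to the earlier image table; the `boxes` field of the detection output is an assumption about the model's result format and may need adjusting:

# Hypothetical usage of draw_boxes on the DETR output from the earlier example
t.add_computed_column(
    visualization=draw_boxes(t.image, t.classification.boxes)
)
t.select(t.image, t.visualization).head()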
from pixeltable.iterators import DocumentSplitter

# In this example, the view is defined by iteration over the chunks of a DocumentSplitter
chunks_table = pxt.create_view(
    'rag_demo.chunks',
    documents_table,
    iterator=DocumentSplitter.create(
        document=documents_table.document,
        separators='token_limit', limit=300)
)
Learn how to leverage views to build your RAG workflow.
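With the chunks view in place, retrieval follows the same index-and-similarity pattern shown later in this README for video frames. The sketch below is illustrative: the `text` column comes from DocumentSplitter, while the embedding model and question string are placeholder choices.

from pixeltable.functions.huggingface import sentence_transformer

# Index the chunk text (example model; any sentence-embedding model works here)
chunks_table.add_embedding_index(
    'text', embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'))

# Retrieve the top-5 chunks most similar to a question
question = 'What does the document say about warranty terms?'
sim = chunks_table.text.similarity(question)
chunks_table.order_by(sim, asc=False).limit(5).select(chunks_table.text, sim=sim).collect()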
# The computation of the mAP metric can become a query over the evaluation output
frames_view.select(mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m)).show()
Learn how to leverage Pixeltable for Model analytics.
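For context, the eval_yolox_tiny and eval_yolox_m columns referenced above are themselves computed columns that compare model detections against ground truth. A rough sketch of how such a column might be defined follows; the detection and ground-truth column names, as well as the exact parameters of the evaluation helper, are assumptions:

from pixeltable.functions.vision import eval_detections

# Hypothetical: evaluate YOLOX-tiny detections against ground-truth annotations per frame
frames_view.add_computed_column(eval_yolox_tiny=eval_detections(
    pred_bboxes=frames_view.detect_yolox_tiny.bboxes,
    pred_labels=frames_view.detect_yolox_tiny.labels,
    pred_scores=frames_view.detect_yolox_tiny.scores,
    gt_bboxes=frames_view.gt.bboxes,
    gt_labels=frames_view.gt.labels
))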
from pixeltable.functions.together import chat_completions

chat_table = pxt.create_table('together_demo.chat', {'input': pxt.String})
# The chat-completions API expects JSON-formatted input:
messages = [{'role': 'user', 'content': chat_table.input}]
# This example shows how additional parameters from the Together API can be used in Pixeltable
chat_table.add_computed_column(
    output=chat_completions(
        messages=messages,
        model='mistralai/Mixtral-8x7B-Instruct-v0.1',
        max_tokens=300,
        stop=['\n'],
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        logprobs=1,
        echo=True
    )
)
chat_table.add_computed_column(
    response=chat_table.output.choices[0].message.content
)
# Start a conversation
chat_table.insert([
    {'input': 'How many species of felids have been classified?'},
    {'input': 'Can you make me a coffee?'}
])
chat_table.select(chat_table.input, chat_table.response).head()
Learn how to interact with inference services such as Together AI in Pixeltable.
import pixeltable as pxt
from pixeltable.functions.huggingface import clip
from pixeltable.iterators import FrameIterator
import PIL.Image
video_table = pxt.create_table('videos', {'video': pxt.Video})
video_table.insert([{'video': '/video.mp4'}])
frames_view = pxt.create_view(
    'frames', video_table, iterator=FrameIterator.create(video=video_table.video))
# Create an index on the 'frame' column that allows text and image search
frames_view.add_embedding_index('frame', embed=clip.using('openai/clip-vit-base-patch32'))
# Now we will retrieve images based on a sample image
sample_image = '/image.jpeg'
sim = frames_view.frame.similarity(sample_image)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()
# Now we will retrieve images based on a string
sample_text = 'red truck'
sim = frames_view.frame.similarity(sample_text)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()
Learn how to work with Embedding and Vector Indexes.
| Requirement | Traditional | Pixeltable |
|---|---|---|
| Frame Extraction | ffmpeg + custom code | Automatic via FrameIterator |
| Object Detection | Multiple scripts + caching | Single computed column |
| Video Indexing | Custom pipelines + Vector DB | Native similarity search |
| Annotation Management | Separate tools + custom code | Label Studio integration |
| Model Evaluation | Custom metrics pipeline | Built-in mAP computation |

| Requirement | Traditional | Pixeltable |
|---|---|---|
| Document Chunking | Tool + custom code | Native DocumentSplitter |
| Embedding Generation | Separate pipeline + caching | Computed columns |
| Vector Search | External vector DB | Built-in vector indexing |
| Prompt Management | Custom tracking solution | Version-controlled columns |
| Chain Management | Tool + custom code | Computed column DAGs |

| Requirement | Traditional | Pixeltable |
|---|---|---|
| Data Types | Multiple storage systems | Unified table interface |
| Cross-Modal Search | Complex integration | Native similarity support |
| Pipeline Orchestration | Multiple tools (Airflow, etc.) | Single declarative interface |
| Asset Management | Custom tracking system | Automatic lineage |
| Quality Control | Multiple validation tools | Computed validation columns |
Today's solutions for AI app development require extensive custom coding and infrastructure plumbing. Tracking lineage and versions across data transformations, models, and deployments is cumbersome. Pixeltable lets ML Engineers and Data Scientists focus on exploration, modeling, and app development without dealing with the customary data plumbing.
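In concrete terms: once a computed column is defined, newly inserted rows are processed automatically, and every table mutation creates a new version that can be rolled back. A minimal sketch against the image table from earlier (the added image path is hypothetical):

# Inserting a new row computes `classification` for just that row; nothing is recomputed
t.insert([{'image': prefix + '/a/a9/Example_image.jpg'}])  # hypothetical path

# Undo the most recent operation; Pixeltable tracks versions automatically
t.revert()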
> [!TIP]
> Check out the Integrations section, and feel free to submit a request for additional ones.
We're excited to welcome contributions from the community! Here's how you can get involved:
This library is licensed under the Apache 2.0 License.