pixeltable 0.3.10 (PyPI) - AI Data Infrastructure: Declarative, Multimodal, and Incremental

Pixeltable

Build Multimodal AI Apps with Declarative Data Infrastructure


Installation | Documentation | API Reference | Code Samples | Computer Vision | LLM

🔍 What is Pixeltable?

Pixeltable is a declarative data infrastructure for building multimodal AI applications, enabling incremental storage, transformation, indexing, and orchestration of your data.

  • Data Ingestion: Unified interface for all data types (images, videos, audio, documents, URLs, blob storage, structured data)
  • Data Transformation: Chunking, embedding, and processing with declarative computed columns
  • Indexing & Storage: Type-safe tables with built-in vector indexing
  • Query & Retrieval: Queries combining filtering, sorting, and similarity search
  • Inference & Generation: Integration with AI models (OpenAI, Anthropic, PyTorch, YOLOX, DETR, Together, Hugging Face and more...)

All of this works alongside your own custom functions (UDFs) and comes with built-in caching, versioning, lineage tracking, and incremental computation.
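
As a minimal sketch of the declarative workflow (the table name, the reuse of a Wikimedia image URL, and the rotate transformation are illustrative choices, not taken from this README's examples):

import pixeltable as pxt

# Tables are persistent and typed
t = pxt.create_table('demo.images', {'image': pxt.Image})
t.insert([{'image': 'https://upload.wikimedia.org/wikipedia/commons/1/15/Cat_August_2010-4.jpg'}])

# Declare a transformation once as a computed column; Pixeltable evaluates it
# for existing rows and keeps it up to date incrementally as new rows arrive
# (rotate is assumed here as one of Pixeltable's built-in image operations)
t.add_computed_column(rotated=t.image.rotate(90))
t.select(t.image, t.rotated).head()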

💾 Installation

pip install pixeltable

Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.
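
A short sketch of what persistence means in practice (the table name and data below are illustrative): tables created in one Python session can be reopened by name in a later one.

import pixeltable as pxt

# First session: create a table; it is stored on disk, not in memory
films = pxt.create_table('demo.films', {'title': pxt.String, 'revenue': pxt.Float})
films.insert([{'title': 'Inside Out 2', 'revenue': 1.46}])

# Later session: reconnect to the same table by name
films = pxt.get_table('demo.films')
films.head()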

💡 Getting Started

Learn how to create tables, populate them with data, and enhance them with built-in or user-defined transformations.

| Topic | Notebook |
|---|---|
| 10-Minute Tour of Pixeltable | Open in Colab |
| Tables and Data Operations | Open in Colab |
| User-Defined Functions (UDFs) | Open in Colab |
| Object Detection Models | Open in Colab |
| Incremental Prompt Engineering | Open in GitHub |
| Working with External Files | Open in Colab |
| Integrating with Label Studio | Visit our documentation |
| Audio/Video Transcript Indexing | Open in Colab |
| Multimodal Application | Visit our Hugging Face Space |
| Document Indexing and RAG | Open in Colab |
| Context-Aware Discord Bot | Visit our documentation |
| Image/Text Similarity Search | Open in Colab |

🧱 Code Samples

Import media data into Pixeltable (videos, images, audio...)

import pixeltable as pxt

v = pxt.create_table('external_data.videos', {'video': pxt.Video})

prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
]
v.insert({'video': prefix + p} for p in paths)

Learn how to work with data in Pixeltable.

Object detection in images using DETR model

import pixeltable as pxt
from pixeltable.functions import huggingface

# Create a table to store data persistently
t = pxt.create_table('image', {'image': pxt.Image})

# Insert some images
prefix = 'https://upload.wikimedia.org/wikipedia/commons'
paths = [
    '/1/15/Cat_August_2010-4.jpg',
    '/e/e1/Example_of_a_Dog.jpg',
    '/thumb/b/bf/Bird_Diversity_2013.png/300px-Bird_Diversity_2013.png'
]
t.insert({'image': prefix + p} for p in paths)

# Add a computed column that runs object detection on each image
t.add_computed_column(classification=huggingface.detr_for_object_detection(
    t.image,
    model_id='facebook/detr-resnet-50'
))

# Retrieve the rows where cats have been identified
t.select(
    animal=t.image,
    classification=t.classification.label_text[0]
).where(t.classification.label_text[0] == 'cat').head()

Learn about computed columns and object detection: Comparing object detection models.

Extend Pixeltable's capabilities with user-defined functions

import PIL.Image
import PIL.ImageDraw
import pixeltable as pxt

@pxt.udf
def draw_boxes(img: PIL.Image.Image, boxes: list[list[float]]) -> PIL.Image.Image:
    result = img.copy()  # Draw on a copy so the original image is untouched
    d = PIL.ImageDraw.Draw(result)
    for box in boxes:
        d.rectangle(box, width=3)  # Draw one bounding-box rectangle per detection
    return result

Learn more about user-defined functions: UDFs in Pixeltable.
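
A possible usage sketch, reusing the table t and its classification column from the DETR example above and assuming the detection output exposes a boxes field (an assumption, not shown in this README):

# Hypothetical follow-up: feed the UDF into a computed column
t.add_computed_column(
    visualization=draw_boxes(t.image, t.classification.boxes)  # 'boxes' is assumed
)
t.select(t.visualization).head()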

Automate data operations with views, e.g., split documents into chunks

import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter

# In this example, the view is defined by iteration over the chunks of a
# DocumentSplitter; documents_table is an existing table with a 'document' column
chunks_table = pxt.create_view(
    'rag_demo.chunks',
    documents_table,
    iterator=DocumentSplitter.create(
        document=documents_table.document,
        separators='token_limit', limit=300)
)

Learn how to leverage views to build your RAG workflow.
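
One possible next step for retrieval (a sketch: the sentence_transformer embedding model is an assumption, and 'text' is assumed to be the column produced by DocumentSplitter) is to index the chunks and query them by similarity, mirroring the embedding-index example later in this README:

from pixeltable.functions.huggingface import sentence_transformer

# Index the text chunks produced by the view above
chunks_table.add_embedding_index(
    'text', embed=sentence_transformer.using(model_id='intfloat/e5-large-v2')
)

# Retrieve the chunks most relevant to a question
question = 'What is declarative data infrastructure?'
sim = chunks_table.text.similarity(question)
chunks_table.order_by(sim, asc=False).limit(5).select(chunks_table.text, sim=sim).collect()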

Evaluate model performance

# The computation of the mAP metric can become a query over the evaluation output;
# eval_yolox_tiny and eval_yolox_m are computed columns holding per-frame evaluation results
frames_view.select(mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m)).show()

Learn how to leverage Pixeltable for Model analytics.

Working with inference services

import pixeltable as pxt
from pixeltable.functions.together import chat_completions

chat_table = pxt.create_table('together_demo.chat', {'input': pxt.String})

# The chat-completions API expects JSON-formatted input:
messages = [{'role': 'user', 'content': chat_table.input}]

# This example shows how additional parameters from the Together API can be used in Pixeltable
chat_table.add_computed_column(
    output=chat_completions(
        messages=messages,
        model='mistralai/Mixtral-8x7B-Instruct-v0.1',
        max_tokens=300,
        stop=['\n'],
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        logprobs=1,
        echo=True
    )
)
chat_table.add_computed_column(
    response=chat_table.output.choices[0].message.content
)

# Start a conversation
chat_table.insert([
    {'input': 'How many species of felids have been classified?'},
    {'input': 'Can you make me a coffee?'}
])
chat_table.select(chat_table.input, chat_table.response).head()

Learn how to interact with inference services such as Together AI in Pixeltable.

Text and image similarity search on video frames with embedding indexes

import pixeltable as pxt
from pixeltable.functions.huggingface import clip
from pixeltable.iterators import FrameIterator
import PIL.Image

video_table = pxt.create_table('videos', {'video': pxt.Video})

video_table.insert([{'video': '/video.mp4'}])

frames_view = pxt.create_view(
    'frames', video_table, iterator=FrameIterator.create(video=video_table.video))

# Create an index on the 'frame' column that allows text and image search
frames_view.add_embedding_index('frame', embed=clip.using('openai/clip-vit-base-patch32'))

# Now we will retrieve images based on a sample image
sample_image = '/image.jpeg'
sim = frames_view.frame.similarity(sample_image)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()

# Now we will retrieve images based on a string
sample_text = 'red truck'
sim = frames_view.frame.similarity(sample_text)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()

Learn how to work with Embedding and Vector Indexes.

🔄 AI Stack Comparison

🎯 Computer Vision Workflows

| Requirement | Traditional | Pixeltable |
|---|---|---|
| Frame Extraction | ffmpeg + custom code | Automatic via FrameIterator |
| Object Detection | Multiple scripts + caching | Single computed column |
| Video Indexing | Custom pipelines + Vector DB | Native similarity search |
| Annotation Management | Separate tools + custom code | Label Studio integration |
| Model Evaluation | Custom metrics pipeline | Built-in mAP computation |

🤖 LLM Workflows

| Requirement | Traditional | Pixeltable |
|---|---|---|
| Document Chunking | Tool + custom code | Native DocumentSplitter |
| Embedding Generation | Separate pipeline + caching | Computed columns |
| Vector Search | External vector DB | Built-in vector indexing |
| Prompt Management | Custom tracking solution | Version-controlled columns |
| Chain Management | Tool + custom code | Computed column DAGs |

🎨 Multimodal Workflows

| Requirement | Traditional | Pixeltable |
|---|---|---|
| Data Types | Multiple storage systems | Unified table interface |
| Cross-Modal Search | Complex integration | Native similarity support |
| Pipeline Orchestration | Multiple tools (Airflow, etc.) | Single declarative interface |
| Asset Management | Custom tracking system | Automatic lineage |
| Quality Control | Multiple validation tools | Computed validation columns |

❓ FAQ

What problems does Pixeltable solve?

Today's solutions for AI app development require extensive custom coding and infrastructure plumbing. Tracking lineage and versions between and across data transformations, models, and deployments is cumbersome. Pixeltable lets ML Engineers and Data Scientists focus on exploration, modeling, and app development without dealing with the customary data plumbing.

What does Pixeltable provide me with?

Pixeltable provides:

  • Data storage and versioning
  • Combined Data and Model Lineage
  • Indexing (e.g. embedding vectors) and Data Retrieval
  • Orchestration of multimodal workloads
  • Incremental updates
  • Code is automatically production-ready
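
For example, incremental computation means that inserting new rows into a table with computed columns only triggers work for those rows (a sketch reusing the DETR table t from the code samples above; the URL is just for illustration):

# Only the newly inserted row is run through the detection model;
# results already stored for earlier rows are reused, not recomputed
t.insert([{'image': 'https://upload.wikimedia.org/wikipedia/commons/e/e1/Example_of_a_Dog.jpg'}])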

Why should you use Pixeltable?

  • It gives you transparency and reproducibility
    • All generated data is automatically recorded and versioned
    • You will never need to re-run a workload because you lost track of the input data
  • It saves you money
    • All data changes are automatically incremental
    • You never need to re-run pipelines from scratch because you’re adding data
  • It integrates with any existing Python code or libraries
    • Bring your ever-changing code and workloads
    • You choose the models, tools, and AI practices (e.g., your embedding model for a vector index); Pixeltable orchestrates the data

What is Pixeltable not providing?

  • Pixeltable is not a low-code, prescriptive AI solution. We empower you to use the best frameworks and techniques for your specific needs.
  • We do not aim to replace your existing AI toolkit, but rather enhance it by streamlining the underlying data infrastructure and orchestration.

[!TIP] Check out the Integrations section, and feel free to submit a request for additional ones.

🤝 Contributing to Pixeltable

We're excited to welcome contributions from the community! Here's how you can get involved:

🐛 Report Issues

  • Found a bug? Open an issue
  • Include steps to reproduce and environment details

💡 Submit Changes

💬 Join the Discussion

  • Have questions? Start a Discussion
  • Share your Pixeltable projects and use cases
  • Help others in the community

📝 Improve Documentation

  • Suggest examples and tutorials
  • Propose improvements

🏢 License

This library is licensed under the Apache 2.0 License.
