cerevox

Official Python SDK for Cerevox

0.1.6


Cerevox - The Data Layer 🧠 ⚡

Parse documents with enterprise-grade reliability
AI-powered • Highest Accuracy • Vector DB ready


Official Python SDK for Lexa - Parse documents into structured data

🎯 Perfect for: RAG applications, document analysis, data extraction, and vector database preparation

📦 Installation

pip install cerevox

📋 Requirements

🚀 Quick Start

Basic Usage

from cerevox import Lexa

# Parse a document
client = Lexa(api_key="your-api-key")
documents = client.parse(["document.pdf"])

print(f"Extracted {len(documents[0].content)} characters")
print(f"Found {len(documents[0].tables)} tables")

Async Usage

import asyncio
from cerevox import AsyncLexa

async def main():
    async with AsyncLexa(api_key="your-api-key") as client:
        documents = await client.parse(["document.pdf", "report.docx"])
        
        # Get chunks optimized for vector databases
        chunks = documents.get_all_text_chunks(target_size=500)
        print(f"Ready for embedding: {len(chunks)} chunks")

asyncio.run(main())
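
The `target_size=500` argument above suggests chunks sized for an embedding model. As a rough illustration of what size-targeted chunking involves, and not the SDK's actual algorithm, the sketch below greedily packs paragraphs into chunks of at most a target number of characters. `chunk_text` is a hypothetical helper, and treating `target_size` as a character count is an assumption.

```python
def chunk_text(text: str, target_size: int = 500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most target_size characters.

    Hypothetical helper for illustration; not part of the cerevox package.
    A single paragraph longer than target_size is split at hard boundaries.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")

    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        # Split oversized paragraphs into fixed-size pieces.
        pieces = [paragraph[i:i + target_size]
                  for i in range(0, len(paragraph), target_size)]
        for piece in pieces:
            # Try to append the piece to the chunk being built.
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= target_size:
                current = candidate
            else:
                # The chunk is full: emit it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs whole where possible tends to give each chunk a coherent topic, which usually helps retrieval quality in RAG pipelines.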

✨ Features

🚀 Performance & Scale

  • 10x Faster than traditional solutions
  • Native Async Support with concurrent processing
  • Enterprise-grade reliability with automatic retries
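
To make "concurrent processing" and "automatic retries" concrete, here is a minimal sketch of the general pattern using only the standard library. It does not show how the SDK implements either feature. `with_retries`, `process_all`, and the `parse_one` callback are hypothetical names introduced for illustration.

```python
import asyncio
import random


async def with_retries(make_call, max_attempts: int = 3, base_delay: float = 0.1):
    """Await make_call(), retrying failures with exponential backoff and jitter.

    Hypothetical helper. Real code should catch only transient errors
    (timeouts, rate limits) rather than every Exception.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Back off 0.1s, 0.2s, 0.4s, ... plus a little random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def process_all(paths, parse_one, max_concurrency: int = 4):
    """Run parse_one(path) for every path, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path):
        async with semaphore:
            return await with_retries(lambda: parse_one(path))

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))
```

The semaphore caps the number of in-flight requests, so a large batch does not overwhelm the remote service, while the backoff spaces out repeated attempts after a failure.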

🧠 AI-Powered Extraction

  • SOTA Accuracy with cutting-edge ML models
  • Advanced Table Extraction preserving structure and formatting
  • 12+ File Formats including PDF, DOCX, PPTX, HTML, and more

🔗 Integration Ready

  • Vector Database Optimized chunks for RAG applications
  • 7+ Cloud Storage integrations (S3, SharePoint, Google Drive, etc.)
  • Framework Agnostic: works with Django, Flask, and FastAPI
  • Rich Metadata extraction including images, formatting, and structure

📋 Examples

Explore comprehensive examples in the examples/ directory:

Example                     Description
lexa_examples.py            Complete SDK functionality demonstration
vector_db_preparation.py    Vector database chunking and integration patterns
async_examples.py           Advanced async processing techniques
document_examples.py        Document analysis and manipulation features
cloud_integrations.py       Cloud storage service integrations

🚀 Run Examples

# Clone and explore
git clone https://github.com/CerevoxAI/cerevox-python.git
cd cerevox-python

export CEREVOX_API_KEY="your-api-key"

# Run demos
python examples/lexa_examples.py          # Basic usage
python examples/vector_db_preparation.py  # Vector DB integration
python examples/async_examples.py         # Async features
python examples/document_examples.py      # Document analysis
python examples/cloud_integrations.py     # Cloud Integrations Coming Soon!
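
The commands above export `CEREVOX_API_KEY` before running the demos. This section does not say whether the SDK reads that variable on its own, so the sketch below assumes the key is read explicitly and passed to the client. `load_api_key` is a hypothetical helper, not part of the cerevox package.

```python
import os


def load_api_key(env_var: str = "CEREVOX_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; run `export {env_var}=...` first"
        )
    return api_key
```

The returned value can then be passed to the constructor shown in Quick Start, for example `Lexa(api_key=load_api_key())`, which keeps the key out of source control.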

📚 Documentation

📖 Guides & Tutorials

🔗 External Resources

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support & Community

📖 Resources

💬 Get Help

🐛 Issues

⭐ Star us on GitHub if Cerevox helped your project!
Made with ❤️ by the Cerevox team
Happy Parsing 🔍 ✨
