
pandera-unified-validator
Advanced data validation library unifying Pydantic and Pandera with multi-backend support for pandas and Polars.
pip install pandera-unified-validator
With optional dependencies:
# For Parquet support
pip install pandera-unified-validator[parquet]
# For database validation
pip install pandera-unified-validator[database]
# For data profiling
pip install pandera-unified-validator[profiling]
# All features
pip install pandera-unified-validator[all]
import pandas as pd
from pandera_unified_validator import SchemaBuilder, UnifiedValidator
# Define schema with fluent API
schema = (
    SchemaBuilder("user_schema")
    .add_column("user_id", int, unique=True, ge=0)
    .add_column("email", str, pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
    .add_column("age", int, ge=0, le=120)
    .add_column("score", float, ge=0.0, le=100.0)
    .build()
)
# Create validator with auto-fix enabled
validator = UnifiedValidator(schema.to_validation_schema(), auto_fix=True)
# Validate your data
data = pd.DataFrame({
    "user_id": [1, 2, 3],
    "email": ["user@example.com", "invalid-email", "admin@test.org"],  # second address has no "@"
    "age": [25, 150, 30],  # 150 is out of range
    "score": [85.5, 92.0, 78.5],
})
result = validator.validate(data)
# Check results
print(f"Valid: {result.is_valid}")
print(f"Errors: {len(result.errors)}")
print(f"Suggestions: {len(result.suggestions)}")
# Generate beautiful reports
from pandera_unified_validator import ValidationReporter
reporter = ValidationReporter(result)
reporter.to_console(verbose=True) # Rich console output
reporter.to_html("report.html") # Interactive HTML report
reporter.to_json("report.json") # JSON export
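To make the constraint keywords in the quick start concrete, here is a minimal plain-Python sketch of what `ge`, `le`, `pattern`, and `unique` are expected to mean for the sample data. It does not use or reproduce the library's implementation; the `RULES` table and the `check_rows` helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical rule table mirroring the quick-start schema (not the library's format).
RULES = {
    "user_id": {"type": int, "ge": 0, "unique": True},
    "email": {"type": str, "pattern": r"^[\w\.-]+@[\w\.-]+\.\w+$"},
    "age": {"type": int, "ge": 0, "le": 120},
    "score": {"type": float, "ge": 0.0, "le": 100.0},
}


def check_rows(rows, rules=RULES):
    """Return a list of (row_index, column, reason) tuples, one per violation."""
    errors = []
    # Track values already seen in columns that must be unique.
    seen = {col: set() for col, rule in rules.items() if rule.get("unique")}
    for i, row in enumerate(rows):
        for col, rule in rules.items():
            value = row[col]
            if not isinstance(value, rule["type"]):
                errors.append((i, col, "wrong type"))
                continue
            if "ge" in rule and value < rule["ge"]:
                errors.append((i, col, f"below {rule['ge']}"))
            if "le" in rule and value > rule["le"]:
                errors.append((i, col, f"above {rule['le']}"))
            if "pattern" in rule and not re.match(rule["pattern"], value):
                errors.append((i, col, "pattern mismatch"))
            if col in seen:
                if value in seen[col]:
                    errors.append((i, col, "duplicate value"))
                seen[col].add(value)
    return errors


rows = [
    {"user_id": 1, "email": "user@example.com", "age": 25, "score": 85.5},
    {"user_id": 2, "email": "invalid-email", "age": 150, "score": 92.0},
    {"user_id": 3, "email": "admin@test.org", "age": 30, "score": 78.5},
]
errors = check_rows(rows)
for error in errors:
    print(error)
```

Only the second row is flagged here, once for the malformed email and once for the out-of-range age. How the library itself groups or counts such violations in `result.errors` may differ.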
| Feature | pandera-unified-validator | Pydantic | Pandera |
|---|---|---|---|
| Record validation | ✅ | ✅ | ❌ |
| DataFrame validation | ✅ | ❌ | ✅ |
| Unified schema | ✅ | ❌ | ❌ |
| Multi-backend (pandas/Polars) | ✅ | ❌ | ❌ |
| Streaming validation | ✅ | ❌ | ❌ |
| Auto-fix suggestions | ✅ | ❌ | ❌ |
| Data profiling | ✅ | ❌ | ✅ |
| HTML/JSON reports | ✅ | ❌ | ❌ |
| Metrics export | ✅ | ❌ | ❌ |
from pandera_unified_validator import UnifiedValidator, SchemaBuilder, ValidationReporter
# Define comprehensive product schema
schema = (
    SchemaBuilder("product_catalog")
    .add_column("product_id", str, unique=True, pattern=r"^PRD-\d{6}$")
    .add_column("name", str, nullable=False)
    .add_column("price", float, ge=0.01, le=1_000_000)
    .add_column("category", str, isin=["Electronics", "Clothing", "Books", "Home"])
    .add_column("stock_quantity", int, ge=0)
    .add_column("supplier_id", str, pattern=r"^SUP-\d{4}$")
    .add_cross_column_constraint(
        "price_check",
        ["price", "category"],
        # Element-wise check: a row passes if it is not a book or is priced under 10,000.
        # (A plain `if ... else` on a pandas Series raises ValueError.)
        lambda df: (df["category"] != "Books") | (df["price"] < 10000),
        error_message="Books must be priced under $10,000"
    )
    .build()
)
# Validate with auto-fix
# products_df is assumed to be a DataFrame you have already loaded
validator = UnifiedValidator(schema.to_validation_schema(), auto_fix=True)
result = validator.validate(products_df)
# Generate comprehensive report
reporter = ValidationReporter(result)
reporter.to_console(verbose=True)
reporter.to_html("validation_report.html")
# Apply auto-fixes
if result.suggestions:
    fixed_df = validator.apply_fixes(products_df, result)
    print(f"Fixed {len(result.suggestions)} issues automatically")
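A cross-column rule such as `price_check` has to be written element-wise if the callable receives a whole DataFrame, which the `df` parameter name suggests but the source does not confirm. The sketch below uses only pandas, with a made-up three-row frame, to show the vectorized form of "books must cost less than 10,000" and why a Python conditional expression cannot branch on a column.

```python
import pandas as pd

# Made-up sample data for illustration only.
products = pd.DataFrame({
    "price": [12.99, 15000.0, 15000.0],
    "category": ["Books", "Books", "Electronics"],
})

# Element-wise rule: a row passes if it is not a book, or if it is priced under 10,000.
passes = (products["category"] != "Books") | (products["price"] < 10_000)
print(passes.tolist())

# Branching on a whole column is ambiguous, so pandas raises ValueError.
try:
    ok = True if products["category"] == "Books" else False
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The boolean Series marks the expensive book as failing and leaves the equally expensive electronics item alone, which is the behavior the error message describes.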
import asyncio
from pandera_unified_validator import SchemaBuilder, StreamingValidator
# Validate large CSV without loading into memory
schema = SchemaBuilder("transactions").add_column("amount", float, ge=0).build()
validator = StreamingValidator(schema, chunk_size=10000, error_threshold=0.05)
# Async validation with progress callback
async def progress_callback(metrics):
    print(f"Processed {metrics.total_rows} rows, {metrics.error_rate:.2%} error rate")
async def main():
    # validate_csv is awaited, so it has to run inside a coroutine
    result = await validator.validate_csv(
        "large_transactions.csv",
        report_callback=progress_callback
    )
    print(f"Total rows: {result.metrics.total_rows}")
    print(f"Invalid rows: {result.metrics.invalid_rows}")
    print(f"Processing time: {result.metrics.processing_time:.2f}s")
asyncio.run(main())
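The idea behind `chunk_size` and `error_threshold` can be sketched with the standard library alone. This is not the library's implementation: the function names are hypothetical, and the sketch assumes that `error_threshold` means "stop once the running share of invalid rows exceeds this fraction", which the source does not spell out.

```python
import csv
import io
from itertools import islice


def validate_csv_in_chunks(handle, is_valid, chunk_size=10_000, error_threshold=0.05):
    """Stream rows from an open CSV file, checking one chunk at a time.

    Assumption: validation stops early once the running share of invalid
    rows exceeds error_threshold. The library may behave differently.
    """
    reader = csv.DictReader(handle)
    total = invalid = 0
    while True:
        chunk = list(islice(reader, chunk_size))  # at most chunk_size rows in memory
        if not chunk:
            break
        total += len(chunk)
        invalid += sum(1 for row in chunk if not is_valid(row))
        if invalid / total > error_threshold:
            return {"total_rows": total, "invalid_rows": invalid, "aborted": True}
    return {"total_rows": total, "invalid_rows": invalid, "aborted": False}


def amount_is_non_negative(row):
    """Hypothetical row check mirroring add_column("amount", float, ge=0)."""
    try:
        return float(row["amount"]) >= 0
    except (KeyError, ValueError, TypeError):
        return False


# In-memory stand-in for a large file, so the sketch runs without any data on disk.
sample = io.StringIO("amount\n10.5\n-3\n7\nabc\n2\n")
summary = validate_csv_in_chunks(
    sample, amount_is_non_negative, chunk_size=2, error_threshold=0.5
)
print(summary)
```

Because only one chunk is materialized at a time, memory use is bounded by `chunk_size` rather than by the size of the file.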
# Clone repository
git clone https://github.com/ianpinto/pandera-unified-validator.git
cd pandera-unified-validator
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v --cov=src/pandera_unified_validator
# Run linting
ruff check src/ tests/
# Run type checking
mypy src/
# Run formatting
black src/ tests/
# Run all checks
ruff check src/ && black --check src/ && mypy src/ && pytest
Contributions are welcome! Please read our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use pandera-unified-validator in your research or production systems, please cite:
@software{pandera_unified_validator,
title = {pandera-unified-validator: Advanced data validation library},
author = {Ian Pinto},
year = {2024},
url = {https://github.com/ianpinto/pandera-unified-validator}
}
We found that pandera-unified-validator demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.