# PyCharter

Dynamically generate Pydantic models from JSON Schemas, with coercion and validation support.

PyCharter is a powerful Python library that automatically converts JSON schemas into fully-functional Pydantic models. It fully supports the JSON Schema Draft 2020-12 standard, including all standard validation keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.), while also providing extensions for pre-validation coercion and post-validation checks. It handles nested objects, arrays, and custom validators, with all validation logic stored as data (not Python code). PyCharter also provides a complete data contract management system with versioning, metadata storage, and runtime validation capabilities.
## Features

- **Dynamic Model Generation** - Convert JSON schemas to Pydantic models at runtime
- **JSON Schema Compliant** - Full support for the JSON Schema Draft 2020-12 standard
- **Type Coercion** - Automatic type conversion before validation (e.g., string → integer)
- **Custom Validators** - Built-in and extensible validation rules
- **Nested Structures** - Full support for nested objects and arrays
- **Multiple Input Formats** - Load schemas from dicts, JSON strings, files, or URLs
- **Type Safe** - Full type hints and Pydantic v2 compatibility
- **Extensible** - Register custom coercion and validation functions
- **Data-Driven** - All validation logic stored as JSON data, not Python code
## Installation

### Core Library

```bash
pip install pycharter
```

### With API Support

```bash
pip install pycharter[api]
```

This installs FastAPI and Uvicorn for running the REST API server.

### With UI Support

```bash
pip install pycharter[ui]
```

This installs the Python dependencies and pre-built UI static files (similar to Airflow's approach).

After installation, you can start the UI immediately:

```bash
pycharter ui serve
```

For development (if you have the source code):

```bash
cd ui
npm install
pycharter ui dev
```

Note: When installed from pip, the UI works immediately without Node.js. Node.js is only required for UI development. See ui/INSTALLATION.md for detailed instructions.
## Quick Start

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}

Person = from_dict(schema, "Person")

person = Person(name="Alice", age=30, email="alice@example.com")
print(person.name)
print(person.age)
```
## Core Services & Data Production Journey
PyCharter provides six core services that work together to support a complete data production journey, from contract specification to runtime validation. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.
### The Data Production Journey

The typical data production workflow follows this path:

```text
1. Data Contract Specification
        ↓
2. Contract Parsing
        ↓
3. Metadata Storage
        ↓
4. Pydantic Model Generation
        ↓
5. Runtime Validation
```
### 1. Contract Parser (`pycharter.contract_parser`)
Purpose: Reads and decomposes data contract files into structured metadata components.
When to Use: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.
How It Works:
- Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
- Decomposes the contract into distinct components: `schema`, `governance_rules`, `ownership`, and `metadata`
- Returns a `ContractMetadata` object that separates concerns and makes each component accessible
- Extracts and tracks the versions of all components
Example:

```python
from pycharter import parse_contract_file, ContractMetadata

# Parse a contract file into its structured components
metadata = parse_contract_file("data_contract.yaml")

schema = metadata.schema
governance = metadata.governance_rules
ownership = metadata.ownership
metadata_info = metadata.metadata
versions = metadata.versions
```
Contribution to Journey: The contract parser is the entry point that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
### 1b. Contract Builder (`pycharter.contract_builder`)
Purpose: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).
When to Use: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.
How It Works:
- Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
- Merges coercion and validation rules into the schema
- Tracks versions of all components
- Produces a consolidated contract suitable for runtime validation
- Can build from artifacts directly or retrieve from metadata store
Example:

```python
from pycharter import build_contract, build_contract_from_store, ContractArtifacts

# Build from separate artifacts
artifacts = ContractArtifacts(
    schema={"type": "object", "version": "1.0.0", "properties": {...}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
    metadata={"version": "1.0.0", "description": "User contract"},
    ownership={"owner": "data-team", "team": "engineering"},
)
contract = build_contract(artifacts)

# Or retrieve the artifacts from a metadata store
contract = build_contract_from_store(store, "user_schema_v1")

# Use the consolidated contract for runtime validation
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})
```
Contribution to Journey: The contract builder is the consolidation layer that combines separate artifacts (stored independently in the database) into a single contract artifact. This consolidated contract tracks all component versions and can be used for runtime validation, distribution, or archival purposes.
### 2. Metadata Store Client (`pycharter.metadata_store`)
Purpose: Manages persistent storage and retrieval of decomposed metadata in databases.
When to Use: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.
How It Works:
- Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
- Supports versioning and querying of stored metadata
- Multiple implementations available: PostgreSQL, MongoDB, Redis, and In-Memory (for testing)
Available Implementations:
- `PostgresMetadataStore` - For PostgreSQL databases (recommended for production)
- `MongoDBMetadataStore` - For MongoDB databases
- `RedisMetadataStore` - For Redis databases
- `InMemoryMetadataStore` - For testing and development (no persistence)
Example:

```python
from pycharter import PostgresMetadataStore, parse_contract_file

metadata = parse_contract_file("contract.yaml")

store = PostgresMetadataStore(connection_string="postgresql://user:pass@localhost:5432/pycharter")
store.connect()

# Store the schema and its related components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")
metadata_dict = {
    "business_owners": ["data-team@example.com"],
    "governance_rules": {"pii_rule": {"type": "encrypt"}}
}
store.store_metadata(schema_id, metadata_dict, "schema")
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Retrieve stored components
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)
```
Contribution to Journey: The metadata store is the persistence layer that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.
See Configuration Guide for database setup and initialization instructions.
### 3. Pydantic Generator (`pycharter.pydantic_generator`)
Purpose: Dynamically generates fully-functional Pydantic models from JSON Schema definitions.
When to Use: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.
How It Works:
- Takes JSON Schema definitions (from contracts or metadata store)
- Programmatically generates Pydantic model classes at runtime
- Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
- Can generate models from dictionaries, JSON strings, files, or URLs
- Optionally generates Python files with model definitions
Example:

```python
from pycharter import from_dict, generate_model_file, parse_contract_file, MetadataStoreClient

# Generate a model directly from a parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")

# Or generate from a schema stored in the metadata store
client = MetadataStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")

# Optionally write the model out as a Python file
generate_model_file(schema, "user_model.py", "User")
```
Contribution to Journey: The Pydantic generator is the transformation engine that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.
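Under the hood PyCharter uses `pydantic.create_model()`, but the core idea of turning schema data into a Python class at runtime can be sketched with the standard library alone. This is a simplified illustration of the mechanism, not PyCharter's implementation; `schema_to_class` and `TYPE_MAP` are hypothetical names:

```python
from dataclasses import make_dataclass, field

# Map JSON Schema primitive types to Python types
TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def schema_to_class(schema: dict, name: str):
    """Build a class at runtime from a JSON Schema 'object' definition."""
    required = set(schema.get("required", []))
    fields = []
    for prop, spec in schema.get("properties", {}).items():
        py_type = TYPE_MAP.get(spec.get("type", "string"), str)
        if prop in required:
            fields.append((prop, py_type))
        else:
            # Optional properties default to None
            fields.append((prop, py_type, field(default=None)))
    # Required fields must precede defaulted ones in a dataclass
    fields.sort(key=lambda f: len(f) == 3)
    return make_dataclass(name, fields)

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
Person = schema_to_class(schema, "Person")
p = Person(name="Alice", age=30)
print(p.name, p.age)  # Alice 30
```

The real generator adds what this sketch omits: type enforcement, standard keyword checks, nested models, and the coercion/validation extensions.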
### 4. JSON Schema Converter (`pycharter.json_schema_converter`)
Purpose: Converts existing Pydantic models back into JSON Schema format (reverse conversion).
When to Use: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.
How It Works:
- Takes Pydantic model classes as input
- Generates JSON Schema dictionaries that represent the model structure
- Preserves validation rules, types, and constraints
- Can output to dictionaries, JSON strings, or files
Example:

```python
from pycharter import from_dict, to_dict, to_file, to_json
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True

schema = to_dict(Product)        # model -> schema dict
json_string = to_json(Product)   # model -> JSON string
to_file(Product, "product_schema.json")

# Round-trip: regenerate a model from the schema
ProductModel = from_dict(schema, "Product")
```
Contribution to Journey: The JSON Schema converter enables bidirectional conversion between models and schemas. It's useful for:
- Generating schemas from existing code
- Round-trip validation (schema β model β schema)
- Integrating with systems that require JSON Schema format
- Documenting existing models as schemas
### 5. Runtime Validator (`pycharter.runtime_validator`)
Purpose: Lightweight validation utility for validating data against generated Pydantic models in production data pipelines.
When to Use: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.
How It Works:
- Takes a Pydantic model (generated from a schema) and raw data
- Validates data against the model's constraints
- Returns a `ValidationResult` with the validation status, validated data, and errors
- Supports single record and batch validation
- Can be used in strict mode (raises exceptions) or lenient mode (returns results)
Two Validation Modes:
Example - Database-Backed:

```python
from pycharter import validate_with_store, InMemoryMetadataStore

store = InMemoryMetadataStore()
store.connect()

result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})
if result.is_valid:
    print(f"Valid user: {result.data.name}")
```
Example - Contract-Based (No Database):

```python
from pycharter import validate_with_contract, get_model_from_contract, validate

# Validate directly against a contract file
result = validate_with_contract(
    "data/examples/book/book_contract.yaml",
    {"isbn": "1234567890", "title": "Book", ...}
)

# Or generate the model once and reuse it for multiple records
BookModel = get_model_from_contract("book_contract.yaml")
result1 = validate(BookModel, data1)
result2 = validate(BookModel, data2)

# A contract can also be passed as a dict
contract = {
    "schema": {"type": "object", "properties": {...}},
    "coercion_rules": {"rules": {...}},
    "validation_rules": {"rules": {...}}
}
result = validate_with_contract(contract, data)
```
Contribution to Journey: The runtime validator is the enforcement layer that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with metadata stores) and contract-based workflows (for simpler use cases without database dependencies).
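The strict/lenient distinction above can be sketched in plain Python. This is an illustrative stdlib-only sketch of the behavior, not PyCharter's implementation; `validate_record` and its required-field check are hypothetical stand-ins for the real model-based validation.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ValidationResult:
    """Sketch of a result object like the one the runtime validator returns."""
    is_valid: bool
    data: Any = None
    errors: list = field(default_factory=list)

def validate_record(record: dict, required: set, strict: bool = False) -> ValidationResult:
    """Lenient mode returns a result; strict mode raises on failure."""
    missing = sorted(required - record.keys())
    if missing:
        result = ValidationResult(False, errors=[f"missing field: {m}" for m in missing])
        if strict:
            raise ValueError("; ".join(result.errors))
        return result
    return ValidationResult(True, data=record)

ok = validate_record({"name": "Alice", "age": 30}, {"name", "age"})
bad = validate_record({"name": "Alice"}, {"name", "age"})
print(ok.is_valid, bad.errors)  # True ['missing field: age']
```

Lenient mode suits batch pipelines where you want to collect and report all failures; strict mode suits API boundaries where a bad record should halt processing immediately.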
### Complete Workflow Example

Here's how the core services work together in a complete data production journey:

```python
from pycharter import (
    parse_contract_file,
    PostgresMetadataStore,
    from_dict,
    validate,
    to_dict
)

# 1. Parse the data contract
metadata = parse_contract_file("user_contract.yaml")

# 2. Store the decomposed components
store = PostgresMetadataStore(connection_string="postgresql://user:pass@localhost:5432/pycharter")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")
metadata_dict = {
    "business_owners": ["data-team@example.com"],
    "governance_rules": {"pii_rule": {"type": "encrypt"}}
}
store.store_metadata(schema_id, metadata_dict, "schema")
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# 3. Generate a Pydantic model from the stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# 4. (Optional) Convert the model back to a JSON Schema document
schema_doc = to_dict(UserModel)

# 5. Validate incoming data at runtime
def process_user_data(raw_data):
    result = validate(UserModel, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")
```
### 6. REST API (`api/`)
Purpose: Expose all PyCharter services as REST API endpoints.
When to Use: When you need to use PyCharter from non-Python applications, microservices, or want to provide a web-based interface.
How It Works:
- Provides HTTP endpoints for all core services
- Uses FastAPI for automatic OpenAPI/Swagger documentation
- Supports both store-based and contract-based operations
- Handles request/response validation with Pydantic models
- Located at the root level (`api/`) as a separate application
- All endpoints are async-ready for better performance
Example:

```bash
# Start the API server
pycharter-api

# Or run with uvicorn directly
uvicorn api.main:app --reload
```
Endpoints:
- `POST /api/v1/contracts/parse` - Parse a data contract
- `POST /api/v1/contracts/build` - Build a contract from the store
- `POST /api/v1/metadata/schemas` - Store a schema
- `GET /api/v1/metadata/schemas/{schema_id}` - Get a schema
- `POST /api/v1/schemas/generate` - Generate a Pydantic model
- `POST /api/v1/validation/validate` - Validate data
- `POST /api/v1/validation/validate-batch` - Batch validation
Documentation:
See api/README.md for complete API documentation.
### Service Integration Summary

| Service | Input | Output | Journey Stage |
|---------|-------|--------|---------------|
| Contract Parser | Contract files (YAML/JSON) | ContractMetadata | Contract Specification → Parsing |
| Contract Builder | Separate artifacts or Store | Consolidated contract | Storage → Consolidation |
| Metadata Store | ContractMetadata | Stored metadata (DB) | Parsing → Storage |
| Pydantic Generator | JSON Schema | Pydantic models | Storage → Model Generation |
| JSON Schema Converter | Pydantic models | JSON Schema | (Bidirectional) |
| Runtime Validator | Pydantic models + Data | ValidationResult | Model Generation → Validation |
Each service is designed to be independent yet composable, allowing you to use them individually or together as part of a complete data contract management system.
## Documentation
- Data Journey Guide - Complete guide to the data production journey, including both combined and separated workflows
- Configuration Guide - Database setup, connection configuration, initialization, and migration commands
- Database ERD - Database schema documentation and entity relationship diagrams
- Examples - Complete working examples for all PyCharter services
- API Documentation - REST API endpoints and usage
## Usage Examples

### Basic Usage

```python
from pycharter import from_dict, from_json, from_file

# From a dictionary
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False}
    }
}
Article = from_dict(schema, "Article")

# From a JSON string
schema_json = '{"type": "object", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")

# From a file
Product = from_file("product_schema.json", "Product")
```
### Nested Objects

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"}
            }
        }
    }
}

Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    }
)
print(person.address.city)
```
### Arrays and Collections

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"}
                }
            }
        }
    }
}

Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75}
    ]
)
print(cart.items[0].name)
```
### Coercion and Validation

PyCharter supports coercion (pre-validation transformation) and validations (post-validation checks):

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer"
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]}
            }
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0}
            }
        }
    }
}

Flight = from_dict(schema, "Flight")

# String inputs are coerced before validation runs
flight = Flight(
    flight_number="123",
    destination="abc",
    distance="100.5"
)
```
## Standard JSON Schema Support

PyCharter supports all standard JSON Schema Draft 2020-12 validation keywords:

| Keyword | Applies To | Description | Example |
|---------|------------|-------------|---------|
| `minLength` | string | Minimum string length | `{"minLength": 3}` |
| `maxLength` | string | Maximum string length | `{"maxLength": 10}` |
| `pattern` | string | Regular expression pattern | `{"pattern": "^[a-z]+$"}` |
| `enum` | any | Allowed values | `{"enum": ["a", "b", "c"]}` |
| `const` | any | Single allowed value | `{"const": "fixed"}` |
| `minimum` | number | Minimum value (inclusive) | `{"minimum": 0}` |
| `maximum` | number | Maximum value (inclusive) | `{"maximum": 100}` |
| `exclusiveMinimum` | number | Minimum value (exclusive) | `{"exclusiveMinimum": 0}` |
| `exclusiveMaximum` | number | Maximum value (exclusive) | `{"exclusiveMaximum": 100}` |
| `multipleOf` | number | Must be a multiple of | `{"multipleOf": 2}` |
| `minItems` | array | Minimum array length | `{"minItems": 1}` |
| `maxItems` | array | Maximum array length | `{"maxItems": 10}` |
| `uniqueItems` | array | Array items must be unique | `{"uniqueItems": true}` |

All schemas are validated against the JSON Schema standard before processing, ensuring compliance.
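To make the keyword semantics concrete, here is a minimal stdlib sketch of how a few of these keywords are interpreted as data. This is illustrative only; PyCharter delegates the real checks to Pydantic and the jsonschema library, and `check_standard_keywords` is a hypothetical name. Note that, per the spec, `pattern` matches anywhere in the string, hence `re.search`.

```python
import re

def check_standard_keywords(value, spec: dict) -> list:
    """Return a list of violations of a few standard keywords for one value."""
    errors = []
    if "minLength" in spec and len(value) < spec["minLength"]:
        errors.append(f"shorter than minLength {spec['minLength']}")
    if "maxLength" in spec and len(value) > spec["maxLength"]:
        errors.append(f"longer than maxLength {spec['maxLength']}")
    if "pattern" in spec and not re.search(spec["pattern"], value):
        errors.append(f"does not match pattern {spec['pattern']!r}")
    if "enum" in spec and value not in spec["enum"]:
        errors.append(f"not one of {spec['enum']}")
    return errors

spec = {"minLength": 3, "pattern": "^[a-z]+$"}
print(check_standard_keywords("abc", spec))  # []
print(check_standard_keywords("A1", spec))   # two violations
```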
## Built-in Coercions (PyCharter Extensions)

| Coercion | Description |
|----------|-------------|
| `coerce_to_string` | Convert int, float, bool, datetime, dict, or list to string |
| `coerce_to_integer` | Convert float, numeric string, bool, or datetime to int |
| `coerce_to_float` | Convert int, numeric string, or bool to float |
| `coerce_to_boolean` | Convert int or string to bool |
| `coerce_to_datetime` | Convert ISO-format string or timestamp to datetime |
| `coerce_to_date` | Convert date string or datetime to date (date only, no time) |
| `coerce_to_uuid` | Convert string to UUID |
| `coerce_to_lowercase` | Convert string to lowercase |
| `coerce_to_uppercase` | Convert string to uppercase |
| `coerce_to_stripped_string` | Strip leading and trailing whitespace from string |
| `coerce_to_list` | Convert a single value to `[value]` (preserves None) |
| `coerce_empty_to_null` | Convert empty strings/lists/dicts to None (useful for nullable fields) |
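The behavior described in this table can be sketched with plain Python for two representative coercions. This is a simplified approximation of the documented semantics, not PyCharter's actual implementation:

```python
def coerce_to_integer(value):
    """Best-effort int conversion, approximating the table above."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str) and value.strip().lstrip("-").isdigit():
        return int(value.strip())
    return value  # leave unconvertible values for validation to reject

def coerce_empty_to_null(value):
    """Empty strings/lists/dicts become None; everything else passes through."""
    if value in ("", [], {}):
        return None
    return value

print(coerce_to_integer("42"))    # 42
print(coerce_empty_to_null(""))   # None
print(coerce_empty_to_null("x"))  # x
```

Returning the original value on failure (rather than raising) is what makes coercion safe to run before validation: a value that cannot be coerced simply fails the type check afterwards.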
## Built-in Validations (PyCharter Extensions)

| Validation | Description | Configuration |
|------------|-------------|---------------|
| `min_length` | Minimum length for strings/arrays | `{"threshold": N}` |
| `max_length` | Maximum length for strings/arrays | `{"threshold": N}` |
| `only_allow` | Only allow specific values | `{"allowed_values": [...]}` |
| `greater_than_or_equal_to` | Numeric minimum | `{"threshold": N}` |
| `less_than_or_equal_to` | Numeric maximum | `{"threshold": N}` |
| `is_positive` | Value must be positive | `{"threshold": 0}` |
| `no_capital_characters` | No uppercase letters | `null` |
| `no_special_characters` | Only alphanumeric and spaces | `null` |
| `non_empty_string` | String must not be empty | `null` |
| `matches_regex` | String must match regex pattern | `{"pattern": "..."}` |
| `is_email` | String must be a valid email address | `null` |
| `is_url` | String must be a valid URL | `null` |
| `is_alphanumeric` | Only alphanumeric characters (no spaces/special) | `null` |
| `is_numeric_string` | String must be numeric (digits, optional decimal) | `null` |
| `is_unique` | All items in array must be unique | `null` |

Note: PyCharter extensions (`coercion` and `validations`) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.
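The configuration column above maps naturally onto validator factories: each rule name resolves to a factory, and the configuration dict supplies its keyword arguments. A stdlib sketch of that resolution (not PyCharter's internals; `RULES` and the two factories here are illustrative):

```python
def min_length(threshold):
    """Factory: returns a validator enforcing a minimum length."""
    def _validate(value):
        if len(value) < threshold:
            raise ValueError(f"length {len(value)} is below threshold {threshold}")
        return value
    return _validate

def only_allow(allowed_values):
    """Factory: returns a validator restricting values to a whitelist."""
    def _validate(value):
        if value not in allowed_values:
            raise ValueError(f"{value!r} not in {allowed_values}")
        return value
    return _validate

# Rules stored as data, resolved to validators by name and configuration
RULES = {"min_length": min_length, "only_allow": only_allow}
config = {"min_length": {"threshold": 3}, "only_allow": {"allowed_values": ["abc", "def"]}}
validators = [RULES[name](**params) for name, params in config.items()]

value = "abc"
for check in validators:
    value = check(value)
print(value)  # abc
```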
## Custom Coercions and Validations

Extend PyCharter with your own coercion and validation functions:

```python
from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation

# Register a custom coercion (use a name that does not clash with a
# built-in; coerce_to_uppercase already ships with PyCharter)
def coerce_to_titlecase(data):
    if isinstance(data, str):
        return data.title()
    return data

register_coercion("coerce_to_titlecase", coerce_to_titlecase)

# Register a custom validation factory
def must_be_positive(threshold=0):
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate

register_validation("must_be_positive", must_be_positive)
```
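Registration is what lets schemas reference your functions purely by name. A self-contained stdlib sketch of the registry pattern (hypothetical structure; PyCharter maintains its own registries internally):

```python
# Minimal name-based registry: schemas refer to coercions as strings,
# and the runtime looks the functions up at validation time.
COERCIONS: dict = {}

def register_coercion(name, fn):
    COERCIONS[name] = fn

def apply_coercion(name, value):
    """Apply a registered coercion by name; unknown names pass through."""
    return COERCIONS[name](value) if name in COERCIONS else value

register_coercion("coerce_to_titlecase", lambda s: s.title() if isinstance(s, str) else s)

# A field spec references the coercion purely as data
field_spec = {"type": "string", "coercion": "coerce_to_titlecase"}
print(apply_coercion(field_spec["coercion"], "hello world"))  # Hello World
```

Because the schema stores only the name, the same contract file works in any environment where a function by that name has been registered.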
## API Reference

### Main Functions

- `from_dict(schema: dict, model_name: str = "DynamicModel")` - Create a model from a dictionary
- `from_json(json_string: str, model_name: str = "DynamicModel")` - Create a model from a JSON string
- `from_file(file_path: str, model_name: str = None)` - Create a model from a JSON file
- `from_url(url: str, model_name: str = "DynamicModel")` - Create a model from a URL
- `schema_to_model(schema: dict, model_name: str = "DynamicModel")` - Low-level model generator
## Design Principles & Requirements

PyCharter is designed to meet the following core requirements:

### JSON Schema Standard Compliance

All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:

- Validation: All schemas are validated against the JSON Schema Draft 2020-12 standard before processing
- Standard Keywords: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
- Compliance: Uses the `jsonschema` library for validation, with graceful fallback when it is not installed

### Data-Driven Validation Logic

All schema information and complex field validation logic is stored as data, not Python code:

- Coercion: Referenced by name (string) in JSON: `"coercion": "coerce_to_integer"`
- Validations: Referenced by name with configuration (dict) in JSON: `"validations": {"min_length": {"threshold": 3}}`
- No Code Required: Validation rules are defined entirely in JSON schema files
- Example: `{"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}`

### Dynamic Pydantic Model Generation

Models are created dynamically at runtime from JSON schemas:

- Runtime Generation: Uses `pydantic.create_model()` to generate models on the fly
- Dynamic Validators: Field validators are dynamically attached using `field_validator` decorators
- Multiple Sources: Models can be created from dicts, JSON strings, files, or URLs
- No Static Code: All models are generated from data, not pre-defined classes

### Nested Schema Support

Full support for nested object schemas and complex structures:

- Recursive Processing: Nested objects are recursively processed into their own Pydantic models
- Arrays of Objects: Arrays containing nested objects are fully supported
- Deep Nesting: Deeply nested structures work correctly with full type safety
- Type Safety: Each nested object becomes its own typed Pydantic model

### Extension Fields

Custom fields can be added to JSON Schema to extend functionality:

- `coercion`: Pre-validation type conversion (e.g., string → integer)
- `validations`: Post-validation custom rules
- Optional: Extensions work alongside standard JSON Schema keywords
- Separated: Extensions are clearly distinguished from standard JSON Schema

### Complex Field Validation

Support for both standard and custom field validators:

- Standard Validators: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
- Custom Validators: Extensible validation rules via the `validations` field
- Validation Order: Coercion → Standard Validation → Pydantic Validation → Custom Validations
- Factory Pattern: Validators are factory functions that return validation functions
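The ordering principle above can be sketched as a simple pipeline. This is an illustrative stdlib sketch (Pydantic's own validation step is collapsed into the standard checks for brevity; `run_field_pipeline` and the two check functions are hypothetical names, not PyCharter's API):

```python
# Pipeline order: coercion -> standard keyword checks -> custom validations
def run_field_pipeline(value, coercion=None, standard_checks=(), custom_checks=()):
    if coercion is not None:
        value = coercion(value)  # 1. pre-validation coercion
    for check in standard_checks:
        check(value)             # 2. standard JSON Schema keywords
    for check in custom_checks:
        check(value)             # 3. custom validations
    return value

def standard_min_length(value):
    if len(value) < 3:
        raise ValueError("minLength violated")

def custom_no_capitals(value):
    if any(c.isupper() for c in value):
        raise ValueError("no_capital_characters violated")

result = run_field_pipeline(
    123,
    coercion=str,  # playing the role of "coerce_to_string"
    standard_checks=[standard_min_length],
    custom_checks=[custom_no_capitals],
)
print(result)  # "123"
```

Running coercion first is what allows the integer input `123` to pass a string-length check: by the time any validator sees the value, it already has the target type.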
## Development Setup

### Quick Setup

```bash
./setup.sh
source venv/bin/activate
pytest
```

### Using Make

```bash
make install-dev
make test
make format
make lint
make check
```
## Testing

```bash
# Run the full test suite
pytest

# Run with coverage report
pytest --cov=pycharter --cov-report=html

# Run a specific test file
pytest tests/test_converter.py

# Run tests matching a keyword
pytest -k "coercion"
```
## Publishing to PyPI

```bash
make clean
make build
make publish-test
make publish
```
## JSON Schema Compliance

PyCharter is fully compliant with the JSON Schema Draft 2020-12 standard:

- All schemas are validated against the standard before processing
- Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
- Optional extensions (`coercion` and `validations`) work alongside standard keywords
- Strict mode available to enforce standard-only schemas
## Requirements
- Python 3.10+
- Pydantic >= 2.0.0
- jsonschema >= 4.0.0 (optional, for enhanced validation)
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Links

Made with ❤️ for the Python community