# Data Validator Framework
A Python-based data validation framework for CSV and JSON data. This project provides a unified approach to validating data formats and contents using specialised validators built on a common foundation. The framework is easily extensible and leverages industry-standard libraries such as Pandas, Polars, and Pydantic.
## Overview
The Data Validator Framework is designed to simplify and standardise data validation tasks. It consists of:
- CSV Validator: Validates CSV files using either the Pandas or Polars engine. It checks for issues such as missing data, incorrect data types, invalid date formats, fixed column values, and duplicate entries.
- JSON Validator: Validates JSON objects against Pydantic models, ensuring the data conforms to the expected schema while providing detailed error messages.
- Common Validator Base: An abstract `BaseValidator` class that defines a standard interface and error management for all validators.
- Custom Errors: A set of custom error classes that offer precise and informative error reporting, helping to identify and resolve data issues efficiently.
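As an illustration of this pattern, a custom error hierarchy might look like the sketch below. The class names and messages here are assumptions for illustration, not the framework's actual errors:

```python
# Hypothetical error classes sketching the custom-error pattern;
# the framework's real class names and messages may differ.
class ValidationError(Exception):
    """Base class for all validation errors."""


class MissingDataError(ValidationError):
    """Raised when a column that must be complete contains missing values."""

    def __init__(self, column: str) -> None:
        super().__init__(f"column '{column}' contains missing data")
        self.column = column


try:
    raise MissingDataError("name")
except ValidationError as exc:
    message = str(exc)
```

Grouping errors under a shared base class lets callers catch every validation failure with a single `except` clause while still reporting precise, column-level detail.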
## Features
- **CSV Validation**
  - Supports both Pandas and Polars engines.
  - Reads multiple CSV files concurrently.
  - Validates data types, missing data, date formats, fixed values, and uniqueness constraints.
- **JSON Validation**
  - Uses Pydantic for schema validation.
  - Automatically converts JSON keys to strings to ensure compatibility.
  - Aggregates and formats error messages for clarity.
- **Extensible Architecture**
  - A unified abstract base class (`BaseValidator`) that standardises validation methods.
  - Customisable error handling with detailed messages.
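The base-class pattern can be sketched as follows. The method names and error-collection details here are assumptions for illustration, not the framework's actual interface:

```python
from abc import ABC, abstractmethod


# Minimal sketch of the base-class pattern; the real BaseValidator's
# interface and error handling may differ.
class BaseValidator(ABC):
    def __init__(self) -> None:
        self.errors: list[str] = []

    @abstractmethod
    def validate(self) -> None:
        """Run all checks, collecting messages into self.errors."""


class RangeValidator(BaseValidator):
    """Hypothetical subclass: checks that every value lies in [low, high]."""

    def __init__(self, values: list[float], low: float, high: float) -> None:
        super().__init__()
        self.values, self.low, self.high = values, low, high

    def validate(self) -> None:
        for v in self.values:
            if not (self.low <= v <= self.high):
                self.errors.append(f"{v} outside [{self.low}, {self.high}]")


validator = RangeValidator([1, 5, 42], low=0, high=10)
validator.validate()
```

Because every validator shares the same `validate()` entry point and error list, callers can treat CSV, JSON, and any custom validators interchangeably.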
## Requirements
- Python: 3.10 or above.
- Dependencies:
  - For CSV validation: `pandas` or `polars`.
  - For JSON validation: `pydantic`.
## Installation

Install the project's dependencies from the bundled `requirements.txt` (with pip) or from `uv.lock` (with `uv sync`).

---
## Usage

### CSV Validation Example
```python
from processor import CSVValidator

validator = CSVValidator(
    csv_paths=["data/file1.csv", "data/file2.csv"],
    data_types=["str", "int", "float"],
    column_names=["id", "name", "value"],
    unique_value_columns=["id"],
    columns_with_no_missing_data=["name"],
    missing_data_column_mapping={"value": ["NaN", "None"]},
    valid_column_values={"name": ["Alice", "Bob", "Charlie"]},
    drop_columns=["unused_column"],
    strict_validation=True,
)
validator.validate()
```
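Under the hood, checks of this kind can be expressed with plain pandas. The snippet below is a simplified sketch of the validations listed above (uniqueness, missing data, data types), not the framework's actual implementation:

```python
import pandas as pd

# Toy data illustrating the checks the CSV validator automates.
df = pd.DataFrame({
    "id": [1, 2, 2],
    "name": ["Alice", None, "Bob"],
    "value": [1.5, 2.0, 3.5],
})

# Uniqueness constraint (cf. unique_value_columns=["id"]).
duplicate_ids = int(df["id"].duplicated().sum())

# Missing-data check (cf. columns_with_no_missing_data=["name"]).
missing_names = int(df["name"].isna().sum())

# Data-type check (cf. data_types).
value_is_float = pd.api.types.is_float_dtype(df["value"])
```

The validator wraps checks like these behind a single `validate()` call and collects the failures into descriptive error messages.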
### JSON Validation Example
```python
from pydantic import BaseModel

from processor import JSONValidator


class UserModel(BaseModel):
    id: int
    name: str
    email: str


json_data = {
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com",
}

validator = JSONValidator(model=UserModel, input_=json_data)
validator.validate()
```
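The key-to-string normalisation mentioned under Features can be illustrated with a small recursive helper. This is an assumption about the behaviour, not the validator's actual code:

```python
# Hypothetical helper mirroring the JSON validator's key normalisation;
# the framework's actual implementation may differ.
def stringify_keys(obj):
    """Recursively convert all dictionary keys to strings."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj


data = {1: "one", "nested": {2: ["two", {3: "three"}]}}
cleaned = stringify_keys(data)
```

Normalising keys up front avoids spurious schema failures when JSON produced by other tools uses non-string keys.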
---
## Project Structure
```
validator/
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── pyproject.toml
├── uv.lock
├── requirements.txt
├── README.md
├── src/
│   └── validator/
│       ├── config/
│       │   ├── __init__.py
│       │   └── csv_.py
│       ├── __init__.py
│       ├── base.py
│       ├── errors.py
│       ├── README.md
│       ├── csv/
│       │   ├── __init__.py
│       │   ├── README.md
│       │   └── main.py
│       └── json/
│           ├── __init__.py
│           ├── README.md
│           └── main.py
└── tests/
    ├── __init__.py
    ├── config.py
    ├── integration/
    │   ├── __init__.py
    │   ├── test_integration_json.py
    │   └── test_integration_csv.py
    ├── unit/
    │   ├── __init__.py
    │   ├── test_csv.py
    │   └── test_json.py
    ├── csvs/
    └── jsons/
```
## Contributing
Contributions are welcome! Please adhere to standard code review practices and ensure your contributions are well tested and documented.
## Licence
This project is licensed under the MIT License. See the LICENSE file for details.
## For developers

To generate `requirements.txt`:

```shell
uv export --format requirements.txt --no-emit-project --no-emit-workspace --no-annotate --no-header --no-hashes --no-editable -o requirements.txt
```

To generate `CHANGELOG.md`:

```shell
uv run git-cliff -o CHANGELOG.md
```

To inspect the available version bumps:

```shell
uv run bump-my-version show-bump
```