sentri

Sentri - a production-ready, configurable data quality validation framework

Ecosystem: pip (PyPI)
Version: 1.0.0
Maintainers: 1

Sentri

Sentri is a production-ready, configurable data quality validation framework with 10 check types, multiple data connectors, and flexible output formats.

Features

  • 10 Check Types: Completeness, Uniqueness, Range, Turnover, Value Spike, Frequency, Correlation, Statistical, Distribution, Drift
  • Multiple Connectors: CSV, Oracle, Snowflake (extensible)
  • Flexible Configuration: YAML with environment variable support
  • Multiple Output Formats: JSON, HTML, CSV, DataFrame
  • Threshold-Based Validation: Critical, Warning, Pass, Error states
  • Comprehensive Logging: Text and JSON formats

Installation

pip install sentri

# Optional: editable installs with extras, run from the root of a source checkout
pip install -e ".[dev]"  # Development dependencies
pip install -e ".[all]"  # All database connectors

Quick Start

Programmatic Usage

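from sentri import DataQualityFramework

# Check and connector classes live in these subpackages (not used in this example)
from sentri.checks import CompletenessCheck, TurnoverCheck
from sentri.connectors import OracleConnector, SnowflakeConnector

# Build the framework from a YAML config file (see "Configuration File Usage")
framework = DataQualityFramework(config_path="config.yaml")

# Run the configured checks over an inclusive date range given as ISO dates
results = framework.run_checks(start_date="2025-01-01", end_date="2025-01-31")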

Configuration File Usage

# config.yaml
source:
  type: csv
  csv:
    file_path: /data/sample.csv
    date_column: effective_date

metadata:
  dq_check_name: "Sample Check"
  date_column: effective_date
  id_column: entity_id

checks:
  completeness:
    value:
      thresholds:
        absolute_critical: 0.05
        absolute_warning: 0.02
      description: "Value completeness"

output:
  formats: [json, html, csv]
  destination: /output
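
To make the completeness thresholds above concrete, here is a minimal sketch of how a null-rate metric for the `value` column could be computed from a CSV source. It does not use Sentri's API and is not taken from its implementation: `null_rate` is a hypothetical helper, the sample rows are invented, and treating empty strings as missing is an assumption.

```python
import csv
import io


def null_rate(rows, column):
    """Return the fraction of rows whose `column` is missing or empty."""
    rows = list(rows)
    if not rows:
        return 0.0
    # Assumption: an absent key, None, or a blank string all count as missing
    missing = sum(1 for row in rows if not (row.get(column) or "").strip())
    return missing / len(rows)


# Invented sample data using the column names from config.yaml
sample_csv = """effective_date,entity_id,value
2025-01-01,A,10.5
2025-01-01,B,
2025-01-02,A,11.0
2025-01-02,B,9.8
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
rate = null_rate(rows, "value")
print(f"value null rate: {rate:.2%}")
```

In this sample one of four rows has no `value`, so the metric is 0.25, well above both the 0.02 warning and 0.05 critical thresholds configured above.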

Check Types

Check Type     Description
-------------  -----------------------------------------
Completeness   Monitor null/missing values
Uniqueness     Detect duplicate values
Range          Validate value bounds
Turnover       Track ID additions/removals
Value Spike    Detect abnormal value changes
Frequency      Monitor category distributions
Statistical    Track mean, std, median, etc.
Correlation    Validate temporal/cross-column correlation
Distribution   Detect distribution shifts (KS test)
Drift          Identify gradual drift (PSI)
Project Structure

dq_framework/
├── src/data_quality/
│   ├── checks/           # Check implementations
│   ├── connectors/       # Data connectors
│   ├── core/             # Exceptions, config, framework
│   ├── formatters/       # Output formatters
│   ├── managers/         # Check manager
│   └── utils/            # Logger, constants
├── tests/                # Unit tests
├── examples/             # Sample configs and scripts
└── pyproject.toml

Running Tests

pytest tests/unit/ -v --cov

Thresholds

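Each check supports the following threshold keys:

  • absolute_critical: The check is Critical (failed) if the metric exceeds this value
  • absolute_warning: The check raises a Warning if the metric exceeds this value
  • delta_critical / delta_warning: The same two levels, applied to the change in the metric rather than its absolute value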
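
A minimal sketch of how these thresholds could map a metric to a status follows. The function `classify` is hypothetical and not part of Sentri's documented API; the strict "greater than" comparison, the use of the absolute difference from the previous value as the delta, and Critical taking precedence over Warning are all assumptions.

```python
def classify(metric, previous=None, *, absolute_warning=None, absolute_critical=None,
             delta_warning=None, delta_critical=None):
    """Map a metric (and optionally its change from `previous`) to a status string."""
    # Assumption: the delta is the absolute change from the previous observation
    delta = None if previous is None else abs(metric - previous)

    def exceeded(value, limit):
        # A threshold that is not configured (None) can never be exceeded
        return value is not None and limit is not None and value > limit

    # Critical is checked first so it wins when both levels are exceeded
    if exceeded(metric, absolute_critical) or exceeded(delta, delta_critical):
        return "critical"
    if exceeded(metric, absolute_warning) or exceeded(delta, delta_warning):
        return "warning"
    return "pass"


# Thresholds taken from the completeness example in config.yaml
limits = {"absolute_warning": 0.02, "absolute_critical": 0.05}
for null_fraction in (0.01, 0.03, 0.25):
    print(null_fraction, classify(null_fraction, **limits))
```

The Error state listed under Features is left out here on the assumption that it signals a check that could not run, rather than a threshold outcome.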

Output

Results include:

  • Summary: Total, passed, warnings, failed, pass rate
  • Details: Check type, column, date, metric value, status
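
As a rough illustration of how the summary could be derived from the detail records, consider the sketch below. The dictionary keys, the lowercase status strings, and the choice to count both Critical and Error results as failed are assumptions for this example, not Sentri's actual output schema.

```python
from collections import Counter


def summarize(details):
    """Aggregate per-check detail records into summary counts."""
    counts = Counter(d["status"] for d in details)
    total = len(details)
    passed = counts.get("pass", 0)
    return {
        "total": total,
        "passed": passed,
        "warnings": counts.get("warning", 0),
        # Assumption: critical and error results both count as failed
        "failed": counts.get("critical", 0) + counts.get("error", 0),
        "pass_rate": passed / total if total else 0.0,
    }


# Invented detail records carrying the fields listed above
details = [
    {"check_type": "completeness", "column": "value", "date": "2025-01-01",
     "metric_value": 0.01, "status": "pass"},
    {"check_type": "completeness", "column": "value", "date": "2025-01-02",
     "metric_value": 0.03, "status": "warning"},
    {"check_type": "uniqueness", "column": "entity_id", "date": "2025-01-01",
     "metric_value": 0.0, "status": "pass"},
    {"check_type": "range", "column": "value", "date": "2025-01-02",
     "metric_value": 0.12, "status": "critical"},
]

summary = summarize(details)
print(summary)
```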

License

MIT
