This library allows you to declare validation tasks that check CSV files. This ensures data correctness for ETL pipelines that update frequently.
pip install datavalid
Create a datavalid.yml file in your data folder:
files:
  fuse/complaint.csv:
    schema:
      uid:
        description: >
          accused officer's unique identifier. This references the `uid` column in personnel.csv
      tracking_number:
        description: >
          complaint tracking number from the agency the data originate from
      complaint_uid:
        description: >
          complaint unique identifier
        unique: true
        no_na: true
    validation_tasks:
      - name: "`complaint_uid`, `allegation` and `uid` should be unique together"
        unique:
          - complaint_uid
          - uid
          - allegation
      - name: if `allegation_finding` is "sustained" then `disposition` should also be "sustained"
        empty:
          and:
            - column: allegation_finding
              op: equal
              value: sustained
            - column: disposition
              op: not_equal
              value: sustained
  fuse/event.csv:
    schema:
      event_uid:
        description: >
          unique identifier for each event
        unique: true
        no_na: true
      kind:
        options:
          - officer_level_1_cert
          - officer_pc_12_qualification
          - officer_rank
    validation_tasks:
      - name: no officer with more than 1 left date in a calendar month
        where:
          column: kind
          op: equal
          value: officer_left
        group_by: uid
        no_more_than_once_per_30_days:
          date_from:
            year_column: year
            month_column: month
            day_column: day
save_bad_rows_to: invalid_rows.csv
Then run the datavalid command in that folder:
python -m datavalid
You can also specify a data folder that isn't the current working directory:
python -m datavalid --dir my_data_folder
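To make the last validation task in the example config more concrete, here is a minimal Python sketch of what a check like no_more_than_once_per_30_days could do. It is not datavalid's implementation. The helper names build_date and rows_closer_than_30_days are hypothetical, the where filter is reduced to a simple equality test, and the rule "consecutive dates within a group must be at least 30 days apart" is an assumption about the intended semantics.

```python
from collections import defaultdict
from datetime import date


def build_date(row, year_column="year", month_column="month", day_column="day"):
    """Combine three columns of a row into one date (in the spirit of `date_from`)."""
    return date(int(row[year_column]), int(row[month_column]), int(row[day_column]))


def rows_closer_than_30_days(rows, group_by, where=None):
    """Return rows dated fewer than 30 days after the previous row in their group.

    Assumption: "no more than once per 30 days" means consecutive dates within
    a group must be at least 30 days apart. datavalid's exact rule may differ.
    """
    groups = defaultdict(list)
    for row in rows:
        # `where` is a plain (column, value) equality filter, for brevity.
        if where is not None and row[where[0]] != where[1]:
            continue
        groups[row[group_by]].append(row)

    bad_rows = []
    for group_rows in groups.values():
        ordered = sorted(group_rows, key=build_date)
        # Compare each row with the one immediately before it in date order.
        for previous, current in zip(ordered, ordered[1:]):
            if (build_date(current) - build_date(previous)).days < 30:
                bad_rows.append(current)
    return bad_rows


# Made-up rows shaped like fuse/event.csv, for illustration only.
rows = [
    {"uid": "a1", "kind": "officer_left", "year": "2020", "month": "1", "day": "5"},
    {"uid": "a1", "kind": "officer_left", "year": "2020", "month": "1", "day": "20"},
    {"uid": "a1", "kind": "officer_rank", "year": "2020", "month": "1", "day": "21"},
    {"uid": "b2", "kind": "officer_left", "year": "2020", "month": "3", "day": "1"},
    {"uid": "b2", "kind": "officer_left", "year": "2020", "month": "6", "day": "1"},
]
bad = rows_closer_than_30_days(rows, group_by="uid", where=("kind", "officer_left"))
```

In this made-up data, only the second officer_left row for uid a1 is flagged, because it falls 15 days after the first. Rows collected this way are the kind of output that would plausibly end up in the file named by save_bad_rows_to.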
A config file is a file named datavalid.yml, and it must be placed in your root data folder. Your root data folder is the folder that contains all of your data files. The config file contains a config object in YAML format.
float: true
Common fields:
Checker fields (define exactly one of these fields):
There are 3 ways to define a condition. The first way is to provide column, op and value:
The second way is to provide an and field:
Finally, the last way is to provide an or field:
or: same as and, except that the sub-conditions are or-ed together, which means the condition is fulfilled if any of the sub-conditions is fulfilled.
Combines multiple columns to create dates.
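The three condition forms described above (column/op/value, and, or) can be illustrated with a short recursive evaluator. This is a sketch under stated assumptions, not datavalid's own code. The function names matches and rows_violating_empty are hypothetical, only the equal and not_equal ops that appear in the example config are handled, and the and form is assumed to require every sub-condition to hold.

```python
def matches(condition, row):
    """Return True if `row` fulfils `condition` (a dict shaped like the config)."""
    if "and" in condition:
        # Assumed: fulfilled only when every sub-condition is fulfilled.
        return all(matches(sub, row) for sub in condition["and"])
    if "or" in condition:
        # Fulfilled when any sub-condition is fulfilled.
        return any(matches(sub, row) for sub in condition["or"])
    actual = row[condition["column"]]
    if condition["op"] == "equal":
        return actual == condition["value"]
    if condition["op"] == "not_equal":
        return actual != condition["value"]
    raise ValueError(f"unsupported op: {condition['op']}")


def rows_violating_empty(condition, rows):
    """For an `empty` task, every row matching the condition counts as a bad row."""
    return [row for row in rows if matches(condition, row)]


# Same condition as the `empty` task in the example config.
condition = {
    "and": [
        {"column": "allegation_finding", "op": "equal", "value": "sustained"},
        {"column": "disposition", "op": "not_equal", "value": "sustained"},
    ]
}
# Made-up rows, for illustration only.
rows = [
    {"allegation_finding": "sustained", "disposition": "sustained"},
    {"allegation_finding": "sustained", "disposition": "exonerated"},
    {"allegation_finding": "unfounded", "disposition": "exonerated"},
]
bad = rows_violating_empty(condition, rows)
```

Here only the second row is flagged: its finding is sustained but its disposition is not. Swapping the and key for or would also match the other two rows, since each of them satisfies at least one sub-condition.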
FAQs
Data validation library
We found that datavalid demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.