A library for validating data quality with PyDeequ and sending email notifications.
Data-Checkmate: Data Quality Validation
This package performs data quality validation using PyDeequ.
It lets users validate the quality of their data and identify potential issues that may affect its suitability for processing or analysis. It can also send an email notification with the validation result.
Author: Ketan Kirange
Contributors: Ketan Kirange, Ajay Rahul Raja
This package contains tools and utilities for performing data quality checks on data files in
These checks help ensure the integrity, accuracy, and completeness of the data, which are essential for robust, data-driven decision-making.
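To make "completeness" and "integrity" concrete, here is a minimal sketch of two checks that loosely mirror PyDeequ's `isComplete` and `isUnique` constraints. It is a plain-Python illustration of the idea on a list of dictionaries, not how PyDeequ or Data-Checkmate compute these metrics (PyDeequ runs them on Spark DataFrames); the function names `completeness` and `is_unique` are hypothetical.

```python
from typing import Any, Iterable, Mapping


def completeness(rows: Iterable[Mapping[str, Any]], column: str) -> float:
    """Fraction of rows whose `column` value is present (not None)."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to check")
    present = sum(1 for row in rows if row.get(column) is not None)
    return present / len(rows)


def is_unique(rows: Iterable[Mapping[str, Any]], column: str) -> bool:
    """True if no non-null value occurs more than once in `column`."""
    seen = set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # missing values are the completeness check's concern
        if value in seen:
            return False
        seen.add(value)
    return True


orders = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "c@example.com"},
]
print(round(completeness(orders, "email"), 2))  # 0.67
print(is_unique(orders, "order_id"))  # False: order_id 2 appears twice
```

Keeping the two checks separate means a null value lowers completeness without also being reported as a duplicate.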
Importance of Data Quality
Data quality plays a pivotal role in any engineering project, especially in data science, reporting, and analysis.
Here's why ensuring high data quality is crucial:
1. Reliable Insights
High-quality data leads to reliable and trustworthy insights.
When the data is accurate, complete, and consistent, data scientists and analysts can make informed decisions confidently.
2. Trustworthy Models
Data quality directly impacts the performance and reliability of machine learning models.
Models trained on low-quality data may produce biased or inaccurate predictions, leading to unreliable outcomes.
3. Effective Reporting
Quality data is fundamental for generating accurate reports and visualizations.
Analysts and stakeholders rely on these reports for understanding trends, identifying patterns, and making strategic decisions.
Poor data quality can lead to misleading reports and flawed interpretations.
4. Regulatory Compliance
In many industries, compliance with regulations such as GDPR, HIPAA, or industry-specific standards is mandatory.
Ensuring data quality is essential for meeting these regulatory requirements and avoiding potential legal consequences.
Data Quality Validation Tools
This repository provides a set of tools and utilities to perform comprehensive data quality validation on various data formats:
To get started with Data-Checkmate, refer to the following notebooks:
Step 1: Make sure the following prerequisites are installed:
Java: 1.8
Python: 3.9.18
Apache Spark: 3.3.0
Step 2: Install PyDeequ with the following command:
pip install pydeequ
Step 3: Install our Data Quality Validation Python library, Data-Checkmate, from the terminal:
pip install Data-Checkmate
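Before running a validation, a quick check of the prerequisites and installs above can save a confusing Spark error later. The sketch below is our own helper (`check_environment` is hypothetical, not part of Data-Checkmate). It compares the interpreter's major.minor version with 3.9, looks for `java` on PATH, reads the `SPARK_VERSION` environment variable (assuming your PyDeequ release expects it, as PyDeequ's own quickstart sets it; check the PyDeequ README for your version), and reports installed package versions via `importlib.metadata`.

```python
import os
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment() -> dict:
    """Collect a few facts about the local setup before running Data-Checkmate."""
    report = {
        # The prerequisites list Python 3.9.18; only major.minor is compared here.
        "python_ok": sys.version_info[:2] == (3, 9),
        # Spark needs a Java runtime; this only checks that `java` is on PATH.
        "java_on_path": shutil.which("java") is not None,
        # Assumption: your PyDeequ release reads SPARK_VERSION (e.g. "3.3").
        "spark_version_env": os.environ.get("SPARK_VERSION"),
    }
    for dist in ("pyspark", "pydeequ", "Data-Checkmate"):
        try:
            report[dist] = version(dist)
        except PackageNotFoundError:
            report[dist] = None  # not installed in this interpreter
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

It only reports; it does not install or configure anything.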
Step 4: To run the data quality validation, import the library as follows:
from dqv.datacheckmate import DqvPydeequ, sendEmailNotification
Step 5: Create a config file in a folder, listing the columns that need to be validated.
You can name the file as you wish, but pass that same name to the DqvPydeequ function.
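The real config format is defined in the repository referenced at the end of this README. The JSON layout below (keys `columns`, `name`, `checks`) is purely hypothetical, as are the helpers `write_config` and `columns_to_validate`; the sketch only illustrates the idea of listing columns in a file of your chosen name and reading them back.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: the real Data-Checkmate config format may differ.
EXAMPLE_CONFIG = {
    "columns": [
        {"name": "order_id", "checks": ["complete", "unique"]},
        {"name": "email", "checks": ["complete"]},
    ]
}


def write_config(folder: Path, filename: str, config: dict) -> Path:
    """Write `config` as JSON to folder/filename and return the full path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


def columns_to_validate(path: Path) -> list:
    """Read the config back and list the column names it covers."""
    config = json.loads(path.read_text(encoding="utf-8"))
    return [column["name"] for column in config["columns"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_config(Path(tmp) / "configs", "orders_config.json", EXAMPLE_CONFIG)
        print(config_path.name, columns_to_validate(config_path))
```

Whatever file name you choose is the path you would then pass as the first argument to DqvPydeequ.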
Step 6: Upload your data to S3 or, if you are running locally, save it in a new local directory.
Step 7: Pass your config file, source, and target paths to the DqvPydeequ function:
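Because the data can live either in S3 or in a local directory, it can help to catch an obviously wrong path before Spark starts. This is a minimal sketch with a hypothetical helper, `check_data_path`, that is not part of Data-Checkmate. It classifies a path by URI scheme (`s3`, `s3a`, `s3n`, where `s3a` is the Hadoop/Spark connector's scheme), checks only that an S3 path names a bucket (nothing is fetched from AWS), and requires local paths to exist.

```python
from pathlib import Path
from urllib.parse import urlparse

# URI schemes commonly used for S3 data in Spark jobs.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def check_data_path(path: str) -> str:
    """Return "s3" or "local" for a data path, failing early on obvious mistakes."""
    parsed = urlparse(path)
    if parsed.scheme.lower() in S3_SCHEMES:
        if not parsed.netloc:
            raise ValueError(f"S3 path has no bucket name: {path!r}")
        return "s3"  # existence in S3 is not checked (that needs AWS credentials)
    # Accept file:// URIs as well as plain local paths.
    local = Path(parsed.path) if parsed.scheme == "file" else Path(path)
    if not local.exists():
        raise FileNotFoundError(f"local data path not found: {local}")
    return "local"


print(check_data_path("s3://my-bucket/raw/orders.csv"))  # s3
```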
DqvPydeequ(
    "",  # config_file
    "",  # source_data_path
    "",  # target_data_path
)
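This README does not spell out which comparisons DqvPydeequ runs between the source and target data. As an illustration only, a common source-to-target reconciliation compares row counts and column names. The plain-Python sketch below (`compare_source_target` is a hypothetical name) does that for two CSV texts; it is not Data-Checkmate's logic, which uses PyDeequ on Spark.

```python
import csv
import io


def _read_csv(text: str):
    """Return (column names, rows) for CSV text that has a header line."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def compare_source_target(source_csv: str, target_csv: str) -> dict:
    """Compare row counts and column sets of a source and a target dataset."""
    src_cols, src_rows = _read_csv(source_csv)
    tgt_cols, tgt_rows = _read_csv(target_csv)
    return {
        "source_rows": len(src_rows),
        "target_rows": len(tgt_rows),
        "row_count_match": len(src_rows) == len(tgt_rows),
        "missing_in_target": sorted(set(src_cols) - set(tgt_cols)),
        "extra_in_target": sorted(set(tgt_cols) - set(src_cols)),
    }


source = "order_id,email\n1,a@example.com\n2,b@example.com\n"
target = "order_id\n1\n2\n"
print(compare_source_target(source, target))
```

Here the row counts match but the `email` column was dropped on the way to the target, which a row-count check alone would miss.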
Step 8: Run the file to validate.
Step 9: After validating the data, you can send the result by email.
To do this, import the function as follows:
from email_notification import sendEmailNotification
Step 10: Save your AWS credentials, sender and receiver email addresses, and AWS region in a dictionary. Then pass the source path, target path, and this dictionary to the sendEmailNotification function.
email_config = {
    "aws_access_key_id": "",  # AWS credential
    "aws_secret_access_key": "",  # AWS credential
    "aws_session_token": "",  # AWS credential
    "sender_email": "",  # sender email address
    "receiver_email": "",  # receiver email address
    "aws_region": "",  # AWS region
}
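Hard-coding AWS secrets in a script makes them easy to leak. One option is to fill the same dictionary from environment variables. In this sketch, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_REGION`, and `AWS_DEFAULT_REGION` are the standard names AWS tools read, while `DQV_SENDER_EMAIL`, `DQV_RECEIVER_EMAIL`, and both helper functions are hypothetical. The session token is treated as optional because it is only needed for temporary credentials.

```python
import os
from typing import Mapping, Optional

# The session token is only required for temporary (STS) credentials.
OPTIONAL_KEYS = {"aws_session_token"}


def email_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the email_config dictionary from environment variables."""
    env = os.environ if env is None else env
    return {
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", ""),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", ""),
        "aws_session_token": env.get("AWS_SESSION_TOKEN", ""),
        # Hypothetical variable names for the email addresses.
        "sender_email": env.get("DQV_SENDER_EMAIL", ""),
        "receiver_email": env.get("DQV_RECEIVER_EMAIL", ""),
        "aws_region": env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION", ""),
    }


def missing_keys(config: dict) -> list:
    """Names of required entries that are still empty."""
    return [key for key, value in config.items() if not value and key not in OPTIONAL_KEYS]


if __name__ == "__main__":
    print("Missing settings:", missing_keys(email_config_from_env()))
```

The resulting dictionary has the same keys as email_config, so it could be passed to sendEmailNotification in its place.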
send_notification = sendEmailNotification(
    "",  # source_data_path
    "",  # target_data_path
    email_config,  # dictionary from Step 10
)
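sendEmailNotification builds and delivers the message for you. For readers curious about what a result notification contains, this standard-library sketch only composes one (subject, sender, receiver, and a body naming the source and target paths) and sends nothing. The function `build_result_email`, its `passed` flag, and the message wording are hypothetical and do not reproduce Data-Checkmate's actual email.

```python
from email.message import EmailMessage


def build_result_email(source_path: str, target_path: str, passed: bool,
                       email_config: dict) -> EmailMessage:
    """Compose (but do not send) a validation-result email."""
    status = "PASSED" if passed else "FAILED"
    msg = EmailMessage()
    msg["Subject"] = f"Data quality validation {status}"
    msg["From"] = email_config["sender_email"]
    msg["To"] = email_config["receiver_email"]
    msg.set_content(
        f"Source: {source_path}\n"
        f"Target: {target_path}\n"
        f"Result: {status}\n"
    )
    return msg


message = build_result_email(
    "s3://my-bucket/raw/orders.csv",
    "s3://my-bucket/curated/orders.csv",
    passed=False,
    email_config={"sender_email": "dq-bot@example.com", "receiver_email": "team@example.com"},
)
print(message["Subject"])  # Data quality validation FAILED
```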
Refer to this repository for the structure of the config file:
Git: https://github.com/dataruk/data-quality-validation