The SDMetrics library evaluates synthetic data by comparing it to the real data that you're trying to mimic. It includes a variety of metrics that capture different aspects of the data, such as quality and privacy. It also includes reports that you can run to generate insights, visualize the data, and share the results with your team.
The SDMetrics library is model-agnostic, meaning you can use any synthetic data. The library does not need to know how you created the data.
Install SDMetrics using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.
pip install sdmetrics
conda install -c conda-forge sdmetrics
For more information about using SDMetrics, visit the SDMetrics Documentation.
Get started with SDMetrics Reports using some demo data:
from sdmetrics import load_demo
from sdmetrics.reports.single_table import QualityReport
real_data, synthetic_data, metadata = load_demo(modality='single_table')
my_report = QualityReport()
my_report.generate(real_data, synthetic_data, metadata)
Creating report: 100%|██████████| 4/4 [00:00<00:00, 5.22it/s]
Overall Quality Score: 82.84%
Properties:
Column Shapes: 82.78%
Column Pair Trends: 82.9%
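The overall score printed above is consistent with a plain average of the two property scores. The check below is only a sketch of that arithmetic using the numbers from this run; the unweighted mean is an assumption here, not a description of the library's internals.

```python
# Property scores copied from the report output above (as percentages).
property_scores = {
    'Column Shapes': 82.78,
    'Column Pair Trends': 82.9,
}

# Assumption: the overall score is the unweighted mean of the property scores.
overall_score = sum(property_scores.values()) / len(property_scores)
print(f'Overall Quality Score: {overall_score:.2f}%')
```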
Once you generate the report, you can drill down into the details and visualize the results.
my_report.get_visualization(property_name='Column Pair Trends')
Save the report and share it with your team.
my_report.save(filepath='demo_data_quality_report.pkl')
# load it at any point in the future
my_report = QualityReport.load(filepath='demo_data_quality_report.pkl')
Want more metrics? You can also manually apply any of the metrics in this library to your data.
# calculate whether the synthetic data respects the min/max bounds
# set by the real data
from sdmetrics.single_column import BoundaryAdherence
BoundaryAdherence.compute(
    real_data['start_date'],
    synthetic_data['start_date']
)
0.8503937007874016
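To make the score above easier to interpret, here is a minimal pure-Python sketch of the idea behind a boundary check: the fraction of synthetic values that fall inside the real column's min/max range. The function name `boundary_adherence_sketch` and the sample values are hypothetical, and dropping missing values is a simplifying assumption; this is not the library's implementation.

```python
def boundary_adherence_sketch(real_values, synthetic_values):
    """Fraction of synthetic values inside the real column's [min, max] range.

    Hypothetical helper for illustration only; not part of SDMetrics.
    """
    # Simplifying assumption: missing values (None) are ignored.
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    if not real or not synthetic:
        raise ValueError('Both columns need at least one non-missing value.')

    low, high = min(real), max(real)
    in_bounds = sum(1 for v in synthetic if low <= v <= high)
    return in_bounds / len(synthetic)


# Made-up sample columns: 19 and 61 fall outside the real range [23, 58].
real_ages = [23, 35, 41, 58]
synthetic_ages = [19, 30, 44, 58, 61]

score = boundary_adherence_sketch(real_ages, synthetic_ages)
print(score)  # 0.6
```

A score of 1.0 would mean every synthetic value stays within the bounds set by the real data.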
# calculate whether the synthetic data is new or whether it's an exact copy of the real data
from sdmetrics.single_table import NewRowSynthesis
NewRowSynthesis.compute(
    real_data,
    synthetic_data,
    metadata
)
1.0
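As a rough illustration of what a score like this expresses, the sketch below computes the fraction of synthetic rows that do not exactly match any real row. The function name `new_row_synthesis_sketch` and the sample rows are hypothetical. It uses strict equality on every column, which is a simplification: the library's metric may apply different matching rules (for example, for numerical columns), so treat this as a conceptual sketch only.

```python
def new_row_synthesis_sketch(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are not exact copies of a real row.

    Hypothetical helper for illustration only; not part of SDMetrics.
    """
    if not synthetic_rows:
        raise ValueError('synthetic_rows must not be empty.')

    # Rows are compared as tuples, so every column must match exactly.
    real_set = {tuple(row) for row in real_rows}
    new_rows = sum(1 for row in synthetic_rows if tuple(row) not in real_set)
    return new_rows / len(synthetic_rows)


# Made-up sample rows: only ('alice', 30) is copied from the real data.
real_rows = [('alice', 30), ('bob', 45)]
synthetic_rows = [('alice', 30), ('carol', 29), ('dan', 45), ('erin', 51)]

score = new_row_synthesis_sketch(real_rows, synthetic_rows)
print(score)  # 0.75
```

A score of 1.0, as in the demo output above, corresponds to no synthetic row being a copy of a real row.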
To learn more about the reports and metrics, visit the SDMetrics Documentation.
The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:
Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.