
synth-data-eval
A collaborative research project investigating methods for generating and evaluating synthetic tabular data across multiple domains. This repository contains the reproducible code, datasets, and experiment configurations used to prepare our paper.
Synthetic data is crucial for privacy-preserving machine learning. This project evaluates different synthetic data generators (CTGAN, TVAE, Gaussian Copula, KAN-CTGAN, KAN-TVAE) across statistical fidelity, ML utility, privacy, and data quality.
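To make "statistical fidelity" concrete, the sketch below computes a Kolmogorov-Smirnov complement for one numeric column: 1 minus the largest gap between the empirical CDFs of the real and synthetic values. It is a hand-rolled, standard-library illustration only; the function name `ks_complement` is hypothetical, and it is not the code in `evaluation/sdmetrics_evaluation.py`.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two numeric samples (1.0 means identical ECDFs)."""
    real_sorted = sorted(float(v) for v in real)
    synth_sorted = sorted(float(v) for v in synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    ks_stat = 0.0
    # The largest ECDF gap always occurs at one of the observed values.
    for value in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, value) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, value) / len(synth_sorted)
        ks_stat = max(ks_stat, abs(cdf_real - cdf_synth))
    return 1.0 - ks_stat


if __name__ == "__main__":
    real_ages = [23, 35, 41, 52, 60]
    synthetic_ages = [25, 33, 44, 50, 63]
    print(f"KS complement: {ks_complement(real_ages, synthetic_ages):.2f}")
```

A score near 1.0 suggests the synthetic column follows the real marginal distribution closely; it says nothing about relationships between columns, which is why fidelity is only one of the four evaluation axes.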
Research Objective: To provide a systematic benchmark framework and identify trade-offs between realism, privacy, and downstream task performance.
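The realism/privacy trade-off can be illustrated with a distance-to-closest-record (DCR) check, a common privacy heuristic. This is a minimal sketch under stated assumptions: rows are already numeric and comparably scaled, the function name is hypothetical, and it is not taken from `evaluation/privacy_metrics.py`.

```python
from math import dist


def distance_to_closest_record(real_rows, synthetic_rows):
    """For each synthetic row, return the Euclidean distance to its nearest real row."""
    return [min(dist(s, r) for r in real_rows) for s in synthetic_rows]


if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]

    # A "generator" that copies the training data is perfectly realistic
    # but leaks every record: all distances are zero.
    copied = list(real)
    print(distance_to_closest_record(real, copied))

    # Perturbed records are less faithful but no longer exact copies.
    shifted = [(x + 0.3, y + 0.4) for x, y in real]
    print(distance_to_closest_record(real, shifted))
```

Very small distances indicate that synthetic rows sit on top of real records, so a generator can score well on fidelity while scoring poorly on privacy.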
pip install synth-data-eval
git clone https://github.com/ahmed-fouad-lagha/synth-data-eval.git
cd synth-data-eval
pip install -e ".[all]" # Install with all optional dependencies
pip install -e ".[dev]" # Development tools (pytest, mypy, black, etc.)
pip install -e ".[docs]" # Documentation building
pip install -e ".[notebooks]" # Jupyter notebook support
python scripts/download_low_resource_datasets.py
synthetic-tabular-eval/
├── pyproject.toml
├── README.md
├── CONTRIBUTING.md
├── LICENSE
├── .gitignore
├── generators/
│ ├── __init__.py
│ ├── base_generator.py
│ ├── ctgan_model.py
│ ├── tvae_model.py
│ ├── gaussian_copula.py
│ ├── kan_ctgan_model.py
│ ├── kan_tvae_model.py
│ ├── KAN_code.py
│ ├── KAN_CTGAN_code.py
│ └── KAN_TVAE_code.py
├── evaluation/
│ ├── __init__.py
│ ├── sdmetrics_evaluation.py
│ ├── ml_utility.py
│ └── privacy_metrics.py
├── scripts/
│ ├── config.yaml
│ ├── run_benchmark.py
│ ├── visualize_results.py
│ └── download_datasets.py
├── tests/
│ ├── __init__.py
│ ├── test_generators.py
│ └── test_evaluation.py
├── datasets/
├── results/
└── logs/
We evaluated the generators on five benchmark datasets:
The project now includes Kolmogorov-Arnold Network (KAN) enhanced versions of the CTGAN and TVAE generators. Where a traditional MLP applies fixed activation functions at its nodes, a KAN places learnable spline-based activation functions on its edges, potentially offering better expressiveness and performance on complex tabular data distributions.
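As a rough illustration of a learnable spline-based activation, the sketch below implements one edge function as a piecewise-linear spline (hat-function basis) whose knot coefficients are trained by gradient descent, and a node that sums its incoming edges. It is a deliberate simplification: published KAN formulations typically use higher-order B-splines plus a base function, and the class and function names here are hypothetical rather than taken from `generators/KAN_code.py`.

```python
class SplineEdge:
    """One learnable 1-D function phi(x) = sum_k c_k * hat_k(x) on a uniform grid."""

    def __init__(self, num_knots=9, lo=-1.0, hi=1.0):
        self.lo, self.hi = lo, hi
        self.step = (hi - lo) / (num_knots - 1)
        self.coeffs = [0.0] * num_knots  # learnable parameters, one per knot

    def basis(self, x):
        """Return (index, weight) pairs for the two hat functions active at x."""
        x = min(max(x, self.lo), self.hi)  # clamp inputs to the grid
        pos = (x - self.lo) / self.step
        left = min(int(pos), len(self.coeffs) - 2)
        frac = pos - left
        return [(left, 1.0 - frac), (left + 1, frac)]

    def __call__(self, x):
        return sum(self.coeffs[i] * w for i, w in self.basis(x))

    def sgd_step(self, x, target, lr=0.5):
        """Take one squared-error gradient step on the active coefficients."""
        error = self(x) - target
        for i, w in self.basis(x):
            self.coeffs[i] -= lr * error * w


def kan_node(edges, inputs):
    """A KAN node sums its incoming edge functions: y = sum_i phi_i(x_i)."""
    return sum(edge(x) for edge, x in zip(edges, inputs))


if __name__ == "__main__":
    # Teach a single edge to approximate x**2 on [-1, 1].
    edge = SplineEdge()
    samples = [i / 50 - 1.0 for i in range(101)]
    for _ in range(200):
        for x in samples:
            edge.sgd_step(x, x * x)
    print(f"phi(0.5) is about {edge(0.5):.3f}")
```

The point of the design is that the shape of each activation is itself learned from data, rather than fixed in advance as with ReLU or tanh.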
KAN Features:
Performance Highlights:
Trade-offs Identified:
Completed Research Workflow:
Key Scripts:
scripts/run_benchmark.py - Execute complete experimental pipeline
scripts/statistical_analysis.py - Generate significance tests and LaTeX tables
scripts/visualize_results.py - Create radar plots, heatmaps, and utility comparisons
paper/main.tex - Complete research paper with results and analysis
# 1. Install dependencies
pip install -e ".[all]"
# 2. Download datasets
python scripts/download_datasets.py
# 3. Run complete benchmark (will take several hours)
python scripts/run_benchmark.py
# 4. Generate statistical analysis
python scripts/statistical_analysis.py
# 5. Create visualizations
python scripts/visualize_results.py
# 6. Compile paper
cd paper && pdflatex main.tex
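Steps 2 to 6 above could also be chained from a single Python entry point that stops at the first failing step. The following is only a sketch: the `PIPELINE` list and `run_pipeline` helper are hypothetical and not part of the repository, and the script paths are copied from the commands above. It defaults to a dry run so nothing is executed by accident.

```python
import subprocess
import sys

# (command, working directory) pairs, in the order listed above.
PIPELINE = [
    ([sys.executable, "scripts/download_datasets.py"], "."),
    ([sys.executable, "scripts/run_benchmark.py"], "."),
    ([sys.executable, "scripts/statistical_analysis.py"], "."),
    ([sys.executable, "scripts/visualize_results.py"], "."),
    (["pdflatex", "main.tex"], "paper"),
]


def run_pipeline(steps=PIPELINE, dry_run=True):
    """Run each step in order; check=True aborts the pipeline on the first failure."""
    handled = []
    for command, workdir in steps:
        if not dry_run:
            subprocess.run(command, cwd=workdir, check=True)
        handled.append(" ".join(command))
    return handled


if __name__ == "__main__":
    for line in run_pipeline(dry_run=True):
        print(line)
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the steps for real; the benchmark step is the one expected to take several hours.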
# Clone the repository
git clone https://github.com/ahmed-fouad-lagha/synth-data-eval.git
cd synth-data-eval
# Install in development mode with all dependencies
pip install -e ".[dev,docs,notebooks]"
# Optional: Install pre-commit hooks for code quality
pip install pre-commit
pre-commit install
# Run all tests
pytest
# Run with coverage
pytest --cov=generators --cov=evaluation
# Run specific test file
pytest tests/test_generators.py
# Format code
black .
isort .
# Lint code
flake8 .
# Type check
mypy generators/ evaluation/ scripts/
# Build documentation
cd docs
sphinx-build -b html . _build/html
# View documentation
open _build/html/index.html
This project uses GitHub Actions for continuous integration:
Use the provided release script for consistent versioning and publishing:
# Patch release (0.1.0 -> 0.1.1)
python scripts/make_release.py patch
# Minor release (0.1.0 -> 0.2.0)
python scripts/make_release.py minor
# Major release (0.1.0 -> 1.0.0)
python scripts/make_release.py major
# Specific version release
python scripts/make_release.py v1.0.0
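The version arithmetic implied by the comments above can be sketched as follows. This is an illustration of the patch/minor/major convention only, not the contents of `scripts/make_release.py`; the `bump_version` name and the choice to strip a leading `v` from an explicit version are assumptions.

```python
import re


def bump_version(current, release):
    """Return the next version for 'patch', 'minor', 'major', or an explicit version."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", current)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())

    if release == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if release == "minor":
        return f"{major}.{minor + 1}.0"  # a minor bump resets the patch number
    if release == "major":
        return f"{major + 1}.0.0"        # a major bump resets minor and patch

    # Otherwise treat the argument as an explicit version such as "v1.0.0".
    explicit = re.fullmatch(r"v?(\d+\.\d+\.\d+)", release)
    if explicit is None:
        raise ValueError(f"unknown release type: {release!r}")
    return explicit.group(1)


if __name__ == "__main__":
    for kind in ("patch", "minor", "major", "v1.0.0"):
        print(kind, "->", bump_version("0.1.0", kind))
```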
The script will:
pyproject.toml
CHANGELOG.md with release date
If you prefer manual control:
pyproject.toml
CHANGELOG.md
git commit -m "Release v1.0.0"
git tag -a v1.0.0 -m "Release v1.0.0"
git push origin v1.0.0
You can test releases on TestPyPI before publishing to production:
pip install --index-url https://test.pypi.org/simple/ synth-data-eval
If your repository is private, GitHub release creation requires a Personal Access Token (PAT):
Create a Personal Access Token (PAT):
repo scope
Add to Repository Secrets:
RELEASE_TOKEN
✅ Status: RELEASE_TOKEN is now configured - GitHub releases will work automatically!
This repository is currently private and contains research code under development. It will be made public upon publication of the associated research paper to ensure proper attribution and compliance with venue policies.
This repository contains research code that will be made publicly available under the MIT License upon publication of the associated research paper.
For pre-publication access, please contact the authors.
FAQs
Comprehensive evaluation framework for tabular synthetic data generators
We found that synth-data-eval demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.