DeepEcho is a Synthetic Data Generation Python library for mixed-type, multivariate time series. It provides: … the model and sample API and get evaluated.

| Important Links | |
|---|---|
| :computer: Website | Check out the SDV Website for more information about the project. |
| :orange_book: SDV Blog | Regular publishing of useful content about Synthetic Data Generation. |
| :book: Documentation | Quickstarts, User and Development Guides, and API Reference. |
| :octocat: Repository | The link to the GitHub Repository of this library. |
| :keyboard: Development Status | This software is in its Pre-Alpha stage. |
| Community | Join our Slack Workspace for announcements and discussions. |
| Tutorials | Run the SDV Tutorials in a Binder environment. |
DeepEcho is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.
Optionally, DeepEcho can also be installed as a standalone library using the following commands:
Using pip:

```bash
pip install deepecho
```

Using conda:

```bash
conda install -c pytorch -c conda-forge deepecho
```
For more installation options, please visit the DeepEcho Installation Guide.
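To confirm that either command installed DeepEcho into the environment you are actually running, a small standard-library check is enough. This is a generic Python sketch using `importlib.metadata` (Python 3.8+), not a DeepEcho feature; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="deepecho"):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current interpreter.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "deepecho is not installed in this environment")
```

Running it with the same interpreter you will use for the quickstart avoids the common mix-up of installing into one environment and importing from another.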
DeepEcho is included as part of SDV to model and sample synthetic time series. In most cases, using DeepEcho through SDV is recommended, since SDV provides additional functionality that is not available here. For more details about how to use DeepEcho within SDV, please visit the corresponding User Guide.
DeepEcho can also be used as a standalone library.
In this short quickstart, we show how to learn a mixed-type multivariate time series dataset and then generate synthetic data that resembles it.
We will start by loading the data and preparing the instance of our model.
```python
from deepecho import PARModel
from deepecho.demo import load_demo

# Load demo data
data = load_demo()

# Define data types for all the columns
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

model = PARModel(cuda=False)
```
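The `data_types` mapping above assigns one of three types (`categorical`, `continuous`, `count`) to each modeled column. Before fitting your own data, it can help to check that the values actually match the declared types. The sketch below is a hypothetical, pure-Python helper (`check_data_types` is not part of DeepEcho), and the rules it applies are assumptions about what each type implies: `continuous` means numeric, `count` means a non-negative integer, and `categorical` accepts any value. Missing values are flagged as a simplifying assumption.

```python
from numbers import Real

VALID_TYPES = {"categorical", "continuous", "count"}


def check_data_types(rows, data_types):
    """Hypothetical validator: return a list of problems (empty list = rows pass).

    Assumed rules, not DeepEcho's own checks:
      'categorical' -> any non-missing value
      'continuous'  -> int or float (bool excluded)
      'count'       -> non-negative int (bool excluded)
    """
    problems = []
    for column, kind in data_types.items():
        if kind not in VALID_TYPES:
            problems.append(f"{column}: unknown type {kind!r}")
            continue
        for i, row in enumerate(rows):
            value = row.get(column)
            if value is None:
                # Treating missing values as errors is a simplification for this sketch.
                problems.append(f"row {i}: missing {column}")
            elif kind == "continuous" and (isinstance(value, bool) or not isinstance(value, Real)):
                problems.append(f"row {i}: {column}={value!r} is not numeric")
            elif kind == "count" and (isinstance(value, bool) or not isinstance(value, int) or value < 0):
                problems.append(f"row {i}: {column}={value!r} is not a non-negative integer")
    return problems


if __name__ == "__main__":
    # Made-up toy rows using the demo's column names; not real demo data.
    rows = [
        {"region": "West", "day_of_week": "Mon", "total_sales": 1523.5, "nb_customers": 42},
        {"region": "West", "day_of_week": "Tue", "total_sales": 980.0, "nb_customers": -1},
    ]
    data_types = {
        "region": "categorical",
        "day_of_week": "categorical",
        "total_sales": "continuous",
        "nb_customers": "count",
    }
    print(check_data_types(rows, data_types))
    # ['row 1: nb_customers=-1 is not a non-negative integer']
```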
If we want to use different settings for our model, like increasing the number of epochs or enabling CUDA, we can pass the arguments when creating the model:
```python
model = PARModel(epochs=1024, cuda=True)
```
Notice that for smaller datasets like the one used in this demo, CUDA introduces more overhead than it gains from parallelization, so fitting is more efficient without CUDA, even when it is available.
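One way to read this trade-off is a deliberately simplified cost model. It is an assumption for illustration, not something DeepEcho measures: enabling CUDA adds a fixed overhead $o$ (device setup, host-to-device transfers) but lowers the per-sample cost from $c_{\text{CPU}}$ to $c_{\text{GPU}}$, for a dataset of $n$ samples.

```latex
T_{\text{CPU}}(n) = c_{\text{CPU}}\, n, \qquad
T_{\text{GPU}}(n) = o + c_{\text{GPU}}\, n

T_{\text{GPU}}(n) < T_{\text{CPU}}(n)
\iff n > n^{*} = \frac{o}{c_{\text{CPU}} - c_{\text{GPU}}}
\qquad (\text{assuming } c_{\text{GPU}} < c_{\text{CPU}})
```

Under this model, a small dataset such as the demo stays below the break-even size $n^{*}$, so the fixed overhead dominates and the CPU run finishes first.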
Once we have created our instance, we are ready to learn the data and generate new synthetic data that resembles it:
```python
# Learn a model from the data
model.fit(
    data=data,
    entity_columns=['store_id'],
    context_columns=['region'],
    data_types=data_types,
    sequence_index='date'
)

# Sample new data
model.sample(num_entities=5)
```
The output will be a table of synthetic time series data, here for 5 new entities, with the same properties as the demo data that we used as input.
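The `fit` call receives the data in long format: one row per entity and time step. `entity_columns` identify which rows belong to the same sequence, `context_columns` hold values assumed here to stay constant for each entity, and `sequence_index` orders the steps. The pure-Python sketch below only illustrates that layout; `split_into_sequences` is a hypothetical helper, not DeepEcho's internal code, and the toy rows are made up.

```python
from collections import defaultdict


def split_into_sequences(rows, entity_columns, context_columns, sequence_index):
    """Hypothetical helper: group long-format rows into one sequence per entity.

    Returns {entity_key: {"context": {...}, "sequence": [step, ...]}}, where each
    step keeps only the time-varying columns (including the sequence index).
    Raises ValueError if a context column changes within an entity, because
    context values are assumed to be constant per entity.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in entity_columns)
        grouped[key].append(row)

    skip = set(entity_columns) | set(context_columns)
    result = {}
    for key, entity_rows in grouped.items():
        context = {c: entity_rows[0][c] for c in context_columns}
        for row in entity_rows:
            for c in context_columns:
                if row[c] != context[c]:
                    raise ValueError(f"context column {c!r} varies within entity {key}")
        # Order the time steps by the sequence index.
        ordered = sorted(entity_rows, key=lambda r: r[sequence_index])
        steps = [{k: v for k, v in r.items() if k not in skip} for r in ordered]
        result[key] = {"context": context, "sequence": steps}
    return result


if __name__ == "__main__":
    # Made-up rows mirroring the demo's column names.
    rows = [
        {"store_id": 1, "region": "West", "date": "2020-01-02", "total_sales": 120.0},
        {"store_id": 1, "region": "West", "date": "2020-01-01", "total_sales": 100.0},
        {"store_id": 2, "region": "East", "date": "2020-01-01", "total_sales": 80.0},
    ]
    seqs = split_into_sequences(rows, ["store_id"], ["region"], "date")
    print(seqs[(1,)]["context"])                        # {'region': 'West'}
    print([s["date"] for s in seqs[(1,)]["sequence"]])  # ['2020-01-01', '2020-01-02']
```

ISO-formatted date strings sort correctly as plain strings, which keeps the sketch free of date parsing; real data with other formats would need to be converted first.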
For more details about DeepEcho and all its possibilities and features, please check and run the tutorials.
If you want to see how we evaluate the performance and quality of our models, please have a look at the SDGym Benchmarking framework.
Also, please feel welcome to visit our contributing guide to help us develop new features or cool ideas!
The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:
Get started using the SDV package, a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.
Create sequential synthetic data of mixed types using a GAN.
We found that deepecho demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 7 open source maintainers collaborating on the project.