The fuzzydata workflow generator enables:
fuzzydata is currently designed to run using the following clients: pandas, modin, and sql.
fuzzydata is designed to be extensible, so you may implement your own client. Please see the existing clients in fuzzydata/clients for ways to extend the abstract Artifact, Operation, and Workflow classes for your client.
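The extension pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration that does not import fuzzydata: the three base classes here are hypothetical stand-ins that only borrow the names Artifact, Operation, and Workflow, and every method name (generate, shape, apply, new_artifact, add_base, derive) is an assumption made for the example. The real abstract interfaces are the ones defined in fuzzydata/clients.

```python
from abc import ABC, abstractmethod
import random


class Artifact(ABC):
    """Hypothetical stand-in: one labeled dataset version."""

    def __init__(self, label):
        self.label = label

    @abstractmethod
    def generate(self, num_rows, num_cols):
        """Fill the artifact with synthetic data."""

    @abstractmethod
    def shape(self):
        """Return (rows, columns)."""


class Operation(ABC):
    """Hypothetical stand-in: a transformation from one artifact to another."""

    @abstractmethod
    def apply(self, source, new_label):
        """Return a new artifact derived from source."""


class Workflow(ABC):
    """Hypothetical stand-in: tracks artifacts and the operations linking them."""

    def __init__(self):
        self.artifacts = {}
        self.edges = []  # (source_label, operation_name, target_label)

    @abstractmethod
    def new_artifact(self, label):
        """Create an empty client-specific artifact."""

    def add_base(self, label, num_rows, num_cols):
        artifact = self.new_artifact(label)
        artifact.generate(num_rows, num_cols)
        self.artifacts[label] = artifact
        return artifact

    def derive(self, source_label, operation, new_label):
        result = operation.apply(self.artifacts[source_label], new_label)
        self.artifacts[new_label] = result
        self.edges.append((source_label, type(operation).__name__, new_label))
        return result


# A toy client backed by plain Python lists (illustrative only).
class ListArtifact(Artifact):
    def __init__(self, label, rows=None):
        super().__init__(label)
        self.rows = rows if rows is not None else []

    def generate(self, num_rows, num_cols):
        rng = random.Random(0)  # fixed seed so runs are repeatable
        self.rows = [[rng.random() for _ in range(num_cols)] for _ in range(num_rows)]

    def shape(self):
        return (len(self.rows), len(self.rows[0]) if self.rows else 0)


class HeadOperation(Operation):
    """Keep only the first n rows of the source artifact."""

    def __init__(self, n):
        self.n = n

    def apply(self, source, new_label):
        return ListArtifact(new_label, [row[:] for row in source.rows[: self.n]])


class ListWorkflow(Workflow):
    def new_artifact(self, label):
        return ListArtifact(label)


wf = ListWorkflow()
wf.add_base("artifact_0", num_rows=10, num_cols=3)
wf.derive("artifact_0", HeadOperation(4), "artifact_1")
print(wf.artifacts["artifact_1"].shape(), wf.edges)
```

The point of the sketch is the division of labor: the workflow base class owns the bookkeeping, while a client only supplies how data is stored and how each operation is executed. A real client would follow whatever the actual base classes in fuzzydata/clients require.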
Manual build/install using pip.
pip install fuzzydata
fuzzydata does not install modin or SQLAlchemy by default, but this can be specified as an install option:
pip install fuzzydata[modin|sql|all]
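Because modin and SQLAlchemy are optional, it can help to check which backends are importable before choosing a client. The following is a small sketch using only the standard library. The mapping from client name to module name is an assumption for illustration (in particular, that the sql client relies on the sqlalchemy module), and available_clients is a hypothetical helper, not part of fuzzydata.

```python
from importlib.util import find_spec

# Assumed mapping from fuzzydata client name to the top-level module it needs.
CLIENT_MODULES = {"pandas": "pandas", "modin": "modin", "sql": "sqlalchemy"}


def available_clients(modules=CLIENT_MODULES):
    """Return {client_name: True/False} depending on whether its module is importable."""
    return {client: find_spec(module) is not None for client, module in modules.items()}


if __name__ == "__main__":
    for client, installed in available_clients().items():
        status = "installed" if installed else "missing (install the matching extra)"
        print(f"{client}: {status}")
```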
Some examples of fuzzydata usage are in the examples directory. You can also run the fuzzydata command to get a list of command-line options supported in fuzzydata:
$ fuzzydata --help
usage: fuzzydata [-h] [--wf_client WF_CLIENT] [--output_dir OUTPUT_DIR] [--wf_name WF_NAME]
                 [--columns COLUMNS] [--rows ROWS] [--versions VERSIONS] [--bfactor BFACTOR]
                 [--matfreq MATFREQ] [--npp NPP] [--log LOG] [--replay_dir REPLAY_DIR]
                 [--wf_options WF_OPTIONS] [--exclude_ops EXCLUDE_OPS] [--scale_artifact SCALE_ARTIFACT]

optional arguments:
  -h, --help            show this help message and exit
  --wf_client WF_CLIENT
                        Workflow Client to be used (Default pandas). Available Workflows: pandas|modin|sql
  --output_dir OUTPUT_DIR
                        Location of Output datasets to be stored
  --wf_name WF_NAME     prefix for each workflow to be generated dir to be the path prefix for these files.
  --columns COLUMNS     Number of columns in the base version
  --rows ROWS           Number of rows in the base version
  --versions VERSIONS   Number of artifact versions to generate
  --bfactor BFACTOR     Workflow Branching factor, 0.1 is linear, 100 is star-like
  --matfreq MATFREQ     Materialization frequency, i.e. how many operations before writing out an artifact
  --log LOG             Set Logging Level
  --replay_dir REPLAY_DIR
                        Replay existing workflow in directory
  --wf_options WF_OPTIONS
                        JSON-encoded workflow engine options like sql_string or modin_engine
  --exclude_ops EXCLUDE_OPS
                        JSON-encoded list of ops to exclude e.g. ["pivot"]
  --scale_artifact SCALE_ARTIFACT
                        JSON-encoded dict of {artifact_label: new_size} to be scaled up e.g. {"artifact_0": 1000000}
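Several of these options take JSON-encoded values, which are easy to get wrong when quoting by hand in a shell. As a sketch, the argument list can be assembled in Python so that json.dumps handles the encoding. Only flags shown in the help text are used; build_fuzzydata_command is a hypothetical helper, and the output directory, workflow name, and sizes below are placeholder values.

```python
import json
import shlex


def build_fuzzydata_command(output_dir, wf_name, wf_client="pandas", columns=None,
                            rows=None, versions=None, exclude_ops=None,
                            scale_artifact=None):
    """Assemble a fuzzydata argument list; JSON-valued flags are encoded with json.dumps."""
    cmd = ["fuzzydata", "--wf_client", wf_client,
           "--output_dir", output_dir, "--wf_name", wf_name]
    # Plain numeric options are passed through as strings when provided.
    for flag, value in (("--columns", columns), ("--rows", rows), ("--versions", versions)):
        if value is not None:
            cmd += [flag, str(value)]
    # --exclude_ops expects a JSON list, --scale_artifact a JSON dict.
    if exclude_ops is not None:
        cmd += ["--exclude_ops", json.dumps(exclude_ops)]
    if scale_artifact is not None:
        cmd += ["--scale_artifact", json.dumps(scale_artifact)]
    return cmd


cmd = build_fuzzydata_command(
    "/tmp/fuzzydata_out",  # placeholder output directory
    "demo",                # placeholder workflow name
    columns=20, rows=1000, versions=10,
    exclude_ops=["pivot"],
    scale_artifact={"artifact_0": 1000000},
)

# Shell-safe rendering for copy/paste; to execute, pass cmd to subprocess.run instead.
print(shlex.join(cmd))
```

Passing the list form to subprocess.run avoids shell quoting altogether, which matters here because the JSON values contain double quotes and braces.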
Our paper is available at https://doi.org/10.1145/3531348.3532178.
If you use fuzzydata in your research, please consider citing our paper:
@inproceedings{10.1145/3531348.3532178,
author = {Rehman, Mohammed Suhail and Elmore, Aaron},
title = {FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems},
year = {2022},
isbn = {9781450393539},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3531348.3532178},
doi = {10.1145/3531348.3532178},
booktitle = {Proceedings of the 2022 Workshop on 9th International Workshop of Testing Database Systems},
pages = {17–24},
numpages = {8},
location = {Philadelphia, PA, USA},
series = {DBTest '22}
}
Check out the current roadmap in docs/roadmap.md. You are always welcome to develop a new client for fuzzydata.