
The goal of this project is to provide tools for working with large network traffic datasets and to facilitate research in the traffic classification area. The core functions of the cesnet-datazoo package are:
… size containing 25 million samples.

See a related project, CESNET Models, which provides pre-trained neural networks for traffic classification.
Example Jupyter notebooks are included in a separate repository, CESNET Traffic Classification Examples.
The cesnet-datazoo package currently provides three datasets, detailed in the following table.
Name | CESNET-TLS22 | CESNET-QUIC22 | CESNET-TLS-Year22 |
---|---|---|---|
Protocol | TLS | QUIC | TLS |
Published in | 2022 | 2023 | 2023 |
Collection duration | 2 weeks | 4 weeks | 1 year |
Collection period | 4.10.2021 - 17.10.2021 | 31.10.2022 - 27.11.2022 | 1.1.2022 - 31.12.2022 |
Application count | 191 | 102 | 180 |
Available samples | 141,392,195 | 153,226,273 | 507,739,073 |
Available dataset sizes | XS, S, M, L | XS, S, M, L | XS, S, M, L |
Cite | https://doi.org/10.1016/j.comnet.2022.109467 | https://doi.org/10.1016/j.dib.2023.108888 | https://doi.org/10.1038/s41597-024-03927-4 |
Zenodo URL | https://zenodo.org/record/7965515 | https://zenodo.org/record/7963302 | https://zenodo.org/records/10608607 |
Related papers | https://doi.org/10.23919/TMA58422.2023.10199052 |
Install the package from pip with:

```shell
pip install cesnet-datazoo
```

or, for an editable install, with:

```shell
pip install -e git+https://github.com/CESNET/cesnet-datazoo
```
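```python
from cesnet_datazoo.datasets import CESNET_QUIC22
from cesnet_datazoo.config import DatasetConfig, AppSelection

# Dataset stored under a local root directory, using the XS size variant
dataset = CESNET_QUIC22("/datasets/CESNET-QUIC22/", size="XS")

# Select the application classes and the periods used for training and testing
dataset_config = DatasetConfig(
    dataset=dataset,
    apps_selection=AppSelection.ALL_KNOWN,
    train_period_name="W-2022-44",
    test_period_name="W-2022-45",
)
# Apply the configuration and initialize the train, validation, and test sets
dataset.set_dataset_config_and_initialize(dataset_config)

# Read each split into a pandas DataFrame
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
```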
The `DatasetConfig` class handles the configuration of datasets, and calling `set_dataset_config_and_initialize` initializes the train, validation, and test sets with the desired configuration.
Data can be read into pandas DataFrames, as shown here, or via PyTorch DataLoaders. See the `CesnetDataset` reference.
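The `train_period_name` and `test_period_name` values in the example look like ISO 8601 week identifiers. The minimal sketch below assumes the `W-YYYY-WW` names denote ISO calendar weeks. That assumption is consistent with the 31.10.2022 - 27.11.2022 collection period listed for CESNET-QUIC22, which is exactly ISO weeks 44 to 47 of 2022. The sketch maps a period name to its Monday-to-Sunday date range using only the Python standard library. The helper `period_to_dates` is hypothetical and is not part of cesnet-datazoo.

```python
from datetime import date, timedelta


def period_to_dates(period_name: str) -> tuple[date, date]:
    """Hypothetical helper: map a 'W-YYYY-WW' period name to (Monday, Sunday).

    Assumes the name denotes an ISO 8601 calendar week; this is not a
    cesnet-datazoo API.
    """
    prefix, year, week = period_name.split("-")
    if prefix != "W":
        raise ValueError(f"unsupported period name: {period_name!r}")
    start = date.fromisocalendar(int(year), int(week), 1)  # Monday of the ISO week
    return start, start + timedelta(days=6)  # Sunday of the same week


train_start, train_end = period_to_dates("W-2022-44")
test_start, test_end = period_to_dates("W-2022-45")
print(train_start, train_end)  # 2022-10-31 2022-11-06
print(test_start, test_end)    # 2022-11-07 2022-11-13

# Weeks 44 through 47 of 2022 together span the CESNET-QUIC22 collection period
first_day, _ = period_to_dates("W-2022-44")
_, last_day = period_to_dates("W-2022-47")
print(first_day, last_day)     # 2022-10-31 2022-11-27
```

Under this reading, the example above would train on the week of 31 October to 6 November 2022 and test on the week that immediately follows.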
See more examples in the documentation.
This project was supported by the Ministry of the Interior of the Czech Republic, grant No. VJ02010024: Flow-Based Encrypted Traffic Analysis.