What is TorchData? | Stateful DataLoader | Install guide | Contributing | License
:warning: June 2024 Status Update: Removing DataPipes and DataLoader V2
We are re-focusing the torchdata repo to be an iterative enhancement of torch.utils.data.DataLoader. We do not plan on continuing development or maintaining the DataPipes and DataLoaderV2 solutions, and they will be removed from the torchdata repo. We'll also be revisiting the DataPipes references in pytorch/pytorch. In release torchdata==0.8.0 (July 2024) they will be marked as deprecated, and sometime after 0.9.0 (Oct 2024) they will be deleted. Existing users are advised to pin to torchdata==0.9.0 or an older version until they are able to migrate away. Subsequent releases will not include DataPipes or DataLoaderV2. The old version of this README is available here. Please reach out if you have suggestions or comments (please use #1196 for feedback).
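If your code still depends on DataPipes or DataLoaderV2, a small guard can flag an environment whose torchdata is newer than the pin recommended above. The following is a minimal sketch, not part of torchdata: the helper names (release_tuple, has_datapipes, installed_torchdata_has_datapipes) are hypothetical, and it assumes that every release after 0.9.0 drops these APIs, as the status update describes.

```python
import re
from importlib import metadata

# Assumption taken from the status update: 0.9.0 is the last release
# that still ships DataPipes and DataLoaderV2.
LAST_VERSION_WITH_DATAPIPES = (0, 9, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric (major, minor, patch) part of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def has_datapipes(version: str) -> bool:
    """True if this torchdata version is at or below the recommended pin."""
    return release_tuple(version) <= LAST_VERSION_WITH_DATAPIPES


def installed_torchdata_has_datapipes():
    """Check the installed torchdata, or return None if it is not installed."""
    try:
        version = metadata.version("torchdata")
    except metadata.PackageNotFoundError:
        return None
    return has_datapipes(version)


if __name__ == "__main__":
    print(installed_torchdata_has_datapipes())
```

Versions are compared as integer tuples rather than strings because a plain string comparison would order "0.10.0" before "0.9.0".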
The TorchData project is an iterative enhancement to the PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset/IterableDataset to make them scalable, performant dataloading solutions. We will be iterating on the enhancements under the torchdata repo.
Our first change adds checkpointing to torch.utils.data.DataLoader. It lives in stateful_dataloader, a drop-in replacement for torch.utils.data.DataLoader that defines load_state_dict and state_dict methods to enable mid-epoch checkpointing, along with an API for users to track custom iteration progress and other custom state from the dataloader workers, such as token buffers and/or RNG states.
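To make the idea of custom state concrete, here is a toy, pure-Python sketch of an iterable that carries a token buffer and exposes state_dict and load_state_dict. It does not use torchdata, and the class name PackedTokenStream and its fields are hypothetical. It only illustrates what kind of state has to be saved for a mid-epoch resume to be exact; how StatefulDataLoader picks up such methods is described on its main page.

```python
class PackedTokenStream:
    """Toy iterable that packs variable-length token lists into fixed-size chunks."""

    def __init__(self, documents, chunk_size):
        self.documents = documents
        self.chunk_size = chunk_size
        self.next_doc = 0  # index of the next document to read
        self.buffer = []   # tokens read but not yet emitted

    def __iter__(self):
        while True:
            # Refill the buffer until a full chunk is available or input runs out.
            while len(self.buffer) < self.chunk_size and self.next_doc < len(self.documents):
                self.buffer.extend(self.documents[self.next_doc])
                self.next_doc += 1
            if len(self.buffer) < self.chunk_size:
                return  # a trailing partial chunk is dropped
            chunk = self.buffer[:self.chunk_size]
            self.buffer = self.buffer[self.chunk_size:]
            yield chunk

    def state_dict(self):
        # Both the read position and the leftover tokens are needed to resume.
        return {"next_doc": self.next_doc, "buffer": list(self.buffer)}

    def load_state_dict(self, state):
        self.next_doc = state["next_doc"]
        self.buffer = list(state["buffer"])


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]

# Consume one chunk, then checkpoint mid-stream.
stream = PackedTokenStream(docs, chunk_size=4)
first = next(iter(stream))
state = stream.state_dict()

# Restore into a fresh object and continue where the first one stopped.
resumed = PackedTokenStream(docs, chunk_size=4)
resumed.load_state_dict(state)
rest = list(resumed)

uninterrupted = list(PackedTokenStream(docs, chunk_size=4))
```

Saving only next_doc would lose the tokens already sitting in the buffer, which is why buffers (and, in randomized pipelines, RNG states) are part of the checkpoint.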
torchdata.stateful_dataloader.StatefulDataLoader is a drop-in replacement for torch.utils.data.DataLoader which provides state_dict and load_state_dict functionality. See the Stateful DataLoader main page for more information and examples. Also check out the examples in this Colab notebook.
The following table lists the torchdata version that corresponds to each torch release, along with the supported Python versions.
torch | torchdata | python |
---|---|---|
master / nightly | main / nightly | >=3.9, <=3.12 (3.13 experimental) |
2.5.0 | 0.10.0 | >=3.9, <=3.12 |
2.5.0 | 0.9.0 | >=3.9, <=3.12 |
2.4.0 | 0.8.0 | >=3.8, <=3.12 |
2.0.0 | 0.6.0 | >=3.8, <=3.11 |
1.13.1 | 0.5.1 | >=3.7, <=3.10 |
1.12.1 | 0.4.1 | >=3.7, <=3.10 |
1.12.0 | 0.4.0 | >=3.7, <=3.10 |
1.11.0 | 0.3.0 | >=3.7, <=3.10 |
First, set up an environment. We will be installing a PyTorch binary as well as torchdata. If you're using conda, create a conda environment:
conda create --name torchdata
conda activate torchdata
If you wish to use venv instead:
python -m venv torchdata-env
source torchdata-env/bin/activate
Install torchdata:
Using pip:
pip install torchdata
Using conda:
conda install -c pytorch torchdata
To install from source, run the following from the root of a local clone of the repository:
pip install .
In case building TorchData from source fails, install the nightly version of PyTorch following the linked guide on the contributing page.
A nightly version of TorchData is also provided and is updated daily from the main branch.
Using pip:
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
Using conda:
conda install torchdata -c pytorch-nightly
We welcome PRs! See the CONTRIBUTING file.
We'd love to hear from and work with early adopters to shape our designs. Please reach out by raising an issue if you're interested in using this tooling for your project.
TorchData is BSD licensed, as found in the LICENSE file.