torchsnapshot-nightly

2024.7.26
PyPI

Maintainers: 1

TorchSnapshot (Beta Release)

A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.

Install

Requires Python >= 3.8 and PyTorch >= 2.0.0

From pip or conda:

# Stable
pip install torchsnapshot
# Or, using conda
conda install -c conda-forge torchsnapshot

# Nightly
pip install --pre torchsnapshot-nightly

From source:

git clone https://github.com/pytorch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Why TorchSnapshot

Performance

TorchSnapshot provides a fast checkpointing implementation that employs several optimizations, including zero-copy serialization for most tensor types, overlapped device-to-host copy and storage I/O, and parallelized storage I/O.
TorchSnapshot greatly speeds up checkpointing for DistributedDataParallel workloads by distributing the write load across all ranks (benchmark).
When host memory is abundant, TorchSnapshot allows training to resume before all storage I/O completes, reducing the time blocked by checkpoint saving.
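
To make the idea of spreading the write load across ranks concrete, here is a minimal sketch. It is not TorchSnapshot's actual partitioning logic: the function name `partition_writes` and the greedy largest-first strategy are assumptions chosen for illustration. Under DistributedDataParallel every rank holds the same state, so each entry only needs to be written once, and the entries can be divided among ranks so that each writes a similar number of bytes.

```python
from typing import Dict, List


def partition_writes(entry_sizes: Dict[str, int], world_size: int) -> List[List[str]]:
    """Assign each replicated entry to exactly one rank, balancing bytes written.

    Hypothetical helper for illustration; not part of the torchsnapshot API.
    """
    assignments: List[List[str]] = [[] for _ in range(world_size)]
    loads = [0] * world_size
    # Place the largest entries first; ties are broken by name so that every
    # rank computes the same assignment without communicating.
    for name, size in sorted(entry_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        rank = loads.index(min(loads))  # currently least-loaded rank
        assignments[rank].append(name)
        loads[rank] += size
    return assignments


# Sizes in bytes for four replicated entries, split across two ranks.
sizes = {"a": 8, "b": 4, "c": 4, "d": 2}
plan = partition_writes(sizes, world_size=2)
```

With this input, rank 0 would write `a` and `d` (10 bytes) and rank 1 would write `b` and `c` (8 bytes), instead of one rank writing all 18 bytes.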

Memory Usage

TorchSnapshot's memory usage adapts to the host's available resources, greatly reducing the chance of out-of-memory issues when saving and loading checkpoints.
TorchSnapshot supports efficient random access to individual objects within a snapshot, even when the snapshot is stored in cloud object storage.
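
Random access of this kind generally relies on a manifest that records where each object lives, so a reader can fetch one object without scanning the rest. The sketch below illustrates that general technique with a single packed file and byte offsets. It does not reproduce TorchSnapshot's on-disk format; `write_packed` and `read_entry` are hypothetical names introduced for this example.

```python
import os
import tempfile
from typing import Dict, Tuple

Manifest = Dict[str, Tuple[int, int]]  # logical name -> (byte offset, length)


def write_packed(path: str, objects: Dict[str, bytes]) -> Manifest:
    """Write all payloads back to back and record where each one starts."""
    manifest: Manifest = {}
    offset = 0
    with open(path, "wb") as f:
        for name, payload in objects.items():
            f.write(payload)
            manifest[name] = (offset, len(payload))
            offset += len(payload)
    return manifest


def read_entry(path: str, manifest: Manifest, name: str) -> bytes:
    """Read a single object by seeking to its recorded byte range."""
    offset, length = manifest[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    packed = os.path.join(tmp, "packed.bin")
    manifest = write_packed(
        packed,
        {"model/weight": b"\x01\x02\x03\x04", "model/bias": b"\x05\x06"},
    )
    bias = read_entry(packed, manifest, "model/bias")
```

On a cloud object store, the seek-and-read step would correspond to a ranged read of the same byte interval, which is why a single object can be retrieved without downloading the whole snapshot.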

Usability

Simple APIs that are consistent between distributed and non-distributed workloads.
Out-of-the-box integration with commonly used cloud object storage systems.
Automatic resharding (elasticity) on world size change for supported workloads (more details).
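
The core of resharding on a world size change is working out which saved shards overlap the shard a new rank needs. The sketch below shows that calculation for a one-dimensional tensor split into contiguous, near-equal shards. The even-split layout and the names `shard_bounds` and `reads_for_rank` are assumptions for illustration, not TorchSnapshot's implementation.

```python
from typing import List, Tuple


def shard_bounds(length: int, world_size: int) -> List[Tuple[int, int]]:
    """Split [0, length) into world_size contiguous [start, end) ranges."""
    base, extra = divmod(length, world_size)
    bounds = []
    start = 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional element.
        end = start + base + (1 if rank < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def reads_for_rank(
    length: int, old_world_size: int, new_world_size: int, new_rank: int
) -> List[Tuple[int, int, int]]:
    """Return (old_rank, offset_in_old_shard, num_elements) for each needed read."""
    new_start, new_end = shard_bounds(length, new_world_size)[new_rank]
    reads = []
    for old_rank, (old_start, old_end) in enumerate(shard_bounds(length, old_world_size)):
        lo, hi = max(new_start, old_start), min(new_end, old_end)
        if lo < hi:  # the two ranges overlap
            reads.append((old_rank, lo - old_start, hi - lo))
    return reads


# A 12-element tensor saved with 4 ranks and restored with 3 ranks.
reads = reads_for_rank(length=12, old_world_size=4, new_world_size=3, new_rank=1)
```

Here new rank 1 owns elements 4 to 7, so it reads two elements from the old rank 1 shard (starting at offset 1) and two from the old rank 2 shard (starting at offset 0).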

Security

Secure tensor serialization without pickle dependency [WIP].

Getting Started

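import torch
from torchsnapshot import Snapshot

# Placeholder model and optimizer so the example is self-contained
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Taking a snapshot
app_state = {"model": model, "optimizer": optimizer}
snapshot = Snapshot.take(path="/path/to/snapshot", app_state=app_state)

# Restoring from a snapshot
snapshot.restore(app_state=app_state)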
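
The values in `app_state` are stateful objects, meaning they expose `state_dict()` and `load_state_dict()` in the same way PyTorch modules and optimizers do. As a minimal sketch, a custom object such as a training-progress tracker can follow the same convention. The `Progress` class below is hypothetical and not part of torchsnapshot, and the round trip at the end only exercises the two methods directly rather than calling the library.

```python
from typing import Any, Dict


class Progress:
    """Hypothetical progress tracker exposing the state_dict convention."""

    def __init__(self) -> None:
        self.epoch = 0
        self.step = 0

    def state_dict(self) -> Dict[str, Any]:
        # Return everything needed to resume training from this point.
        return {"epoch": self.epoch, "step": self.step}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Restore the fields in place from a previously saved state.
        self.epoch = state_dict["epoch"]
        self.step = state_dict["step"]


# Round trip: capture state from one instance and load it into a fresh one.
saved = Progress()
saved.epoch, saved.step = 3, 1200

restored = Progress()
restored.load_state_dict(saved.state_dict())
```

An object like this could then sit alongside the model and optimizer, for example `app_state = {"model": model, "optimizer": optimizer, "progress": progress}`, so that it is saved and restored together with them.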

See the documentation for more details.

License

torchsnapshot is BSD licensed, as found in the LICENSE file.
