Socket
Book a DemoInstallSign in
Socket

gradient-cache

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

gradient-cache

GPU memory-efficient training with gradient compression for PyTorch

pipPyPI
Version
1.0.0
Maintainers
1

Gradient Cache - GPU Memory-Efficient Training

Version License

Gradient Cache is a production-ready PyTorch extension that reduces GPU memory usage by 90%+ during neural network training through intelligent gradient compression and CPU offloading.

๐Ÿš€ Key Features

  • 90%+ Memory Savings: Compress gradients by 100x with minimal accuracy impact
  • Larger Batch Sizes: Train with 2-3x larger batches on the same hardware
  • Simple Integration: Just 3 lines of code to add to any training loop
  • Universal Compatibility: Works with any PyTorch model and optimizer
  • Production Ready: Tested on A100 and T4 GPUs with real models

๐Ÿ“Š Proven Results

ModelParametersMemory SavedCompression
GPT-2 Small124M479 MB/step100x
GPT-2 Medium350M~1.3 GB/step100x
Custom NN50M144 MB/step100x

๐Ÿ”ง Installation

pip install gradient-cache

Or install from source:

git clone https://github.com/your-username/gradient-cache
cd gradient-cache
pip install -e .

๐Ÿ’ก Quick Start

Add gradient cache to any PyTorch training loop with just 3 lines:

import gradient_cache

# Create your model
model = create_your_model().cuda()

# Add gradient cache (1 line)
hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=100)

# Normal training loop
optimizer = torch.optim.Adam(model.parameters())

for batch in dataloader:
    loss = model(batch).mean()
    loss.backward()
    
    # Compress gradients (1 line)
    hook_manager.compress_and_free_gradients()
    
    # Restore gradients and update (1 line)
    hook_manager.apply_gradients()
    optimizer.step()
    optimizer.zero_grad()

๐ŸŽฏ Integration with Training Frameworks

Metaflow Integration

Use the decorator for automatic integration:

from metaflow import FlowSpec, step
import gradient_cache

class MyTrainingFlow(FlowSpec):
    @step
    @gradient_cache.optimize(compression_ratio=100)
    def train(self):
        # Your training code - no changes needed!
        model = create_model()
        optimizer = torch.optim.Adam(model.parameters())
        # ... rest of training

PyTorch Lightning

import pytorch_lightning as pl
import gradient_cache

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = create_model()
        self.hook_manager = gradient_cache.create_gradient_cache(self.model)
        
    def training_step(self, batch, batch_idx):
        loss = self.model(batch).mean()
        return loss
    
    def on_after_backward(self):
        self.hook_manager.compress_and_free_gradients()
        
    def optimizer_step(self, *args, **kwargs):
        self.hook_manager.apply_gradients()
        super().optimizer_step(*args, **kwargs)

๐Ÿ› ๏ธ Advanced Usage

Custom Compression Ratios

# Conservative - 10x compression (keep 10%)
hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=10)

# Aggressive - 1000x compression (keep 0.1%) 
hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=1000)

Exclude Critical Layers

# Don't compress embeddings or output layers
hook_manager = gradient_cache.GradientCacheHookManager(
    model,
    compression_ratio=100,
    exclude_layers=['embedding', 'lm_head']
)

Monitor Compression

# Enable verbose mode
hook_manager = gradient_cache.create_gradient_cache(model, verbose=True)

# Get compression statistics
stats = hook_manager.get_compression_summary()
print(f"Compression ratio: {stats['overall_compression_ratio']:.1f}x")
print(f"Memory saved: {stats['memory_saved_mb']:.1f} MB")

๐Ÿ“ˆ How It Works

  • Gradient Computation: Normal backward pass computes gradients
  • Compression: Keep only top 1% of gradient values by magnitude
  • CPU Offload: Move compressed gradients to system RAM
  • GPU Memory Release: Free GPU memory for next batch
  • Gradient Restoration: Restore gradients for optimizer step

๐Ÿ† Benefits

  • Cost Savings: Use smaller, cheaper GPU instances
  • Larger Models: Train models that don't fit in GPU memory
  • Faster Research: Iterate quickly with larger batch sizes
  • Easy Integration: No model architecture changes needed

๐Ÿงช Testing

Run the test suite:

python tests/test_gradient_cache.py

๐Ÿ“ Citation

If you use Gradient Cache in your research, please cite:

@software{gradient_cache,
  title = {Gradient Cache: GPU Memory-Efficient Training},
  author = {Gradient Cache Contributors},
  year = {2024},
  url = {https://github.com/gradient-cache/gradient-cache}
}

๐Ÿ“„ License

Apache License 2.0 - see LICENSE for details.

๐Ÿค Contributing

We welcome contributions! Please submit issues and pull requests on GitHub.

๐Ÿ“ง Support

Built with โค๏ธ for the ML community

Keywords

deep learning

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

About

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.

  • Terms
  • Privacy
  • Security

Made with โšก๏ธ by Socket Inc

U.S. Patent No. 12,346,443 & 12,314,394. Other pending.