
A package for applying differential privacy to model training using gradient shuffling and membership inference attack detection.
ForgetNet introduces a novel privacy-preserving technique for deep learning: Differentially Private Block-wise Gradient Shuffle (DP-BloGS). 🔒🔀
pip install forgetnet
from forgetnet import BloGSSFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

# Load your model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Training arguments (train_dataset and eval_dataset are assumed to be
# prepared Hugging Face datasets with a "text" field)
training_args = TrainingArguments(output_dir="./results")

# Initialize the DP-BloGS trainer
trainer = BloGSSFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    target_epsilon=1.0,
    delta=1e-5,
    clip_value=1.0
)

# Train your model with privacy guarantees
trainer.train()
Here's how to use the BloGS Privacy Engine with a ResNet model for MNIST classification:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.models import resnet18
from torch.utils.data import DataLoader
from forgetnet import BloGSPrivacyEngine

# Modify ResNet18 for MNIST (1 channel input instead of 3)
def mnist_resnet18():
    model = resnet18()
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 10)  # 10 classes for MNIST
    return model

# Hyperparameters
batch_size = 64
learning_rate = 0.01
epochs = 10
target_epsilon = 1.0
delta = 1e-5
clip_value = 1.0

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Calculate total_iterations based on dataset size
total_iterations = (len(train_dataset) // batch_size) * epochs

# Initialize model
model = mnist_resnet18().to(device)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Wrap the optimizer with the PrivacyEngine
privacy_engine = BloGSPrivacyEngine(
    optimizer=optimizer,
    model=model,
    target_epsilon=target_epsilon,
    delta=delta,
    clip_value=clip_value,
    steps=total_iterations,
    batch_size=batch_size
)

# Training loop
model.train()
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        privacy_engine.zero_grad()
        output = model(data)
        loss = nn.functional.cross_entropy(output, target)
        loss.backward()
        epsilon_spent, delta_spent = privacy_engine.step()
        if batch_idx % 100 == 0:
            print(f'Epoch {epoch}, Batch {batch_idx}, Loss: {loss.item():.4f}, Epsilon: {epsilon_spent:.4f}')

# Get total privacy spent after training
total_epsilon_spent = privacy_engine.get_privacy_spent()
print(f"Total privacy spent: ε = {total_epsilon_spent:.4f}")
This example demonstrates adapting ResNet18 to single-channel MNIST input, wrapping a standard SGD optimizer with the BloGSPrivacyEngine, and tracking the privacy budget (ε) spent as training progresses. Adjust hyperparameters as needed for your specific use case.
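For example, a stricter privacy budget can be requested by constructing the engine with a smaller target_epsilon. A minimal sketch reusing the names from the example above; the specific values are illustrative, not recommendations:

# Illustrative stricter configuration: a smaller epsilon means stronger
# privacy, typically at some cost in model utility.
strict_engine = BloGSPrivacyEngine(
    optimizer=optimizer,
    model=model,
    target_epsilon=0.5,  # tighter budget than the 1.0 used above
    delta=1e-5,
    clip_value=0.5,      # a smaller clip bound further limits per-sample influence
    steps=total_iterations,
    batch_size=batch_size
)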
DP-BloGS introduces a probabilistic approach to gradient noise through block-wise shuffling: rather than adding explicit Gaussian noise as DP-SGD does, gradients are clipped and their entries are randomly permuted within blocks, so the privacy-preserving noise comes from the shuffle itself. This combination allows for fast training while maintaining strong privacy guarantees!
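To make the mechanism concrete, here is a minimal sketch of block-wise shuffling applied to a single gradient tensor. This is illustrative only, not the package's internal implementation; the fixed block_size and the global-norm clipping rule are assumptions:

import torch

def blockwise_shuffle(grad, block_size, clip_value):
    # Clip the gradient's global norm to bound any single sample's influence
    norm = grad.norm()
    if norm > clip_value:
        grad = grad * (clip_value / norm)
    # Randomly permute entries within fixed-size blocks of the flattened gradient
    flat = grad.flatten()
    shuffled = flat.clone()
    for start in range(0, flat.numel(), block_size):
        block = flat[start:start + block_size]
        perm = torch.randperm(block.numel())
        shuffled[start:start + block.numel()] = block[perm]
    return shuffled.view_as(grad)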
DP-BloGS has been tested on a variety of model architectures, with results showing performance competitive with or better than DP-SGD.
ForgetNet now includes a powerful Membership Inference Attack tool to assess the privacy risks of your language models:
from forgetnet import LanguageMIA
# Initialize the MIA tool
mia = LanguageMIA()
# Perform the attack
results = mia.attack(train_dataset, test_dataset, model, tokenizer)
# Print the results
print(f"ROC AUC: {results['roc_auc']:.4f}")
print(f"Precision-Recall AUC: {results['precision_recall_auc']:.4f}")
print(f"Best model: {results['best_model']}")
print(f"Optimal threshold: {results['optimal_threshold']:.4f}")
The LanguageMIA class provides a detailed analysis of your model's vulnerability to membership inference attacks, reporting the ROC AUC, the precision-recall AUC, the best-performing attack classifier, and the optimal decision threshold.
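As a rule of thumb, an ROC AUC near 0.5 means the attack does no better than random guessing, while values approaching 1.0 indicate substantial membership leakage. A small sketch of how the returned metrics might be acted on; the 0.6 alert threshold is an arbitrary illustrative choice, not a package default:

# Interpret the attack results: an ROC AUC of 0.5 is chance level.
if results['roc_auc'] > 0.6:  # illustrative alert threshold
    print("Warning: the model may be leaking membership information.")
else:
    print("Attack performance is close to chance; leakage appears limited.")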
Easily incorporate MIA into your model evaluation pipeline:
def evaluate_model(model, train_dataset, test_dataset, tokenizer):
    # Perform membership inference attack
    mia = LanguageMIA()
    mia_results = mia.attack(train_dataset, test_dataset, model, tokenizer)

    # Evaluate perplexity (assuming a function evaluate_perplexity exists)
    perplexity = evaluate_perplexity(model, test_dataset, tokenizer)

    # Combine results
    results = {
        'perplexity': perplexity,
        'mia_results': mia_results,
    }
    return results
# Usage
evaluation_results = evaluate_model(model, train_dataset, test_dataset, tokenizer)
print(f"Model Perplexity: {evaluation_results['perplexity']:.2f}")
print(f"MIA ROC-AUC: {evaluation_results['mia_results']['roc_auc']:.4f}")
Use the LanguageMIA tool to ensure your language models are both powerful and privacy-preserving!
If you use ForgetNet in your research, please cite my paper:
@article{zagardo2024dpblogs,
  title={Differentially Private Block-wise Gradient Shuffle for Deep Learning},
  author={Zagardo, David},
  journal={arXiv preprint arXiv:2407.21347},
  year={2024},
  note={arXiv:2407.21347 [cs.LG]}
}
We welcome contributions! Please see CONTRIBUTING.md for details on how to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
We thank the open-source community and the authors of the papers cited in our work for their valuable contributions to the field of privacy-preserving machine learning.
Built with 🧠 by David Zagardo