Accelerator Module and Trainer based on the Accelerate library for simple distributed training processes, inspired by PyTorch Lightning.
Module based on Accelerate 🤗 for distributed training across multiple GPUs, with a focus on readability and ease of customizing experiments. We also integrate modified versions of the DataCollators from the Transformers library, so that standard Hugging Face tokenizers can be used across different environments.
NOTE: Some features might not be tested and could cause problems. Feel free to open an issue or send a PR to fix any problem found.
AcceleratorModule takes care of the heavy lifting of distributed training on many GPUs. Accelerate is quite simple, and it has many advantages over PyTorch Lightning, mainly because it doesn't abstract away the low-level part of the training loop, so you can customize it however you want. The main idea of this little project is to have a standard way to do distributed training. This module lets you:
AcceleratorModule is available via pip:
pip install accmt
Import AcceleratorModule:
from accmt import AcceleratorModule
The AcceleratorModule class has 3 main methods.
The structure looks like this:
class ExampleModule(AcceleratorModule):
    def __init__(self):
        self.model = ...

    def training_step(self, batch):
        x, y = batch
        # ...
        return train_loss

    def validation_step(self, batch):
        x, y = batch
        # ...
        return val_loss

    # If you want to calculate metrics on a test dataset, you can do the following:
    def test_step(self, batch):
        x, y = batch
        # ...
        predictions = ...
        references = ...
        return {
            "accuracy": (predictions, references),
            "any_other_metric": (predictions, references)
        }
More information about module structure here.
To train this Module, you need a Trainer class:
from accmt import Trainer, HyperParameters
trainer = Trainer(
    #hps_config="hps_config.yaml", # <--- can also be a YAML file.
    hps_config=HyperParameters(epochs=2),
    model_path="checkpoint_folder",
    # ... other arguments
)
More information about trainer here.
This is a YAML file containing hyperparameters for your training. The structure looks like the following:
hps:
  epochs: 40
  batch_size: 35
  optim:
    type: AdamW
    lr: 1e-3
    weight_decay: 1e-3
  scheduler:
    type: OneCycleLR
    max_lr: 1e-3
An optimizer (optim) is required, while a scheduler is optional (simply omit it if you don't want one).
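For instance, a minimal sketch of a config with only an optimizer and no scheduler (the values here are illustrative):

hps:
  epochs: 10
  batch_size: 16
  optim:
    type: AdamW
    lr: 1e-4
    weight_decay: 1e-2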
Available optimizer types are the following:
Optimizer | Source |
---|---|
Adam | PyTorch |
Adadelta | PyTorch |
Adagrad | PyTorch |
Adamax | PyTorch |
AdamW | PyTorch |
Adafactor | HuggingFace |
ASGD | PyTorch |
LBFGS | PyTorch |
NAdam | PyTorch |
RAdam | PyTorch |
RMSprop | PyTorch |
Rprop | PyTorch |
SGD | PyTorch |
SparseAdam | PyTorch |
Available scheduler types are the following:
Scheduler | Source |
---|---|
StepLR | PyTorch |
LinearLR | PyTorch |
ExponentialLR | PyTorch |
CosineAnnealingLR | PyTorch |
CyclicLR | PyTorch |
OneCycleLR | PyTorch |
CosineAnnealingWarmRestarts | PyTorch |
CosineWithWarmup | HuggingFace |
Constant | HuggingFace |
ConstantWithWarmup | HuggingFace |
CosineWithHardRestartsWithWarmup | HuggingFace |
InverseSQRT | HuggingFace |
LinearWithWarmup | HuggingFace |
PolynomialDecayWithWarmup | HuggingFace |
Finally, we can train our model by using the .fit() function, providing our AcceleratorModule and the train and validation datasets (from PyTorch):
trainer.fit(module, train_dataset, val_dataset)
More information about the HPS config file here.
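Putting the pieces together, a minimal end-to-end sketch might look like the following (train_dataset and val_dataset are assumed to be PyTorch datasets you have already built):

from accmt import Trainer, HyperParameters

module = ExampleModule()  # the module defined earlier
trainer = Trainer(
    hps_config=HyperParameters(epochs=2),
    model_path="checkpoint_folder"
)
trainer.fit(module, train_dataset, val_dataset)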
To run training, you can use the accmt command-line utilities (a wrapper around Accelerate 🤗):
accmt launch train.py -N=8 --strat=deepspeed-2-bf16
This will run on 8 GPUs with DeepSpeed ZeRO stage 2 and bfloat16 mixed precision. If the -N argument is not specified, accmt will launch one process per GPU detected in your system. Also, if --strat is not specified, the default strategy will be DDP with no mixed precision.
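For example, to launch with those defaults (one process per detected GPU, plain DDP, no mixed precision):

accmt launch train.py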
You can use any Accelerate configuration that you want 🤗 (DDP, FSDP or DeepSpeed). For more strategies, check:
accmt strats # --ddp | --fsdp | --deepspeed <--- optional filters.
NOTE: You can also use accelerate command-line utilities instead.
More information about command-line utilities here.
Checkpointing is a default process in ACCMT, and it's customizable with some parameters in the Trainer constructor:
trainer = Trainer(
    # ... Other parameters.
    checkpoint_every="2ep", # Checkpoint every N epochs; in this case, every 2 epochs.
    resume=True # Whether to resume from the checkpoint (True) or start from scratch (False).
                # If not specified (None), resuming is handled automatically.
)
Model saving is an integrated feature of ACCMT. You can enable it by specifying a directory in which to save the model.
You can also save the model in 3 different modes:
Or in the following format:
You can also activate model saving below or above a specific metric (e.g. if best_valid_loss is specified, the model will be saved when the validation loss is below or above the specified thresholds).
trainer = Trainer(
    # ... Other parameters.
    model_path="model", # Path where to save the model.
    model_saving="best_valid_loss", # Model saving mode.
    model_saving_below=0.67, # Save the model below this threshold (e.g. below 0.67 validation loss).
    model_saving_above=0.01 # Completely optional.
)
When training big models, memory usage becomes a huge problem. One way to mitigate it is to not step the optimizer on every batch, but instead accumulate gradients for a certain number of steps. This is very easy to do: just set the grad_accumulation_steps parameter to the number of steps you want to accumulate gradients for before stepping.
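For instance, a minimal sketch (the value 4 is illustrative):

trainer = Trainer(
    hps_config="hps_config.yaml",
    model_path="model",
    grad_accumulation_steps=4 # accumulate gradients over 4 batches before each optimizer step
)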
Logging training progress is enabled by default in ACCMT, as it is essential to track how well our experiments are going and to decide whether training should be paused.
There are only 2 parameters to change for this (in the Trainer constructor):
You can implement your own collate function by overriding collate_fn from AcceleratorModule:
class ExampleModule(AcceleratorModule):
    # Rest of the code...

    def collate_fn(self, batch: list):
        # Your collate function logic here.
        return batch # Output taken in training and validation steps.
There is another, simpler way to add collators that I'm going to keep building on in the future, and that is using a specific DataCollator built into this library.
At the moment, there are 3 collators directly inspired by the transformers library (with slight modifications):
Example:
from accmt import Trainer, DataCollatorForSeq2Seq

tokenizer = ... # a tokenizer from the 'transformers' library.

trainer = Trainer(
    hps_config="hps_config.yaml",
    model_path="dummy_model",
    collate_fn=DataCollatorForSeq2Seq(tokenizer)
)
A Teacher-Student approach lets you mimic the behaviour of a bigger model (teacher) in a smaller model (student). This is a method for model distillation, useful for saving computational resources and accelerating inference.
To load teacher and student models, we can do the following in the module constructor:
class TeacherStudentExampleModule(AcceleratorModule):
    def __init__(self):
        self.teacher = ... # teacher model
        self.model = ...   # student model

        self.teacher.eval() # set teacher to evaluation mode
During training, the teacher model will only provide outputs, and will not have its parameters updated.
NOTE: In order to successfully load the models onto the hardware, we must use self.teacher for the teacher model and self.model for the student model.
If using KL Divergence approach for the loss function, our step method will look something like this:
import torch
import torch.nn.functional as F
# other imports...

# other logic for module...
def step(self, batch):
    x = batch
    with torch.no_grad(): # no gradients required for the teacher model
        teacher_logits = self.teacher(**x).logits

    student_output = self.model(**x)
    student_logits = student_output.logits

    soft_prob = F.log_softmax(student_logits / self.T, dim=-1)
    soft_targets = F.softmax(teacher_logits / self.T, dim=-1)

    kd_loss = F.kl_div(soft_prob, soft_targets, reduction="batchmean") * (self.T**2)
    loss = self.alpha * student_output.loss + (1. - self.alpha) * kd_loss

    return loss
I will continue to update this repository to add more features over time. If you want to contribute to this little project, feel free to make a PR 🤗.