General purpose model trainer for PyTorch that is more flexible than it should be, by 🐸Coqui.
An opinionated general purpose model trainer on PyTorch with a simple code base.
From GitHub:
git clone https://github.com/coqui-ai/Trainer
cd Trainer
make install
From PyPI:
pip install trainer
Prefer installing from GitHub, as it is more stable.
Subclass TrainerModel() and overload its functions.
See the MNIST example.
With 👟 you can define the whole optimization cycle yourself, as in the GAN example below. This gives you more under-the-hood control and flexibility for advanced training loops.
You only have to use the scaled_backward() function to handle mixed precision training.
...
def optimize(self, batch, trainer):
    imgs, _ = batch

    # sample noise
    z = torch.randn(imgs.shape[0], 100)
    z = z.type_as(imgs)

    # train discriminator
    imgs_gen = self.generator(z)
    logits = self.discriminator(imgs_gen.detach())
    fake = torch.zeros(imgs.size(0), 1)
    fake = fake.type_as(imgs)
    loss_fake = trainer.criterion(logits, fake)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)
    logits = self.discriminator(imgs)
    loss_real = trainer.criterion(logits, valid)
    loss_disc = (loss_real + loss_fake) / 2

    # step discriminator
    _, _ = self.scaled_backward(loss_disc, None, trainer, trainer.optimizer[0])

    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[0].step()
        trainer.optimizer[0].zero_grad()

    # train generator
    imgs_gen = self.generator(z)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)

    logits = self.discriminator(imgs_gen)
    loss_gen = trainer.criterion(logits, valid)

    # step generator
    _, _ = self.scaled_backward(loss_gen, None, trainer, trainer.optimizer[1])
    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[1].step()
        trainer.optimizer[1].zero_grad()
    return {"model_outputs": logits}, {"loss_gen": loss_gen, "loss_disc": loss_disc}
...
See the GAN training example with gradient accumulation.
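The modulo check used in the GAN example above can be illustrated without PyTorch. The following is a minimal sketch, not the library's implementation: the function name accumulate_and_step is hypothetical, gradients are plain floats, and the step counter is assumed to start at 1 (the actual starting value of trainer.total_steps_done may differ).

```python
from typing import List


def accumulate_and_step(grads: List[float], grad_accum_steps: int) -> List[float]:
    """Toy model of gradient accumulation (hypothetical helper).

    Gradients are summed on every step; the summed value is "applied"
    and reset only when step % grad_accum_steps == 0.
    """
    accumulated = 0.0
    applied = []
    # Assumption: the step counter is 1-based in this sketch.
    for step, grad in enumerate(grads, start=1):
        accumulated += grad  # stands in for the backward pass
        if step % grad_accum_steps == 0:
            applied.append(accumulated)  # stands in for optimizer.step()
            accumulated = 0.0  # stands in for optimizer.zero_grad()
    return applied


updates = accumulate_and_step([1.0, 2.0, 3.0, 4.0], grad_accum_steps=2)
print(updates)
```

With four batches and grad_accum_steps=2, the sketch performs two updates, each covering two batches.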
See the test script for training with the batch size finder.
The batch size finder starts at a default batch size (2048, unless you set another value) and searches for the largest batch size that fits on your hardware. Expect it to run several training attempts before it finds one. To use it, call trainer.fit_with_largest_batch_size(starting_batch_size=2048) instead of trainer.fit(), where starting_batch_size is the batch size at which the search begins. This is very useful if you want to use as much GPU memory as possible.
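One plausible way such a search could work is to retry with a smaller batch size whenever an attempt runs out of memory. The sketch below assumes a halving strategy, which may not match what fit_with_largest_batch_size() actually does. OutOfMemory, find_largest_batch_size, and fake_fit are hypothetical names used only for illustration, and the memory limit is simulated.

```python
from typing import Callable


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def find_largest_batch_size(try_fit: Callable[[int], None],
                            starting_batch_size: int = 2048) -> int:
    """Halve the batch size until a training attempt succeeds (assumed strategy)."""
    batch_size = starting_batch_size
    while batch_size >= 1:
        try:
            try_fit(batch_size)  # one full training attempt
            return batch_size
        except OutOfMemory:
            batch_size //= 2  # too large: retry with half
    raise RuntimeError("No batch size fits on this hardware.")


def fake_fit(batch_size: int, limit: int = 300) -> None:
    """Simulated training run that fails above an arbitrary limit."""
    if batch_size > limit:
        raise OutOfMemory(f"batch size {batch_size} does not fit")


best = find_largest_batch_size(fake_fit, starting_batch_size=2048)
print(best)
```

Each failed attempt costs a training start-up, which is why the search can take several runs before it settles.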
$ python -m trainer.distribute --script path/to/your/train.py --gpus "0,1"
We don't use .spawn() to initiate multi-GPU training because it imposes certain limitations. For example, .spawn() trains the model in subprocesses, so the model in the main process is not updated.

Setting use_accelerate in TrainerArgs to True enables training with Accelerate.
You can also use it for multi-GPU or distributed training.
CUDA_VISIBLE_DEVICES="0,1,2" accelerate launch --multi_gpu --num_processes 3 train_recipe_autoregressive_prompt.py
See the Accelerate docs.
👟 supports callbacks to customize your runs. You can either set callbacks in your model implementations or pass them explicitly to the Trainer.
Please check trainer.utils.callbacks to see the available callbacks.
Here is how you provide an explicit callback to a 👟Trainer object, for example for weight reinitialization.
def my_callback(trainer):
    print(" > My callback was called.")

trainer = Trainer(..., callbacks={"on_init_end": my_callback})
trainer.fit()
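To make the dispatch idea concrete, here is a minimal sketch of how a dictionary of callbacks keyed by hook name can be invoked. MiniTrainer is a hypothetical stand-in written for this illustration, not the 👟 Trainer class; only the hook name on_init_end comes from the example above.

```python
from typing import Callable, Dict, List, Optional


class MiniTrainer:
    """Hypothetical stand-in that shows name-based callback dispatch."""

    def __init__(self, callbacks: Optional[Dict[str, Callable]] = None):
        self.callbacks = dict(callbacks or {})
        # Fire the hook once initialization has finished.
        self._run_callback("on_init_end")

    def _run_callback(self, name: str) -> None:
        callback = self.callbacks.get(name)
        if callback is not None:  # hooks without a callback are skipped
            callback(self)


calls: List[str] = []


def my_callback(trainer: MiniTrainer) -> None:
    calls.append("on_init_end")
    print(" > My callback was called.")


mini_trainer = MiniTrainer(callbacks={"on_init_end": my_callback})
```

The callback receives the trainer instance, so it can inspect or modify state such as model weights at that point in the run.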
import torch

profiler = torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler/"),
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
)
prof = trainer.profile_fit(profiler, epochs=1, small_run=64)
Then run TensorBoard:
tensorboard --logdir="./profiler/"
To add a new logger, you must subclass BaseDashboardLogger and overload its functions.
We constantly seek to improve 🐸 for the community. To understand the community's needs better and address them accordingly, we collect stripped-down anonymized usage stats when you run the trainer.
If you prefer not to share them, you can opt out by setting the environment variable TRAINER_TELEMETRY=0.
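If you launch training from a Python script rather than a shell, the variable can also be set in-process. This sketch assumes the variable should be in place before the trainer package is imported; when exactly the library reads it is not stated here.

```python
import os

# Opt out of usage stats. Assumption: set this before importing the
# trainer package so the setting is visible whenever the library checks it.
os.environ["TRAINER_TELEMETRY"] = "0"

print(os.environ["TRAINER_TELEMETRY"])
```

Setting the variable in the shell (for example, exporting it before running your script) has the same effect for the launched process.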