A library for dynamic gradient homogenization for multitask learning in PyTorch
To install the library, run the following in your terminal:
pip install rotograd
The code has been tested with PyTorch 1.7.0, but it should work with most versions. Feel free to open an issue if it does not.
This is the official PyTorch implementation of RotoGrad, an algorithm that reduces the negative transfer caused by gradient conflict with respect to the shared parameters, which arises when the tasks of a multitask learning system compete for the shared resources.
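To make "gradient conflict" concrete, here is a minimal sketch in plain Python. It is not part of the rotograd API: the helper `cosine_similarity` and the two gradient vectors are made up for illustration. It assumes the common convention that two task gradients conflict when the cosine of the angle between them is negative, in which case summing them partially cancels the update to the shared parameters.

```python
import math

def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    return dot / (norm1 * norm2)

# Hypothetical gradients of two task losses with respect to the shared features.
grad_task1 = [1.0, 0.0]
grad_task2 = [-1.0, 0.5]

cos = cosine_similarity(grad_task1, grad_task2)

# The shared parameters receive the sum of the task gradients.
summed = [a + b for a, b in zip(grad_task1, grad_task2)]

print("conflict:", cos < 0)        # negative cosine: the tasks pull in opposing directions
print("summed gradient:", summed)  # the first coordinate cancels out
```

In this toy case the first coordinate of the summed gradient vanishes, so neither task makes progress along it. This is the kind of interference between tasks that the paragraph above refers to.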
Say you have a hard-parameter sharing architecture with a backbone model shared across tasks, and two different tasks you want to solve. These tasks take the output of the backbone, z = backbone(x), and feed it to a task-specific model (head1 and head2) to obtain the predictions for their tasks, that is, y1 = head1(z) and y2 = head2(z).
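As a rough illustration of this setup, the sketch below builds a toy backbone and two heads in plain PyTorch. The layer types, the sizes `size_x` and `size_z`, and the output dimensions are assumptions chosen for the example; they are not prescribed by RotoGrad.

```python
import torch
from torch import nn

# Hypothetical sizes: input features and shared representation z.
size_x, size_z = 8, 4

# Backbone shared by both tasks.
backbone = nn.Sequential(nn.Linear(size_x, size_z), nn.ReLU())

# Task-specific heads operating on the shared representation.
head1 = nn.Linear(size_z, 1)  # e.g., a task with a scalar output
head2 = nn.Linear(size_z, 3)  # e.g., a task with three outputs

x = torch.randn(16, size_x)   # a batch of 16 random inputs
z = backbone(x)               # shared features, shape (16, size_z)
y1 = head1(z)                 # predictions for task 1
y2 = head2(z)                 # predictions for task 2
```

Both heads read the same `z`, so the gradients of both task losses flow back into the same backbone parameters.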
Then you can use RotateOnly, RotoGrad, or RotoGradNorm (RotateOnly + GradNorm) by putting all the parts together in a single model.
from rotograd import RotoGrad
model = RotoGrad(backbone, [head1, head2], size_z, normalize_losses=True)
where you can recover the backbone and the i-th head by calling model.backbone and model.heads[i]. Moreover, you can obtain the end-to-end model for a single task (that is, backbone + head) by typing model[i].
As discussed in the paper, it is advisable to have a smaller learning rate for the parameters of RotoGrad and GradNorm. This is as simple as doing:
import torch
# Adam lives in torch.optim, not in torch.nn.
optimizer = torch.optim.Adam(
    [{'params': m.parameters()} for m in [backbone, head1, head2]] +
    [{'params': model.parameters(), 'lr': learning_rate_rotograd}],
    lr=learning_rate_model)
Finally, we can train the model on all tasks using a simple step function:
import rotograd

def step(x, y1, y2):
    model.train()
    optimizer.zero_grad()
    with rotograd.cached():  # Speeds up computations by caching RotoGrad's parameters
        pred1, pred2 = model(x)
        loss1, loss2 = loss_task1(pred1, y1), loss_task2(pred2, y2)
        model.backward([loss1, loss2])
    optimizer.step()
    return loss1, loss2
You can find a working example in the folder example. However, it requires some additional dependencies to run (e.g., ignite and seaborn). The example shows how to use RotoGrad on one of the regression problems from the manuscript.
Consider citing the following paper if you use RotoGrad:
@inproceedings{javaloy2022rotograd,
title={RotoGrad: Gradient Homogenization in Multitask Learning},
author={Adri{\'a}n Javaloy and Isabel Valera},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=T8wHz4rnuGL}
}