TorchEval
This library is currently in Alpha and does not yet have a stable release. The API may change and may not be backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.
TorchEval is a library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training, and tools for PyTorch model evaluations.
Installing TorchEval
Requires Python >= 3.8 and PyTorch >= 1.11
From pip:
pip install torcheval
For the nightly build:
pip install --pre torcheval-nightly
From source:
git clone https://github.com/pytorch/torcheval
cd torcheval
pip install -r requirements.txt
python setup.py install
Quick Start
Take a look at the quickstart notebook, or fork it on Colab.
There are more examples in the examples directory:
cd torcheval
python examples/simple_example.py
Documentation
Documentation can be found at pytorch.org/torcheval
Using TorchEval
TorchEval can be run on CPU, GPU, and in a multi-process or multi-GPU setting. Metrics are provided in two interfaces, functional and class-based. The functional interfaces can be found in torcheval.metrics.functional and are useful when your program runs in a single-process setting. For multi-process or multi-GPU configurations, the class-based interfaces, found in torcheval.metrics, provide a much simpler experience. The class-based interfaces also allow you to defer some of the computation of the metric by calling update() multiple times before compute(). This can be advantageous even in a single-process setting because of the reduced computation overhead.
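To make the update()/compute() split concrete, here is a minimal plain-Python sketch of the pattern. RunningAccuracy is a hypothetical stand-in written for illustration, not TorchEval's MulticlassAccuracy, and it takes lists of class indices rather than tensors. It assumes accuracy can be tracked with two counters, so each update() is cheap and the final division is deferred to compute().

```python
from typing import Sequence


class RunningAccuracy:
    """Hypothetical stand-in for a class-based metric; not part of TorchEval."""

    def __init__(self) -> None:
        self.num_correct = 0
        self.num_total = 0

    def update(self, predictions: Sequence[int], targets: Sequence[int]) -> "RunningAccuracy":
        # Cheap bookkeeping only: fold the batch into two running counters.
        if len(predictions) != len(targets):
            raise ValueError("predictions and targets must have the same length")
        self.num_correct += sum(p == t for p, t in zip(predictions, targets))
        self.num_total += len(targets)
        return self

    def compute(self) -> float:
        # The division is deferred until a value is actually requested.
        if self.num_total == 0:
            raise ValueError("compute() called before any update()")
        return self.num_correct / self.num_total

    def reset(self) -> "RunningAccuracy":
        # Drop accumulated state so the next compute() only sees new data.
        self.num_correct = 0
        self.num_total = 0
        return self


metric = RunningAccuracy()
metric.update([0, 1, 2, 2], [0, 1, 1, 2])  # 3 of 4 correct
metric.update([1, 1, 0, 0], [1, 0, 0, 0])  # 3 of 4 correct
accuracy_over_both_batches = metric.compute()  # based on all 8 samples
metric.reset()
```

Calling compute() after both updates yields the accuracy over all eight samples, not just the most recent batch, which is the behavior the class-based interfaces are described as providing.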
Single Process
For use in a single process program, the simplest use case utilizes a functional metric. We simply import the metric function and feed in our outputs and targets. The example below shows a minimal PyTorch training loop that evaluates the multiclass accuracy of every fourth batch of data.
Functional Version (immediate computation of metric)
import torch
from torcheval.metrics.functional import multiclass_accuracy
NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4
model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()
metric_history = []
for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)
    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()
    # The metric is computed on the current batch only;
    # the three batches before it are not included.
    if (batch + 1) % eval_frequency == 0:
        metric_history.append(multiclass_accuracy(outputs, target))
Single Process with Deferred Computation
Class Version (enables deferred computation of metric)
import torch
from torcheval.metrics import MulticlassAccuracy
NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4
model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()
metric = MulticlassAccuracy()
metric_history = []
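for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)
    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()
    # Accumulate the model's predictions (not the raw inputs) into the metric state.
    metric.update(outputs, target)
    if (batch + 1) % eval_frequency == 0:
        # compute() uses all data seen since the last reset().
        metric_history.append(metric.compute())
        # Clear the state so the next value covers only the next eval_frequency batches.
        metric.reset()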
Multi-Process or Multi-GPU
For usage on multiple devices, a minimal example is given below. In the normal torch.distributed paradigm, each device is allocated its own process, and each process gets a unique numerical ID called a "global rank", counting up from 0.
Class Version (enables deferred computation and multi-processing)
import os
import torch
from torcheval.metrics.toolkit import sync_and_compute
from torcheval.metrics import MulticlassAccuracy
local_rank = int(os.environ["LOCAL_RANK"])
global_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
device = torch.device(
    f"cuda:{local_rank}"
    if torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    else "cpu"
)
metric = MulticlassAccuracy(device=device)
num_epochs, num_batches = 4, 8
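for epoch in range(num_epochs):
    for i in range(num_batches):
        input = torch.randint(high=5, size=(10,), device=device)
        target = torch.randint(high=5, size=(10,), device=device)
        # Add data to the metric state on this process only.
        metric.update(input, target)
        # compute() reflects only the data seen by the local process.
        local_compute_result = metric.compute()
        # sync_and_compute() syncs metric state across ranks before computing.
        global_compute_result = sync_and_compute(metric)
        if global_rank == 0:
            print(global_compute_result)
    # Clear the metric state on each process at the end of the epoch.
    metric.reset()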
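As a rough illustration of why the example distinguishes local_compute_result from global_compute_result, the plain-Python sketch below merges per-rank accuracy counters before computing. AccuracyState and merge_states are hypothetical names invented for this sketch. It does not show how sync_and_compute is implemented, and it assumes the metric state is just a pair of counters.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AccuracyState:
    """Hypothetical per-rank metric state; not a TorchEval type."""

    num_correct: int = 0
    num_total: int = 0

    def compute(self) -> float:
        return self.num_correct / self.num_total


def merge_states(states: Iterable[AccuracyState]) -> AccuracyState:
    # Sum the counters from every rank into one combined state.
    merged = AccuracyState()
    for state in states:
        merged.num_correct += state.num_correct
        merged.num_total += state.num_total
    return merged


# Pretend two ranks saw different amounts of data.
rank_states = [
    AccuracyState(num_correct=9, num_total=10),  # rank 0
    AccuracyState(num_correct=1, num_total=2),   # rank 1
]

local_results = [state.compute() for state in rank_states]
global_result = merge_states(rank_states).compute()
mean_of_locals = sum(local_results) / len(local_results)
```

Here the merged value (10 of 12 correct) differs from the mean of the two local values, which is why averaging per-rank results is not a substitute for syncing state when ranks see different amounts of data.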
See the examples directory for more examples.
Contributing
We welcome PRs! See the CONTRIBUTING file.
License
TorchEval is BSD licensed, as found in the LICENSE file.