alma-torch

A package for benchmarking the speed of different PyTorch conversion options

  • Version: 0.3.7
  • Registry: PyPI

alma


A Python library for benchmarking PyTorch model speed for different conversion options 🚀


With just one function call, you can get a full report on how fast your PyTorch model runs for inference across over 70 conversion options, such as JIT tracing, torch.compile, torch.export, torchao, ONNX, OpenVINO, TensorRT, and many more!

This allows one to find the best option for one's model, data, and hardware. See here for all supported options.

Table of Contents

  • Getting Started
  • Installation
  • Docker
  • Basic usage
  • Conversion Options
  • Testing
  • Future work
  • How to contribute
  • Citation

Getting Started

Installation

alma is available as a Python package.

One can install the package from the Python Package Index (PyPI) by running:

pip install alma-torch

Alternatively, it can be installed from the root of this repository by running:

pip install -e .
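
Note that while the PyPI package is named alma-torch, the library itself is imported as alma (as in the Basic usage example below). A minimal post-install sanity check, using only the imports shown later in this README:

# Quick post-install check: the package installs as `alma-torch` but imports as `alma`.
from alma import benchmark_model
from alma.benchmark import BenchmarkConfig

print("alma imported successfully")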

Docker

We recommend building the provided Dockerfile to ensure an easy installation of all of the system dependencies and the alma pip package.

Working with the Docker image
  1. Build the Docker Image

    bash scripts/build_docker.sh
    
  2. Run the Docker Container
    Create and start a container named alma:

    bash scripts/run_docker.sh
    
  3. Access the Running Container
    Enter the container's shell:

    docker exec -it alma bash
    
  4. Mount Your Repository
    By default, the run_docker.sh script mounts your /home directory to /home inside the container.
    If your alma repository is in a different location, update the bind mount, for example:

    -v /Users/myuser/alma:/home/alma
    

Basic usage

The core API is benchmark_model, which is used to benchmark the speed of a model for different conversion options. The usage is as follows:

import torch

from alma import benchmark_model
from alma.benchmark import BenchmarkConfig
from alma.benchmark.log import display_all_results

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Load the model
model = ...

# Load the dataloader used in benchmarking
data_loader = ...

# Set the configuration (this can also be passed in as a dict)
config = BenchmarkConfig(
    n_samples=2048,
    batch_size=64,
    device=device,  # The device to run the model on
)

# Choose which conversions to benchmark
conversions = ["EAGER", "TORCH_SCRIPT", "COMPILE_INDUCTOR_MAX_AUTOTUNE", "COMPILE_OPENXLA"]

# Benchmark the model
results = benchmark_model(model, config, conversions, data_loader=data_loader)

# Print all results
display_all_results(results)

The results will look something like the following, depending on one's model, dataloader, and hardware:

EAGER results:
Device: cuda
Total elapsed time: 0.0206 seconds
Total inference time (model only): 0.0074 seconds
Total samples: 2048 - Batch size: 64
Throughput: 275643.45 samples/second

TORCH_SCRIPT results:
Device: cuda
Total elapsed time: 0.0203 seconds
Total inference time (model only): 0.0043 seconds
Total samples: 2048 - Batch size: 64
Throughput: 477575.34 samples/second

COMPILE_INDUCTOR_MAX_AUTOTUNE results:
Device: cuda
Total elapsed time: 0.0159 seconds
Total inference time (model only): 0.0035 seconds
Total samples: 2048 - Batch size: 64
Throughput: 592801.70 samples/second

COMPILE_OPENXLA results:
Device: xla:0
Total elapsed time: 0.0146 seconds
Total inference time (model only): 0.0033 seconds
Total samples: 2048 - Batch size: 64
Throughput: 611865.07 samples/second

See the examples for discussion of design choices and for examples of more advanced usage, e.g. controlling the multiprocessing setup, handling graceful failures, setting default device fallbacks if a conversion option is incompatible with your specified device, memory-efficient usage of alma, etc.
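
As noted in the comment in the example above, the configuration can also be passed in as a plain dict instead of a BenchmarkConfig instance. A minimal sketch, assuming only the fields already shown above (n_samples, batch_size, device) and reusing the model, conversions, and data_loader from the example:

# Same benchmark as above, but with the config provided as a dict rather than a
# BenchmarkConfig instance (only the fields shown earlier are assumed here).
config = {
    "n_samples": 2048,
    "batch_size": 64,
    "device": device,
}
results = benchmark_model(model, config, conversions, data_loader=data_loader)
display_all_results(results)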

Conversion Options

Naming conventions

The naming convention for conversion options is as follows:

  • Short but descriptive names for each technique, e.g. EAGER, EXPORT, etc.
  • Underscores _ are used within each technique name to separate the words for readability, e.g. AOT_INDUCTOR, COMPILE_CUDAGRAPHS, etc.
  • If multiple "techniques" are used in a conversion option, then the names are separated by a + sign in chronological order of operation. For example, EXPORT+EAGER, EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE. In both cases, EXPORT is the first operation, followed by EAGER or COMPILE_INDUCTOR_MAX_AUTOTUNE.

Conversion Options Summary

Below is a table summarizing the currently supported conversion options and their identifiers:

| ID | Conversion Option | Device Support | Project |
|----|-------------------|----------------|---------|
| 0 | EAGER | CPU, MPS, GPU | PyTorch |
| 1 | EXPORT+EAGER | CPU, MPS, GPU | torch.export |
| 2 | ONNX_CPU | CPU | ONNXRT |
| 3 | ONNX_GPU | GPU | ONNXRT |
| 4 | ONNX+DYNAMO_EXPORT | CPU | ONNXRT |
| 5 | COMPILE_CUDAGRAPHS | GPU (CUDA) | torch.compile |
| 6 | COMPILE_INDUCTOR_DEFAULT | CPU, MPS, GPU | torch.compile |
| 7 | COMPILE_INDUCTOR_REDUCE_OVERHEAD | CPU, MPS, GPU | torch.compile |
| 8 | COMPILE_INDUCTOR_MAX_AUTOTUNE | CPU, MPS, GPU | torch.compile |
| 9 | COMPILE_INDUCTOR_EAGER_FALLBACK | CPU, MPS, GPU | torch.compile |
| 10 | COMPILE_ONNXRT | CPU, MPS, GPU | torch.compile + ONNXRT |
| 11 | COMPILE_OPENXLA | XLA_GPU | torch.compile + OpenXLA |
| 12 | COMPILE_TVM | CPU, MPS, GPU | torch.compile + Apache TVM |
| 13 | EXPORT+AI8WI8_FLOAT_QUANTIZED | CPU, MPS, GPU | torch.export |
| 14 | EXPORT+AI8WI8_FLOAT_QUANTIZED+RUN_DECOMPOSITION | CPU, MPS, GPU | torch.export |
| 15 | EXPORT+AI8WI8_STATIC_QUANTIZED | CPU, MPS, GPU | torch.export |
| 16 | EXPORT+AI8WI8_STATIC_QUANTIZED+RUN_DECOMPOSITION | CPU, MPS, GPU | torch.export |
| 17 | EXPORT+AOT_INDUCTOR | CPU, MPS, GPU | torch.export + aot_inductor |
| 18 | EXPORT+COMPILE_CUDAGRAPHS | GPU (CUDA) | torch.export + torch.compile |
| 19 | EXPORT+COMPILE_INDUCTOR_DEFAULT | CPU, MPS, GPU | torch.export + torch.compile |
| 20 | EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHEAD | CPU, MPS, GPU | torch.export + torch.compile |
| 21 | EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE | CPU, MPS, GPU | torch.export + torch.compile |
| 22 | EXPORT+COMPILE_INDUCTOR_DEFAULT_EAGER_FALLBACK | CPU, MPS, GPU | torch.export + torch.compile |
| 23 | EXPORT+COMPILE_ONNXRT | CPU, MPS, GPU | torch.export + torch.compile + ONNXRT |
| 24 | EXPORT+COMPILE_OPENXLA | XLA_GPU | torch.export + torch.compile + OpenXLA |
| 25 | EXPORT+COMPILE_TVM | CPU, MPS, GPU | torch.export + torch.compile + Apache TVM |
| 26 | NATIVE_CONVERT_AI8WI8_STATIC_QUANTIZED | CPU | CPU (PyTorch) |
| 27 | NATIVE_FAKE_QUANTIZED_AI8WI8_STATIC | CPU, GPU | CPU (PyTorch) |
| 28 | COMPILE_TENSORRT | GPU (CUDA) | torch.compile + NVIDIA TensorRT |
| 29 | EXPORT+COMPILE_TENSORRT | GPU (CUDA) | torch.export + torch.compile + NVIDIA TensorRT |
| 30 | COMPILE_OPENVINO | CPU (Intel) | torch.compile + OpenVINO |
| 31 | JIT_TRACE | CPU, MPS, GPU | PyTorch |
| 32 | TORCH_SCRIPT | CPU, MPS, GPU | PyTorch |
| 33 | OPTIMUM_QUANTO_AI8WI8 | CPU, MPS, GPU | optimum quanto |
| 34 | OPTIMUM_QUANTO_AI8WI4 | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |
| 35 | OPTIMUM_QUANTO_AI8WI2 | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |
| 36 | OPTIMUM_QUANTO_WI8 | CPU, MPS, GPU | optimum quanto |
| 37 | OPTIMUM_QUANTO_WI4 | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |
| 38 | OPTIMUM_QUANTO_WI2 | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |
| 39 | OPTIMUM_QUANTO_Wf8E4M3N | CPU, MPS, GPU | optimum quanto |
| 40 | OPTIMUM_QUANTO_Wf8E4M3NUZ | CPU, MPS, GPU | optimum quanto |
| 41 | OPTIMUM_QUANTO_Wf8E5M2 | CPU, MPS, GPU | optimum quanto |
| 42 | OPTIMUM_QUANTO_Wf8E5M2+COMPILE_CUDAGRAPHS | GPU (CUDA) | optimum quanto + torch.compile |
| 43 | FP16+EAGER | CPU, MPS, GPU | PyTorch |
| 44 | BF16+EAGER | CPU, MPS, GPU (not all GPUs natively supported) | PyTorch |
| 45 | COMPILE_INDUCTOR_MAX_AUTOTUNE+TORCHAO_AUTOQUANT_DEFAULT | GPU | torch.compile + torchao |
| 46 | COMPILE_INDUCTOR_MAX_AUTOTUNE+TORCHAO_AUTOQUANT_NONDEFAULT | GPU | torch.compile + torchao |
| 47 | COMPILE_CUDAGRAPHS+TORCHAO_AUTOQUANT_DEFAULT | GPU (CUDA) | torch.compile + torchao |
| 48 | COMPILE_INDUCTOR_MAX_AUTOTUNE+TORCHAO_QUANT_I4_WEIGHT_ONLY | GPU (requires bf16 support) | torch.compile + torchao |
| 49 | TORCHAO_QUANT_I4_WEIGHT_ONLY | GPU (requires bf16 support) | torchao |
| 50 | FP16+COMPILE_CUDAGRAPHS | GPU (CUDA) | PyTorch + torch.compile |
| 51 | FP16+COMPILE_INDUCTOR_DEFAULT | CPU, MPS, GPU | PyTorch + torch.compile |
| 52 | FP16+COMPILE_INDUCTOR_REDUCE_OVERHEAD | CPU, MPS, GPU | PyTorch + torch.compile |
| 53 | FP16+COMPILE_INDUCTOR_MAX_AUTOTUNE | CPU, MPS, GPU | PyTorch + torch.compile |
| 54 | FP16+COMPILE_INDUCTOR_EAGER_FALLBACK | CPU, MPS, GPU | PyTorch + torch.compile |
| 55 | FP16+COMPILE_ONNXRT | CPU, MPS, GPU | PyTorch + torch.compile + ONNXRT |
| 56 | FP16+COMPILE_OPENXLA | XLA_GPU | PyTorch + torch.compile + OpenXLA |
| 57 | FP16+COMPILE_TVM | CPU, MPS, GPU | PyTorch + torch.compile + Apache TVM |
| 58 | FP16+COMPILE_TENSORRT | GPU (CUDA) | PyTorch + torch.compile + NVIDIA TensorRT |
| 59 | FP16+COMPILE_OPENVINO | CPU (Intel) | PyTorch + torch.compile + OpenVINO |
| 60 | FP16+EXPORT+COMPILE_CUDAGRAPHS | GPU (CUDA) | torch.export + torch.compile |
| 61 | FP16+EXPORT+COMPILE_INDUCTOR_DEFAULT | CPU, MPS, GPU | torch.export + torch.compile |
| 62 | FP16+EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHEAD | CPU, MPS, GPU | torch.export + torch.compile |
| 63 | FP16+EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE | CPU, MPS, GPU | torch.export + torch.compile |
| 64 | FP16+EXPORT+COMPILE_INDUCTOR_DEFAULT_EAGER_FALLBACK | CPU, MPS, GPU | torch.export + torch.compile |
| 65 | FP16+EXPORT+COMPILE_ONNXRT | CPU, MPS, GPU | torch.export + torch.compile + ONNXRT |
| 66 | FP16+EXPORT+COMPILE_OPENXLA | XLA_GPU | torch.export + torch.compile + OpenXLA |
| 67 | FP16+EXPORT+COMPILE_TVM | CPU, MPS, GPU | torch.export + torch.compile + Apache TVM |
| 68 | FP16+EXPORT+COMPILE_TENSORRT | GPU (CUDA) | torch.export + torch.compile + NVIDIA TensorRT |
| 69 | FP16+EXPORT+COMPILE_OPENVINO | CPU (Intel) | torch.export + torch.compile + OpenVINO |
| 70 | FP16+JIT_TRACE | CPU, MPS, GPU | PyTorch |
| 71 | FP16+TORCH_SCRIPT | CPU, MPS, GPU | PyTorch |

These conversion options are also all hard-coded in the conversion options file, which is the source of truth.
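
The identifiers in the table appear to be exactly the strings that the conversions list in the Basic usage section expects. As an illustrative sketch, reusing the model, config, and data_loader from that example, one might compare a handful of the options listed above (device-support notes in the comments are taken from the table):

# Conversion options are selected by the identifiers from the table above,
# reusing the model, config, and data_loader from the Basic usage example.
conversions = [
    "EAGER",                                   # baseline
    "EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE",    # CPU, MPS, GPU
    "FP16+COMPILE_TENSORRT",                   # requires a CUDA GPU, per the table
]
results = benchmark_model(model, config, conversions, data_loader=data_loader)
display_all_results(results)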

Testing:

We use pytest for testing. Simply run:

pytest

We currently don't have comprehensive tests, but we are working on adding more tests to ensure that the conversion options are working as expected in known environments (e.g. the Docker container).

Future work:

  • Add more conversion options. This is a work in progress, and we are always looking for new options to support.
  • Multi-device benchmarking. Currently alma only supports single-device benchmarking, but ideally a model could be split across multiple devices.
  • Integrate conversion options beyond PyTorch, e.g. HuggingFace, JAX, llama.cpp, etc.

How to contribute:

Contributions are welcome! If you have a new conversion option, feature, or other improvement you would like to add so that the whole community can benefit, please open a pull request! We are always looking for new conversion options and are happy to help you get started with adding one.

See the CONTRIBUTING.md file for more detailed information on how to contribute.

Citation

@Misc{alma,
  title =        {Alma: PyTorch model speed benchmarking across all conversion types},
  author =       {Oscar Savolainen and Saif Haq},
  howpublished = {\url{https://github.com/saifhaq/alma}},
  year =         {2024}
}
