onnx-tool

A tool for ONNX model: A parser, editor and profiler tool for ONNX models.

PyPI

Version: 0.9.0

Maintainers: 1

onnx-tool

A tool for ONNX models:

Parse and edit: constant folding; operator fusion.
Model profiling: rapid shape inference; MACs statistics.
Compute Graph and Shape Engine.
Model memory compression: activation compression and weight compression.
Quantized models and sparse models are supported.

Supported Models:

NLP: BERT, T5, GPT, LLaMa, MPT(TransformerModel)
Diffusion: Stable Diffusion(TextEncoder, VAE, UNET)
CV: BEVFormer, MobileNet, YOLO, ...
Audio: sovits, LPCNet

Basic Parse and Edit

You can load any ONNX file with onnx_tool.Model:
Change the graph structure with onnx_tool.Graph;
Change op attributes and IO tensors with onnx_tool.Node;
Change tensor data or type with onnx_tool.Tensor.
To apply your changes, call the save_model method of onnx_tool.Model or onnx_tool.Graph.

Please refer to benchmark/examples.py.
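A minimal sketch of the load, inspect, and save flow described above. The helper names `summarize_ops` and `inspect_and_resave` are hypothetical, and the `graph`, `nodemap`, and `op_type` attributes are assumptions about the library's object layout; check benchmark/examples.py for the authoritative usage.

```python
from collections import Counter
from typing import Iterable


def summarize_ops(nodes: Iterable) -> Counter:
    """Count nodes per operator type; each node only needs an `op_type` attribute."""
    return Counter(node.op_type for node in nodes)


def inspect_and_resave(src_path: str, dst_path: str) -> Counter:
    """Load an ONNX file, count its operators, and write the model back out."""
    # Imported lazily so summarize_ops stays usable without the package installed.
    import onnx_tool

    model = onnx_tool.Model(src_path)   # load the ONNX file
    graph = model.graph                 # assumption: the onnx_tool.Graph lives here
    # Assumption: nodemap is a dict of node name -> onnx_tool.Node.
    counts = summarize_ops(graph.nodemap.values())
    model.save_model(dst_path)          # apply (possibly edited) state to disk
    return counts
```

Calling `inspect_and_resave("model.onnx", "model_copy.onnx")` with a real file path would return the operator histogram; any edits made on the graph, nodes, or tensors before `save_model` would be written to the new file.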

Shape Inference & Profile Model

All profiling data is built on the shape inference results.
ONNX graph with tensor shapes:

Regular model profiling table:

Sparse profiling table:

Introduction: data/Profile.md.
PyTorch usage: data/PytorchUsage.md.
TensorFlow usage: data/TensorflowUsage.md.
Examples: benchmark/examples.py.
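To illustrate why profiling depends on shape inference, here is a small standalone sketch that first infers a convolution's output shape and then derives its MACs from it. The function names are hypothetical, bias is ignored, and this is the textbook formula rather than a copy of the tool's internal counting rules.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(batch: int, in_ch: int, out_ch: int, in_h: int, in_w: int,
                kernel: int, stride: int = 1, padding: int = 0, groups: int = 1) -> int:
    """Multiply-accumulate count of a 2D convolution with a square kernel."""
    # Step 1: shape inference for the output tensor.
    out_h = conv_output_size(in_h, kernel, stride, padding)
    out_w = conv_output_size(in_w, kernel, stride, padding)
    # Step 2: each output element needs (in_ch / groups) * k * k MACs.
    macs_per_output = (in_ch // groups) * kernel * kernel
    return batch * out_ch * out_h * out_w * macs_per_output


# A 7x7, stride-2 stem convolution on a 1x3x224x224 input.
stem = conv2d_macs(1, 3, 64, 224, 224, kernel=7, stride=2, padding=3)
print(f"{stem:,} MACs")  # 118,013,952 MACs
```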

Compute Graph with Shape Engine

From a raw graph to a compute graph:

Remove the shape calculation layers (created by ONNX export) to get a Compute Graph, then use the Shape Engine to update tensor shapes at runtime.
Examples: benchmark/shape_regress.py and benchmark/examples.py.
To integrate the Compute Graph and Shape Engine into a C++ inference engine, see data/inference_engine.md.
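The idea of a shape engine can be sketched in a few lines: tensor dimensions are stored as expressions over a handful of input variables, so changing an input size at runtime only re-evaluates those expressions instead of re-running shape layers in the graph. `TinyShapeEngine` and its methods are hypothetical names for illustration and are not the library's Shape Engine API.

```python
from typing import Callable, Dict, List, Union

# A dimension is either a fixed int or a function of the runtime variables.
Dim = Union[int, Callable[[Dict[str, int]], int]]


class TinyShapeEngine:
    """Illustrative only: resolves symbolic tensor shapes from input variables."""

    def __init__(self) -> None:
        self.variables: Dict[str, int] = {}
        self.tensor_dims: Dict[str, List[Dim]] = {}

    def add_tensor(self, name: str, dims: List[Dim]) -> None:
        self.tensor_dims[name] = dims

    def set_variable(self, name: str, value: int) -> None:
        self.variables[name] = value

    def shape_of(self, name: str) -> List[int]:
        # Evaluate callable dims against the current variables; keep ints as-is.
        return [d(self.variables) if callable(d) else d for d in self.tensor_dims[name]]


engine = TinyShapeEngine()
engine.add_tensor("input", [1, 3, lambda v: v["h"], lambda v: v["w"]])
# Output of a 7x7, stride-2, padding-3 convolution with 64 channels.
engine.add_tensor("conv1_out", [1, 64,
                                lambda v: (v["h"] + 2 * 3 - 7) // 2 + 1,
                                lambda v: (v["w"] + 2 * 3 - 7) // 2 + 1])

engine.set_variable("h", 320)
engine.set_variable("w", 480)
print(engine.shape_of("conv1_out"))  # [1, 64, 160, 240]
```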

Memory Compression

Activation Compression

Activation memory, also called temporary memory, holds the output of each op. Only the final activations marked as the model's outputs need to be kept, so you do not have to reserve separate memory for every activation tensor. Instead, activations can reuse a single memory pool of optimized size.

For large language models and high-resolution CV models, activation memory compression is key to saving memory.
On most models, the compression method reduces activation memory to about 5% of its native size.
For example:

model	Native Memory Size(MB)	Compressed Memory Size(MB)	Compression Ratio(%)
StableDiffusion(VAE_encoder)	14,245	540	3.7
StableDiffusion(VAE_decoder)	25,417	1,140	4.48
StableDiffusion(Text_encoder)	215	5	2.5
StableDiffusion(UNet)	36,135	2,232	6.2
GPT2	40	2	6.9
BERT	2,170	27	1.25

code example: benchmark/compression.py
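The memory reuse described in this section can be illustrated with a greedy offset planner: each activation has a size and a lifetime (the steps of its first and last use), and two activations may share addresses when their lifetimes do not overlap. This is a generic first-fit sketch with hypothetical names, not the algorithm implemented in benchmark/compression.py.

```python
from typing import Dict, List, Tuple

# (name, size_bytes, first_step, last_step), with inclusive lifetimes.
Tensor = Tuple[str, int, int, int]


def plan_offsets(tensors: List[Tensor]) -> Tuple[Dict[str, int], int]:
    """Assign each tensor an offset in one shared pool; return offsets and pool size."""
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, first, last)
    offsets: Dict[str, int] = {}
    # Placing large tensors first tends to reduce fragmentation.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Blocks that are alive at the same time as this tensor, by offset.
        busy = sorted((o, s) for o, s, f, l in placed if not (l < first or last < f))
        offset = 0
        for o, s in busy:
            if offset + size <= o:
                break  # the tensor fits in the gap before this block
            offset = max(offset, o + s)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    peak = max((offsets[n] + s for n, s, _, _ in tensors), default=0)
    return offsets, peak


# A three-op chain: each activation is read only by the next op.
chain = [("a", 100, 0, 1), ("b", 100, 1, 2), ("c", 100, 2, 3)]
offsets, peak = plan_offsets(chain)
native = sum(size for _, size, _, _ in chain)
print(offsets, peak, native)  # {'a': 0, 'b': 100, 'c': 0} 200 300
```

In this toy chain, "c" reuses the space of "a" because "a" is dead by the time "c" is produced. Deep models with many short-lived activations are where the savings become large.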

Weight Compression

An fp32 model with 7B parameters takes 28 GB of disk space and memory. You cannot even run the model if your device does not have that much memory, so weight compression is critical for running large language models. As a reference, a 7B model with int4 symmetric per-block (block size 32) quantization (llama.cpp's q4_0 quantization method) is only ~0.156x the size of the fp32 model.
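The figures above can be checked with a short calculation. The sketch assumes that each block of 32 int4 weights stores one fp32 scale, which is an assumption about the storage layout chosen because it reproduces the ~0.156x ratio. The function names are hypothetical.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Model size in GB (10^9 bytes) for a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9


def block_quant_bits(weight_bits: int, block_size: int, scale_bits: int) -> float:
    """Average bits per weight when every block carries one scale value."""
    return weight_bits + scale_bits / block_size


params = 7e9
fp32_gb = model_size_gb(params, 32)
# Assumption: int4 weights, block size 32, one fp32 scale per block.
q4_bits = block_quant_bits(weight_bits=4, block_size=32, scale_bits=32)
q4_gb = model_size_gb(params, q4_bits)
ratio = q4_gb / fp32_gb

print(f"fp32: {fp32_gb:.1f} GB")   # fp32: 28.0 GB
print(f"int4: {q4_gb:.3f} GB")     # int4: 4.375 GB
print(f"ratio: {ratio:.3f}")       # ratio: 0.156
```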

Current support:

[fp16]
[int8]x[symmetric/asymmetric]x[per tensor/per channel/per block]
[int4]x[symmetric/asymmetric]x[per tensor/per channel/per block]

Code examples: benchmark/examples.py.
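As an illustration of the symmetric per-block option listed above, the following standalone sketch quantizes a flat list of weights to int4 with one scale per block and then reconstructs them. It is a generic textbook scheme with hypothetical function names. It does not call the onnx-tool API and does not reproduce llama.cpp's exact q4_0 rounding.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (quantized integers, scale)


def quantize_block(block: List[float], bits: int = 4) -> Block:
    """Symmetric quantization of one block: value is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for int4
    amax = max(abs(v) for v in block)
    scale = amax / qmax if amax > 0 else 1.0   # avoid dividing by zero
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in block]
    return q, scale


def quantize(weights: List[float], block_size: int = 32, bits: int = 4) -> List[Block]:
    """Split the weights into consecutive blocks and quantize each one."""
    return [quantize_block(weights[i:i + block_size], bits)
            for i in range(0, len(weights), block_size)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate float weights from the quantized blocks."""
    return [q * scale for qs, scale in blocks for q in qs]


weights = [i / 10 - 3.2 for i in range(64)]
blocks = quantize(weights)
restored = dequantize(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(blocks), f"max error {max_err:.4f}")
```

The reconstruction error of each weight is bounded by half of its block's scale, which is why smaller blocks (per block rather than per tensor) usually give better accuracy at the cost of storing more scales.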

How to install

pip install onnx-tool

pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git

python>=3.6

If pip install onnx-tool fails while installing onnx, try installing an older onnx version first, for example pip install onnx==1.8.1.
Then run pip install onnx-tool again.

Known Issues

Loop op is not supported
Sequence type is not supported

Results of ONNX Model Zoo and SOTA models

Some models have dynamic input shapes, and their MACs vary with the input shape. The input shapes used in these results are written to data/public/config.py. The ONNX models with all tensor shapes can be downloaded: baidu drive (code: p91k), google drive

Model	Params(M)	MACs(M)
GPT-J 1 layer	464	173,398
MPT 1 layer	261	79,894
text_encoder	123.13	6,782
UNet2DCondition	859.52	888,870
VAE_encoder	34.16	566,371
VAE_decoder	49.49	1,271,959
SqueezeNet 1.0	1.23	351
AlexNet	60.96	665
GoogleNet	6.99	1,606
googlenet_age	5.98	1,605
LResNet100E-IR	65.22	12,102
BERT-Squad	113.61	22,767
BiDAF	18.08	9.87
EfficientNet-Lite4	12.96	1,361
Emotion	12.95	877
Mask R-CNN	46.77	92,077

Model	Params(M)	MACs(M)
LLaMa 1 layer	618	211,801
BEVFormer Tiny	33.7	210,838
rvm_mobilenetv3	3.73	4,289
yolov4	64.33	3,319
ConvNeXt-L	229.79	34,872
edgenext_small	5.58	1,357
SSD	19.98	216,598
RealESRGAN	16.69	73,551
ShuffleNet	2.29	146
GPT-2	137.02	1,103
T5-encoder	109.62	686
T5-decoder	162.62	1,113
RoBERTa-BASE	124.64	688
Faster R-CNN	44.10	46,018
FCN ResNet-50	35.29	37,056
ResNet50	25	3,868
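As noted at the start of this section, MACs depend on the input shape for models with dynamic inputs. A minimal sketch for a single matrix multiplication, using a hypothetical helper and a hidden size of 768 chosen only for illustration:

```python
def matmul_macs(m: int, k: int, n: int) -> int:
    """MACs of an (m x k) @ (k x n) matrix multiplication."""
    return m * k * n


hidden = 768  # illustrative hidden size, not taken from the tables above
for seq_len in (128, 384):
    # One projection applied to every token: (seq_len x hidden) @ (hidden x hidden).
    print(seq_len, f"{matmul_macs(seq_len, hidden, hidden):,}")
# 128 75,497,472
# 384 226,492,416
```

Tripling the sequence length triples the MACs of this layer, which is why the reported numbers are only meaningful together with the input shapes recorded in data/public/config.py.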
