onnx-tool

A tool for parsing, editing, optimizing, and profiling ONNX models.

PyPI · version 1.0.1

📄 Simplified Chinese | ✨ New Project: AI-Enhancement-Filter (powered by onnx-tool)


onnx-tool

A comprehensive toolkit for analyzing, optimizing, and transforming ONNX models with advanced capabilities for LLMs, diffusion models, and computer vision architectures.

  • LLM Optimization: Build and profile large language models with KV cache analysis (example)
  • Graph Transformation:
    • Constant folding (docs)
    • Operator fusion (docs)
  • Advanced Profiling:
    • Rapid shape inference
    • MACs/parameter statistics with sparsity awareness
  • Compute Graph Engine: Runtime shape computation with minimal overhead (details)
  • Memory Compression:
    • Activation memory optimization (up to 95% reduction)
    • Weight quantization (FP16, INT8/INT4 with per-tensor/channel/block schemes)
  • Quantization & Sparsity: Full support for quantized and sparse model analysis

🤖 Supported Model Architectures

| Domain | Models |
| --- | --- |
| NLP | BERT, T5, GPT, LLaMa, MPT (TransformerModel) |
| Diffusion | Stable Diffusion (TextEncoder, VAE, UNet) |
| CV | Detic, BEVFormer, SSD300_VGG16, ConvNeXt, Mask R-CNN, Silero VAD |
| Audio | Sovits, LPCNet |

⚡ Build & Profile LLMs in Seconds

Profile 10 Hugging Face models in under one second. Export ONNX models with llama.cpp-like simplicity (code).

Model Statistics (1k token input)

| model name (1k input) | MACs(G) | Parameters(G) | KV Cache(G) |
| --- | --- | --- | --- |
| gpt-j-6b | 6277 | 6.05049 | 0.234881 |
| yi-1.5-34B | 35862 | 34.3889 | 0.125829 |
| microsoft/phi-2 | 2948 | 2.77944 | 0.167772 |
| Phi-3-mini-4k | 4083 | 3.82108 | 0.201327 |
| Phi-3-small-8k-instruct | 7912 | 7.80167 | 0.0671089 |
| Phi-3-medium-4k-instruct | 14665 | 13.9602 | 0.104858 |
| Llama3-8B | 8029 | 8.03026 | 0.0671089 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 72888 | 70.5537 | 0.167772 |
| QWen-7B | 7509 | 7.61562 | 0.0293601 |
| Qwen2_72B_Instruct | 74895 | 72.7062 | 0.167772 |
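The KV Cache column can be sanity-checked by hand: a decoder-only transformer stores one key and one value vector per layer per token. The table's numbers are consistent with counting elements for 1024 input tokens. A quick sketch (the model configs below are the published ones for GPT-J-6B and Llama-3-8B, not values taken from this project):

```python
def kv_cache_elements(layers: int, kv_heads: int, head_dim: int, tokens: int) -> int:
    """Elements held in the KV cache: a K and a V vector per layer per token."""
    return 2 * layers * kv_heads * head_dim * tokens

# GPT-J-6B: 28 layers, multi-head attention (16 heads x 256 head dim)
gptj = kv_cache_elements(layers=28, kv_heads=16, head_dim=256, tokens=1024)
# Llama-3-8B: 32 layers, grouped-query attention with 8 KV heads of dim 128
llama3 = kv_cache_elements(layers=32, kv_heads=8, head_dim=128, tokens=1024)
print(gptj / 1e9, llama3 / 1e9)  # 0.234881024 and 0.067108864, matching the table
```

Note how grouped-query attention shows up directly in the table: Llama3-8B needs less than a third of GPT-J's cache despite having more parameters.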

Latency Estimation (4-bit weights, 16-bit KV cache)

| model_type_4bit_kv16bit | memory_size(GB) | Ultra-155H_TTFT | Ultra-155H_TPOT | Arc-A770_TTFT | Arc-A770_TPOT | H100-PCIe_TTFT | H100-PCIe_TPOT |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-j-6b | 3.75678 | 1.0947 | 0.041742 | 0.0916882 | 0.00670853 | 0.0164015 | 0.00187839 |
| yi-1.5-34B | 19.3369 | 5.77095 | 0.214854 | 0.45344 | 0.0345302 | 0.0747854 | 0.00966844 |
| microsoft/phi-2 | 1.82485 | 0.58361 | 0.0202761 | 0.0529628 | 0.00325866 | 0.010338 | 0.000912425 |
| Phi-3-mini-4k | 2.49649 | 0.811173 | 0.0277388 | 0.0745356 | 0.00445802 | 0.0147274 | 0.00124825 |
| Phi-3-small-8k-instruct | 4.2913 | 1.38985 | 0.0476811 | 0.117512 | 0.00766303 | 0.0212535 | 0.00214565 |
| Phi-3-medium-4k-instruct | 7.96977 | 2.4463 | 0.088553 | 0.198249 | 0.0142317 | 0.0340576 | 0.00398489 |
| Llama3-8B | 4.35559 | 1.4354 | 0.0483954 | 0.123333 | 0.00777784 | 0.0227182 | 0.00217779 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 39.4303 | 11.3541 | 0.438114 | 0.868475 | 0.0704112 | 0.137901 | 0.0197151 |
| QWen-7B | 4.03576 | 1.34983 | 0.0448417 | 0.11722 | 0.00720671 | 0.0218461 | 0.00201788 |
| Qwen2_72B_Instruct | 40.5309 | 11.6534 | 0.450343 | 0.890816 | 0.0723766 | 0.14132 | 0.0202654 |
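Spec-based estimates like these follow the usual roofline rules of thumb: prefill (TTFT) is compute-bound at roughly 2 FLOPs per MAC, while decoding (TPOT) is bound by streaming the weights and KV cache from memory once per token. A sketch of that arithmetic, where the peak specs (about 560 GB/s and about 137 fp16 TFLOPS for the Arc A770) are my assumptions rather than values published by this project:

```python
def ttft_seconds(macs_g: float, peak_tflops: float) -> float:
    """Time to first token: prefill is compute-bound, ~2 FLOPs per MAC."""
    return 2 * macs_g / (peak_tflops * 1e3)

def tpot_seconds(memory_gb: float, bandwidth_gbps: float) -> float:
    """Time per output token: bound by reading all weights plus KV cache once."""
    return memory_gb / bandwidth_gbps

# gpt-j-6b row: 6277 GMACs for the 1k prefill, 3.75678 GB of weights + cache
print(tpot_seconds(3.75678, 560))   # ~0.00671 s, matching the Arc-A770_TPOT column
print(ttft_seconds(6277, 137))      # ~0.0916 s, close to the Arc-A770_TTFT column
```

The TPOT column divides out exactly under these bandwidth assumptions, which is why larger memory footprints (e.g. the 70B models) translate linearly into slower decoding.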

💡 Latencies computed from hardware specs – no actual inference required

🔧 Basic Parsing & Editing

Intuitive API for model manipulation:

from onnx_tool import Model

model = Model('model.onnx')          # Load any ONNX file
graph = model.graph                  # Access the computation graph
node = graph.nodemap['Conv_0']       # Look up a node by name to edit its attributes
tensor = graph.tensormap['weight']   # Look up a tensor to edit its data/dtype
model.save_model('modified.onnx')    # Persist the changes

See comprehensive examples in benchmark/examples.py.

📊 Shape Inference & Profiling

All profiling relies on precise shape inference:

Shape inference visualization

Profiling Capabilities

  • Standard profiling: MACs, parameters, memory footprint
  • Sparse-aware profiling: Quantify sparsity impact on compute

MACs profiling table
Sparse model profiling
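The MACs figures in a profiling table reduce to per-operator formulas once shapes are inferred; sparse-aware profiling scales the same count by the density of the weight tensor. A toy sketch for a Conv operator (a hypothetical helper for illustration, not onnx-tool's code):

```python
def conv2d_macs(c_in, c_out, k_h, k_w, h_out, w_out, groups=1, density=1.0):
    """MACs for one Conv node; `density` discounts for weight sparsity (1.0 = dense)."""
    return int(c_out * (c_in // groups) * k_h * k_w * h_out * w_out * density)

# First conv of ResNet-50: 3 -> 64 channels, 7x7 kernel, 112x112 output map
dense = conv2d_macs(3, 64, 7, 7, 112, 112)                 # 118,013,952 MACs
sparse = conv2d_macs(3, 64, 7, 7, 112, 112, density=0.5)   # 50% sparsity: half the MACs
print(dense, sparse)
```

Summing such per-node counts over the whole graph is what produces model-level numbers like those in the Model Zoo tables below.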


⚙️ Compute Graph & Shape Engine

Transform exported ONNX graphs into efficient Compute Graphs by removing shape-calculation overhead:

Compute graph transformation

  • Compute Graph: Minimal graph containing only compute operations
  • Shape Engine: Runtime shape resolver for dynamic models

Use Cases:

  • Integration with custom inference engines (guide)
  • Shape regression testing (example)
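The shape-engine idea can be illustrated with a toy resolver: bind the symbolic dimensions once per call and re-evaluate only the shape formulas, instead of re-running full shape inference. Everything below, names included, is a hypothetical sketch rather than onnx-tool's API:

```python
# Toy shape engine: each tensor's shape is a list of ints or symbolic expressions.
symbolic_shapes = {
    'input':  ['batch', 3, 'h', 'w'],
    'conv_0': ['batch', 64, 'h/2', 'w/2'],   # a stride-2 conv halves spatial dims
}

def resolve(shape, dims):
    """Replace symbolic entries using the runtime dimension values in `dims`."""
    out = []
    for d in shape:
        if isinstance(d, int):
            out.append(d)
        elif '/' in d:
            name, div = d.split('/')
            out.append(dims[name] // int(div))
        else:
            out.append(dims[d])
    return out

print(resolve(symbolic_shapes['conv_0'], {'batch': 4, 'h': 224, 'w': 224}))
# -> [4, 64, 112, 112]
```

The point of the real Shape Engine is the same: per-inference cost is a handful of integer updates, not a traversal of the whole ONNX graph.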

💾 Memory Compression

Activation Memory Compression

Reuses temporary buffers to minimize peak memory usage – critical for LLMs and high-res CV models.

| model | Native Memory Size(MB) | Compressed Memory Size(MB) | Compression Ratio(%) |
| --- | --- | --- | --- |
| StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7 |
| StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48 |
| StableDiffusion(Text_encoder) | 215 | 5 | 2.5 |
| StableDiffusion(UNet) | 36,135 | 2,232 | 6.2 |
| GPT2 | 40 | 2 | 6.9 |
| BERT | 2,170 | 27 | 1.25 |

✅ Typical models achieve >90% activation memory reduction
📌 Implementation: benchmark/compression.py
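The intuition behind these ratios is that activations have short lifetimes, so a new tensor can take over the buffer of any tensor whose last use is already past. A toy greedy planner illustrating the idea (not onnx-tool's actual implementation):

```python
def plan_memory(tensors):
    """tensors: list of (size_bytes, first_use_step, last_use_step).
    Greedy reuse: place each tensor in a freed buffer big enough to hold it,
    otherwise grow the pool. Returns the peak pool size."""
    buffers = []  # list of (size, step_after_which_it_is_free)
    peak = 0
    for size, first, last in sorted(tensors, key=lambda t: t[1]):
        for i, (bsize, free_at) in enumerate(buffers):
            if free_at < first and bsize >= size:
                buffers[i] = (bsize, last)   # reuse this buffer
                break
        else:
            buffers.append((size, last))     # no fit: allocate a new buffer
        peak = max(peak, sum(b for b, _ in buffers))
    return peak

# Four equal activations in a chain: naive peak is 4 units, with reuse only 2
chain = [(1, 0, 1), (1, 1, 2), (1, 2, 3), (1, 3, 4)]
print(plan_memory(chain))  # -> 2
```

Deep sequential graphs like VAEs and UNets are the best case for this strategy, which is why their compressed sizes above are a few percent of the naive allocation.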

Weight Compression

Essential for deploying large models on memory-constrained devices:

| Quantization Scheme | Size vs FP32 | Example (7B model) |
| --- | --- | --- |
| FP32 (baseline) | 1.00× | 28 GB |
| FP16 | 0.50× | 14 GB |
| INT8 (per-channel) | 0.25× | 7 GB |
| INT4 (block=32, symmetric) – llama.cpp | 0.156× | 4.4 GB |

Supported schemes:

  • ✅ FP16
  • ✅ INT8: symmetric/asymmetric × per-tensor/channel/block
  • ✅ INT4: symmetric/asymmetric × per-tensor/channel/block

📌 See benchmark/examples.py for implementation examples.
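The size ratios in the table follow from bits per weight plus per-block metadata. The 0.156× figure is exactly 5 bits per weight, which is what 4-bit weights plus one 32-bit scale per 32-weight block works out to; that metadata layout is my assumption for the arithmetic below, not a statement about the project's exact storage format:

```python
def quant_ratio(weight_bits: int, block: int, meta_bits: int) -> float:
    """Size vs FP32: `meta_bits` of scale/zero-point metadata per `block` weights."""
    return (block * weight_bits + meta_bits) / (block * 32)

fp16 = quant_ratio(16, 1, 0)                 # plain half precision
int8_per_channel = quant_ratio(8, 4096, 32)  # one fp32 scale per 4096-wide channel
int4_block32 = quant_ratio(4, 32, 32)        # 4-bit weights + fp32 scale per 32 weights
print(fp16, round(int8_per_channel, 4), int4_block32)  # -> 0.5 0.2502 0.15625
```

Applied to the 7B example: 28 GB × 0.15625 ≈ 4.4 GB, matching the table.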

🚀 Installation

# PyPI (recommended)
pip install onnx-tool

# Latest development version
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git

Requirements: Python ≥ 3.6

⚠️ Troubleshooting: If ONNX installation fails, try:

pip install onnx==1.8.1 && pip install onnx-tool

Known Issues

  • Loop op is not supported
  • Sequence type is not supported

📈 Model Zoo Results

Comprehensive profiling of ONNX Model Zoo and SOTA models. Input shapes defined in data/public/config.py.

📥 Download pre-profiled models (with full tensor shapes):

| Model | Params(M) | MACs(M) |
| --- | --- | --- |
| GPT-J 1 layer | 464 | 173,398 |
| MPT 1 layer | 261 | 79,894 |
| LLaMa 1 layer | 618 | 211,801 |
| text_encoder | 123.13 | 6,782 |
| UNet2DCondition | 859.52 | 888,870 |
| VAE_encoder | 34.16 | 566,371 |
| VAE_decoder | 49.49 | 1,271,959 |
| SqueezeNet 1.0 | 1.23 | 351 |
| AlexNet | 60.96 | 665 |
| GoogleNet | 6.99 | 1,606 |
| googlenet_age | 5.98 | 1,605 |
| LResNet100E-IR | 65.22 | 12,102 |
| BERT-Squad | 113.61 | 22,767 |
| BiDAF | 18.08 | 9.87 |
| EfficientNet-Lite4 | 12.96 | 1,361 |
| Emotion | 12.95 | 877 |
| Mask R-CNN | 46.77 | 92,077 |
| BEVFormer Tiny | 33.7 | 210,838 |
| rvm_mobilenetv3 | 3.73 | 4,289 |
| yolov4 | 64.33 | 3,319 |
| ConvNeXt-L | 229.79 | 34,872 |
| edgenext_small | 5.58 | 1,357 |
| SSD | 19.98 | 216,598 |
| RealESRGAN | 16.69 | 73,551 |
| ShuffleNet | 2.29 | 146 |
| GPT-2 | 137.02 | 1,103 |
| T5-encoder | 109.62 | 686 |
| T5-decoder | 162.62 | 1,113 |
| RoBERTa-BASE | 124.64 | 688 |
| Faster R-CNN | 44.10 | 46,018 |
| FCN ResNet-50 | 35.29 | 37,056 |
| ResNet50 | 25 | 3,868 |

🤝 Contributing

Contributions are welcome! Please open an issue or PR for:

  • Bug reports
  • Feature requests
  • Documentation improvements
  • New model support
