
Research
/Security News
Contagious Interview Campaign Escalates With 67 Malicious npm Packages and New Malware Loader
North Korean threat actors deploy 67 malicious npm packages using the newly discovered XORIndex malware loader.
A tool for ONNX model:
Supported Models:
You can load any onnx file by onnx_tool.Model:
Change graph structure with onnx_tool.Graph;
Change op attributes and IO tensors with onnx_tool.Node;
Change tensor data or type with onnx_tool.Tensor.
To apply your changes, just call save_model method of onnx_tool.Model or onnx_tool.Graph.
Please refer benchmark/examples.py.
All profiling data must be built on shape inference result.
ONNX graph with tensor shapes:
Introduction: data/Profile.md.
pytorch usage: data/PytorchUsage.md.
tensorflow
usage: data/TensorflowUsage.md.
examples: benchmark/examples.py.
From a raw graph to a compute graph:
Remove shape calculation layers(created by ONNX export) to get a Compute Graph. Use Shape Engine to update tensor
shapes at runtime.
Examples: benchmark/shape_regress.py.
benchmark/examples.py.
Integrate Compute Graph and Shape Engine into a cpp inference
engine: data/inference_engine.md
Activation memory also called temporary memory is created by each OP's output. Only the last activation marked as the model's output will be kept. So you don't have to prepare memory space for each activation tensor. They better reuse an optimized memory size.
For large language models and high-resolution CV models, the activation memory compression is a key to save memory.
The compression method achieves 5% memory compression on most models.
For example:
model | Native Memory Size(MB) | Compressed Memory Size(MB) | Compression Ratio(%) |
---|---|---|---|
StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7 |
StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48 |
StableDiffusion(Text_encoder) | 215 | 5 | 2.5 |
StableDiffusion(UNet) | 36,135 | 2,232 | 6.2 |
GPT2 | 40 | 2 | 6.9 |
BERT | 2,170 | 27 | 1.25 |
code example: benchmark/compression.py
A fp32 model with 7B parameters will take 28GB disk space and memory space. You can not even run the model if your device doesn't have that much memory space. So weight compression is critical to run large language models. As a reference, 7B model with int4 symmetric per block(32) quantization(llama.cpp's q4_0 quantization method) only has ~0.156x model size compared with fp32 model.
Current support:
code examples:benchmark/examples.py.
pip install onnx-tool
OR
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git
python>=3.6
If pip install onnx-tool
failed by onnx's installation, you may try pip install onnx==1.8.1
(a lower version like this) first.
Then pip install onnx-tool
again.
Some models have dynamic input shapes. The MACs varies from input shapes. The input shapes used in these results are writen to data/public/config.py. These onnx models with all tensors' shape can be downloaded: baidu drive(code: p91k) google drive
|
|
FAQs
A tool for ONNX model: A parser, editor and profiler tool for ONNX models.
We found that onnx-tool demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
/Security News
North Korean threat actors deploy 67 malicious npm packages using the newly discovered XORIndex malware loader.
Security News
Meet Socket at Black Hat & DEF CON 2025 for 1:1s, insider security talks at Allegiant Stadium, and a private dinner with top minds in software supply chain security.
Security News
CAI is a new open source AI framework that automates penetration testing tasks like scanning and exploitation up to 3,600× faster than humans.