scs4onnx
A very simple tool that reduces the overall size of an ONNX model by aggregating duplicate constant values wherever possible. The name stands for Simple Constant value Shrink for ONNX.
https://github.com/PINTO0309/simple-onnx-processing-tools
Key concept
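The idea of aggregating duplicate constant values can be illustrated with a minimal, standard-library-only sketch. It is not the code scs4onnx uses: the `Constant` tuple and the `aggregate_duplicates` helper are hypothetical names introduced here, and a real ONNX initializer carries more metadata than the dtype, shape and raw bytes modelled below.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for an ONNX constant: (dtype, shape, raw bytes).
Constant = Tuple[str, Tuple[int, ...], bytes]


def aggregate_duplicates(
    constants: Dict[str, Constant],
) -> Tuple[Dict[str, Constant], Dict[str, str]]:
    """Keep one copy of each distinct constant and map every name to its keeper."""
    first_seen: Dict[Constant, str] = {}
    unique: Dict[str, Constant] = {}
    remap: Dict[str, str] = {}
    for name, constant in constants.items():
        # Two constants count as duplicates only if dtype, shape and bytes all match.
        keeper = first_seen.setdefault(constant, name)
        if keeper == name:
            unique[name] = constant
        remap[name] = keeper
    return unique, remap


zeros = bytes(16)  # e.g. four float32 zeros
example = {
    "c0": ("float32", (4,), zeros),
    "c1": ("float32", (4,), zeros),    # duplicate of c0
    "c2": ("float32", (2, 2), zeros),  # same bytes, different shape: kept
}
unique_example, remap_example = aggregate_duplicates(example)
print(sorted(unique_example))  # ['c0', 'c2']
print(remap_example)           # {'c0': 'c0', 'c1': 'c0', 'c2': 'c2'}
```

Every consumer of `c1` would then be rewired to read `c0`, so the duplicated bytes are stored only once.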
1. Setup
1-1. HostPC
$ echo export PATH="~/.local/bin:$PATH" >> ~/.bashrc \
&& source ~/.bashrc
$ pip install -U onnx \
&& python3 -m pip install -U onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com \
&& pip install -U scs4onnx
1-2. Docker
https://github.com/PINTO0309/simple-onnx-processing-tools#docker
2. CLI Usage
$ scs4onnx -h
usage:
scs4onnx [-h]
[-m {shrink,npy}]
[-fo FORCED_EXTRACTION_OP_NAMES]
[-fc FORCED_EXTRACTION_CONSTANT_NAMES]
[-d]
[-n]
input_onnx_file_path output_onnx_file_path
positional arguments:
input_onnx_file_path
Input onnx file path.
output_onnx_file_path
Output onnx file path.
optional arguments:
-h, --help
show this help message and exit
-m {shrink,npy}, --mode {shrink,npy}
Constant Value Compression Mode.
shrink: Share constant values inside the model as much as possible.
The model size is slightly larger because
some shared constant values remain inside the model,
but performance is maximized.
npy: Outputs constant values used repeatedly in the model to an
external file .npy. Instead of the smallest model body size,
the file loading overhead is greater.
Default: shrink
-fo FORCED_EXTRACTION_OP_NAMES [FORCED_EXTRACTION_OP_NAMES ...], --forced_extraction_op_names FORCED_EXTRACTION_OP_NAMES [FORCED_EXTRACTION_OP_NAMES ...]
Extracts the constant value of the specified OP name to .npy
regardless of the mode specified.
Cannot be used with --forced_extraction_constant_names at the same time.
e.g. --forced_extraction_op_names aaa bbb ccc
-fc FORCED_EXTRACTION_CONSTANT_NAMES [FORCED_EXTRACTION_CONSTANT_NAMES ...], --forced_extraction_constant_names FORCED_EXTRACTION_CONSTANT_NAMES [FORCED_EXTRACTION_CONSTANT_NAMES ...]
Extracts the constant value of the specified Constant name to .npy
regardless of the mode specified.
Cannot be used with --forced_extraction_op_names at the same time.
e.g. --forced_extraction_constant_names aaa bbb ccc
-d, --disable_auto_downcast
Disables automatic downcast processing from Float64 to Float32 and INT64
to INT32. Try enabling it and re-running it if you encounter type-related
errors.
-n, --non_verbose
Do not show all information logs. Only error logs are displayed.
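When the CLI is driven from another program, the options listed above can be assembled into an argument list first. The sketch below is only an illustration: `build_scs4onnx_command` is a hypothetical helper, not part of scs4onnx, and it uses only the flags shown in the help text, including the rule that the two forced-extraction options cannot be combined.

```python
from typing import List, Optional, Sequence


def build_scs4onnx_command(
    input_onnx_file_path: str,
    output_onnx_file_path: str,
    mode: str = "shrink",
    forced_extraction_op_names: Optional[Sequence[str]] = None,
    forced_extraction_constant_names: Optional[Sequence[str]] = None,
    disable_auto_downcast: bool = False,
    non_verbose: bool = False,
) -> List[str]:
    """Build an argv list for the scs4onnx CLI from the documented options."""
    if mode not in ("shrink", "npy"):
        raise ValueError("mode must be 'shrink' or 'npy'")
    if forced_extraction_op_names and forced_extraction_constant_names:
        # The help text states these two options cannot be used together.
        raise ValueError(
            "forced_extraction_op_names and forced_extraction_constant_names "
            "cannot be used at the same time"
        )
    cmd = ["scs4onnx", input_onnx_file_path, output_onnx_file_path, "--mode", mode]
    if forced_extraction_op_names:
        cmd += ["--forced_extraction_op_names", *forced_extraction_op_names]
    if forced_extraction_constant_names:
        cmd += ["--forced_extraction_constant_names", *forced_extraction_constant_names]
    if disable_auto_downcast:
        cmd.append("--disable_auto_downcast")
    if non_verbose:
        cmd.append("--non_verbose")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where scs4onnx is installed.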
3. In-script Usage
$ python
>>> from scs4onnx import shrinking
>>> help(shrinking)
Help on function shrinking in module scs4onnx.onnx_shrink_constant:
shrinking(
input_onnx_file_path: Union[str, NoneType] = '',
output_onnx_file_path: Union[str, NoneType] = '',
onnx_graph: Union[onnx.onnx_ml_pb2.ModelProto, NoneType] = None,
mode: Union[str, NoneType] = 'shrink',
forced_extraction_op_names: List[str] = [],
forced_extraction_constant_names: List[str] = [],
disable_auto_downcast: Union[bool, NoneType] = False,
non_verbose: Union[bool, NoneType] = False
) -> Tuple[onnx.onnx_ml_pb2.ModelProto, str]
Parameters
----------
input_onnx_file_path: Optional[str]
Input onnx file path.
Either input_onnx_file_path or onnx_graph must be specified.
output_onnx_file_path: Optional[str]
Output onnx file path.
If output_onnx_file_path is not specified, no .onnx file is output.
onnx_graph: Optional[onnx.ModelProto]
onnx.ModelProto.
Either input_onnx_file_path or onnx_graph must be specified.
If onnx_graph is specified, input_onnx_file_path is ignored and onnx_graph is processed.
mode: Optional[str]
Constant Value Compression Mode.
'shrink': Share constant values inside the model as much as possible.
The model size is slightly larger because some shared constant values remain
inside the model, but performance is maximized.
'npy': Outputs constant values used repeatedly in the model to an external file .npy.
Instead of the smallest model body size, the file loading overhead is greater.
Default: shrink
forced_extraction_op_names: List[str]
Extracts the constant value of the specified OP name to .npy
regardless of the mode specified.
Cannot be used with --forced_extraction_constant_names at the same time.
e.g. ['aaa','bbb','ccc']
forced_extraction_constant_names: List[str]
Extracts the constant value of the specified Constant name to .npy
regardless of the mode specified.
Cannot be used with --forced_extraction_op_names at the same time.
e.g. ['aaa','bbb','ccc']
disable_auto_downcast: Optional[bool]
Disables automatic downcast processing from Float64 to Float32 and INT64 to INT32.
Try enabling it and re-running it if you encounter type-related errors.
Default: False
non_verbose: Optional[bool]
Do not show all information logs. Only error logs are displayed.
Default: False
Returns
-------
shrunken_graph: onnx.ModelProto
Shrunken onnx ModelProto
npy_file_paths: List[str]
List of paths to externally output .npy files.
An empty list is always returned when in 'shrink' mode.
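To get a feel for what a downcast from INT64 to INT32 or from Float64 to Float32 can change, and so whether disable_auto_downcast is worth setting, one can check whether individual values survive the narrower type. The following is a standard-library sketch only. `fits_in_int32` and `survives_float32` are hypothetical helpers and say nothing about how scs4onnx performs or validates its own downcast.

```python
import math
import struct
from typing import Iterable

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_in_int32(values: Iterable[int]) -> bool:
    """Return True if every integer is representable as a signed 32-bit value."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


def survives_float32(value: float) -> bool:
    """Return True if a Python float (64-bit) round-trips through float32 unchanged."""
    try:
        round_tripped = struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        # The magnitude is too large for float32.
        return False
    if math.isnan(value):
        return math.isnan(round_tripped)
    return round_tripped == value


print(fits_in_int32([0, 2 ** 31 - 1]))  # True
print(fits_in_int32([2 ** 31]))         # False
print(survives_float32(0.5))            # True
print(survives_float32(0.1))            # False: 0.1 loses precision in float32
```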
3. CLI Execution
$ scs4onnx input.onnx output.onnx --mode shrink
4. In-script Execution
4-1. When an onnx file is used as input
If output_onnx_file_path is not specified, no .onnx file is output.
from scs4onnx import shrinking
shrunk_graph, npy_file_paths = shrinking(
input_onnx_file_path='input.onnx',
output_onnx_file_path='output.onnx',
mode='npy',
non_verbose=False
)
4-2. When an onnx.ModelProto is used as input
If onnx_graph is specified, input_onnx_file_path is ignored and onnx_graph is processed.
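import onnx
from scs4onnx import shrinking
# Load an onnx.ModelProto to pass in directly ('input.onnx' is a placeholder path)
graph = onnx.load('input.onnx')
shrunk_graph, npy_file_paths = shrinking(
onnx_graph=graph,
mode='npy',
non_verbose=True
)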
5. Sample
5-1. shrink mode sample
- 297.8MB -> 67.4MB (.onnx)
$ scs4onnx gmflow_sintel_480x640.onnx gmflow_sintel_480x640_opt.onnx
- 1.8GB -> 886.8MB (.onnx)
$ scs4onnx hitnet_sf_finalpass_720x960.onnx hitnet_sf_finalpass_720x960_opt.onnx
- 1.8GB -> 2.1MB (.onnx) + 884.7MB (.npy)
$ scs4onnx \
hitnet_sf_finalpass_720x960.onnx \
hitnet_sf_finalpass_720x960_opt.onnx \
--forced_extraction_op_names GatherElements_660
- 297.8MB -> 21.3MB (.onnx) + 46.1MB (.npy)
$ scs4onnx \
gmflow_sintel_480x640.onnx \
gmflow_sintel_480x640_opt.onnx \
--forced_extraction_constant_names 1646
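Size figures in the style used above can be reproduced for your own models by summing file sizes on disk. This is a small sketch under stated assumptions: `size_breakdown_mb` and `format_breakdown` are hypothetical helpers, and one MB is taken to be 1,000,000 bytes, which may differ from how the figures above were measured.

```python
import os
from typing import Iterable, Tuple

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def size_breakdown_mb(
    onnx_path: str, npy_paths: Iterable[str] = ()
) -> Tuple[float, float]:
    """Return (.onnx size, combined .npy size) in MB for files on disk."""
    onnx_mb = os.path.getsize(onnx_path) / BYTES_PER_MB
    npy_mb = sum(os.path.getsize(path) for path in npy_paths) / BYTES_PER_MB
    return onnx_mb, npy_mb


def format_breakdown(onnx_mb: float, npy_mb: float) -> str:
    """Format sizes like '21.3MB (.onnx) + 46.1MB (.npy)'."""
    text = f"{onnx_mb:.1f}MB (.onnx)"
    if npy_mb > 0:
        text += f" + {npy_mb:.1f}MB (.npy)"
    return text
```

When calling `shrinking` from a script, the returned list of .npy paths can be passed as `npy_paths` together with the output .onnx path.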
5-2. npy mode sample
5-3. .npy file view
$ python
>>> import numpy as np
>>> param = np.load('gmflow_sintel_480x640_shrunken_exported_1646.npy')
>>> param.shape
(8, 1200, 1200)
>>> param
array([[[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[-100., -100., -100., ..., 0., 0., 0.],
[-100., -100., -100., ..., 0., 0., 0.],
[-100., -100., -100., ..., 0., 0., 0.]]], dtype=float32)
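For large exported files it can be convenient to check the shape and dtype without loading the array data. The sketch below reads only the header of a .npy file, following the layout of the NPY file format (magic string, version bytes, header length, then a Python dict literal). `read_npy_header` is a hypothetical helper written for this illustration and is not part of scs4onnx or NumPy.

```python
import ast
import struct
from typing import Any, BinaryIO, Dict

NPY_MAGIC = b"\x93NUMPY"


def read_npy_header(stream: BinaryIO) -> Dict[str, Any]:
    """Parse the header dict of a .npy stream without reading the array data."""
    if stream.read(6) != NPY_MAGIC:
        raise ValueError("not a .npy file")
    major, _minor = stream.read(2)
    # Format 1.x stores the header length as uint16, 2.x and 3.x as uint32,
    # both little-endian.
    if major == 1:
        (header_len,) = struct.unpack("<H", stream.read(2))
    elif major in (2, 3):
        (header_len,) = struct.unpack("<I", stream.read(4))
    else:
        raise ValueError(f"unsupported .npy format version {major}")
    encoding = "utf-8" if major == 3 else "latin1"
    header = stream.read(header_len).decode(encoding)
    # The header is a dict literal with the keys 'descr', 'fortran_order', 'shape'.
    return ast.literal_eval(header.strip())
```

Opening the file in binary mode, for example `open('gmflow_sintel_480x640_shrunken_exported_1646.npy', 'rb')`, and passing the handle to `read_npy_header` returns a dict whose `'shape'` and `'descr'` entries describe the stored array.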
6. Sample ONNX models
- gmflow_sintel_480x640.onnx - Optical flow calculation - LICENSE Apache License 2.0
- hitnet_sf_finalpass_720x960.onnx - Stereo depth estimation - LICENSE Apache License 2.0
7. Reference
- https://docs.nvidia.com/deeplearning/tensorrt/onnx-graphsurgeon/docs/index.html
- https://github.com/NVIDIA/TensorRT/tree/main/tools/onnx-graphsurgeon
- https://github.com/PINTO0309/sne4onnx
- https://github.com/PINTO0309/snd4onnx
- https://github.com/PINTO0309/snc4onnx
- https://github.com/PINTO0309/sog4onnx
- https://github.com/PINTO0309/PINTO_model_zoo
8. Issues
https://github.com/PINTO0309/simple-onnx-processing-tools/issues