>>> from onnx2tf import convert
>>> help(convert)
Help on function convert in module onnx2tf:
convert(
input_onnx_file_path: Union[str, NoneType] = '',
onnx_graph: Union[onnx.onnx_ml_pb2.ModelProto, NoneType] = None,
output_folder_path: Union[str, NoneType] = 'saved_model',
output_signaturedefs: Optional[bool] = False,
output_h5: Optional[bool] = False,
output_keras_v3: Optional[bool] = False,
output_tfv1_pb: Optional[bool] = False,
output_weights: Optional[bool] = False,
copy_onnx_input_output_names_to_tflite: Optional[bool] = False,
output_integer_quantized_tflite: Optional[bool] = False,
quant_type: Optional[str] = 'per-channel',
custom_input_op_name_np_data_path: Optional[List] = None,
input_output_quant_dtype: Optional[str] = 'int8',
not_use_onnxsim: Optional[bool] = False,
not_use_opname_auto_generate: Optional[bool] = False,
batch_size: Union[int, NoneType] = None,
overwrite_input_shape: Union[List[str], NoneType] = None,
no_large_tensor: Optional[bool] = False,
output_nms_with_dynamic_tensor: Optional[bool] = False,
keep_ncw_or_nchw_or_ncdhw_input_names: Union[List[str], NoneType] = None,
keep_nwc_or_nhwc_or_ndhwc_input_names: Union[List[str], NoneType] = None,
keep_shape_absolutely_input_names: Optional[List[str]] = None,
input_names_to_interrupt_model_conversion: Union[List[str], NoneType] = None,
output_names_to_interrupt_model_conversion: Union[List[str], NoneType] = None,
disable_group_convolution: Union[bool, NoneType] = False,
enable_accumulation_type_float16: Optional[bool] = False,
enable_batchmatmul_unfold: Optional[bool] = False,
enable_rnn_unroll: Optional[bool] = False,
disable_suppression_flextranspose: Optional[bool] = False,
number_of_dimensions_after_flextranspose_compression: Optional[int] = 6,
disable_suppression_flexstridedslice: Optional[bool] = False,
disable_strict_mode: Optional[bool] = False,
number_of_dimensions_after_flexstridedslice_compression: Optional[int] = 5,
optimization_for_gpu_delegate: Optional[bool] = False,
replace_argmax_to_reducemax_and_indices_is_int64: Union[bool, NoneType] = False,
replace_argmax_to_reducemax_and_indices_is_float32: Union[bool, NoneType] = False,
replace_argmax_to_fused_argmax_and_indices_is_int64: Union[bool, NoneType] = False,
replace_argmax_to_fused_argmax_and_indices_is_float32: Union[bool, NoneType] = False,
fused_argmax_scale_ratio: Union[float, NoneType] = 0.5,
replace_to_pseudo_operators: List[str] = None,
mvn_epsilon: Union[float, NoneType] = 0.0000000001,
param_replacement_file: Optional[str] = '',
check_gpu_delegate_compatibility: Optional[bool] = False,
check_onnx_tf_outputs_elementwise_close: Optional[bool] = False,
check_onnx_tf_outputs_elementwise_close_full: Optional[bool] = False,
check_onnx_tf_outputs_sample_data_normalization: Optional[str] = 'norm',
check_onnx_tf_outputs_elementwise_close_rtol: Optional[float] = 0.0,
check_onnx_tf_outputs_elementwise_close_atol: Optional[float] = 1e-4,
disable_model_save: Union[bool, NoneType] = False,
non_verbose: Union[bool, NoneType] = False,
verbosity: Optional[str] = 'debug'
) -> keras.engine.training.Model
Convert ONNX to TensorFlow models.
Parameters
----------
input_onnx_file_path: Optional[str]
Input onnx file path.
Either input_onnx_file_path or onnx_graph must be specified.
onnx_graph: Optional[onnx.ModelProto]
onnx.ModelProto.
Either input_onnx_file_path or onnx_graph must be specified.
If onnx_graph is specified, input_onnx_file_path is ignored and onnx_graph is processed.
output_folder_path: Optional[str]
Output tensorflow model folder path.
Default: "saved_model"
output_signaturedefs: Optional[bool]
Signature is added to the output for serving or for conversion
to other model formats. However, this can significantly reduce the speed
of model conversion and significantly increase the size of the model.
output_h5: Optional[bool]
Output model in Keras H5 format.
output_keras_v3: Optional[bool]
Output model in Keras (keras_v3) format.
output_tfv1_pb: Optional[bool]
Output model in TF v1 (.pb) format.
output_weights: Optional[bool]
Output weights in hdf5 format.
copy_onnx_input_output_names_to_tflite: Optional[bool]
Copy the input/output OP name of ONNX to the input/output OP name of tflite.
Due to TensorFlow internal operating specifications,
the input/output order of ONNX does not necessarily match
the input/output order of tflite.
Be sure to check that the input/output OP names in the generated
tflite file have been converted as expected.
Also, this option generates a huge JSON file as a temporary file for processing.
Therefore, using it on large models of hundreds of megabytes or more
is strongly discouraged.
output_integer_quantized_tflite: Optional[bool]
Output an integer quantized tflite model.
quant_type: Optional[str]
Selects whether "per-channel" or "per-tensor" quantization is used.
Default: "per-channel"
custom_input_op_name_np_data_path: Optional[List]
--custom_input_op_name_np_data_path INPUT_NAME NUMPY_FILE_PATH MEAN STD
Input OP name and the path of a data file (Numpy) used as custom input
for -cotof or -oiqt, plus an optional mean and std.
<Usage in -cotof>
When using -cotof, custom input defined by the user, instead of dummy data, is used.
In this case, mean and std are omitted from the input.
-cind {input_op_name} {numpy_file_path}
e.g. -cind onnx::Equal_0 test_cind/x_1.npy -cind onnx::Add_1 test_cind/x_2.npy -cotof
The input_op_name must be the same as in ONNX,
and it may not work if the input format is different between ONNX and TF.
<Usage in -oiqt>
Input OP name and path of a calibration data file (Numpy) for quantization,
plus mean and std.
The specification can be omitted only when the input OP is a single 4D image tensor.
If omitted, it is automatically calibrated using 20 normalized MS-COCO images.
The type of the input OP must be Float32.
Data for calibration must be pre-normalized to a range of 0 to 1.
-cind {input_op_name} {numpy_file_path} {mean} {std}
Numpy file paths must be specified the same number of times as the number of input OPs.
Normalize the value of the input OP based on the tensor specified in mean and std.
(input_value - mean) / std
Tensors in Numpy files must be in the dimension order used after conversion to TF.
Note that this is intended for deployment on low-resource devices,
so the batch size is limited to 1 only.
e.g.
The example below shows a case with three input OPs.
Assume input0 is 128x128 RGB image data that has been divided by 255
during preprocessing and normalized to a range between 0 and 1.
input1 and input2 are assumed to be non-image inputs, so the divisor
used to normalize them to the range 0 to 1 is not 255.
"n" is the number of calibration data.
ONNX INPUT shapes:
input0: [n,3,128,128]
mean: [1,3,1,1] -> [[[[0.485]],[[0.456]],[[0.406]]]]
std : [1,3,1,1] -> [[[[0.229]],[[0.224]],[[0.225]]]]
input1: [n,64,64]
mean: [1,64] -> [[0.1, ..., 0.64]]
std : [1,64] -> [[0.05, ..., 0.08]]
input2: [n,5]
mean: [1] -> [0.3]
std : [1] -> [0.07]
TensorFlow INPUT shapes (Numpy file ndarray shapes):
input0: [n,128,128,3]
mean: [1,1,1,3] -> [[[[0.485, 0.456, 0.406]]]]
std : [1,1,1,3] -> [[[[0.229, 0.224, 0.225]]]]
input1: [n,64,64]
mean: [1,64] -> [[0.1, ..., 0.64]]
std : [1,64] -> [[0.05, ..., 0.08]]
input2: [n,5]
mean: [1] -> [0.3]
std : [1] -> [0.07]
cind=[
["input0","../input0.npy",[[[[0.485, 0.456, 0.406]]]],[[[[0.229, 0.224, 0.225]]]]],
["input1","./input1.npy",[0.1, ..., 0.64],[0.05, ..., 0.08]],
["input2","input2.npy",[0.3],[0.07]],
]
<Using -cotof and -oiqt at the same time>
To use -cotof and -oiqt simultaneously,
{input_op_name}, {numpy_file_path}, {mean}, and {std} must all be entered together,
and the data file must be in Float32 format.
Otherwise, an error will occur during the -oiqt stage.
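e.g.
A minimal sketch of preparing a calibration Numpy file and passing it
through the Python API. The input name, file path, and stand-in data are
hypothetical; the array must already be normalized to the range 0 to 1
and stored in the TF dimension order ([n,H,W,C]):

    import numpy as np
    from onnx2tf import convert

    # n=20 calibration samples, already in NHWC order and scaled to 0..1
    calib = np.random.rand(20, 128, 128, 3).astype(np.float32)  # stand-in data
    np.save('calib_input0.npy', calib)

    model = convert(
        input_onnx_file_path='model.onnx',      # hypothetical path
        output_integer_quantized_tflite=True,
        custom_input_op_name_np_data_path=[
            ['input0', 'calib_input0.npy',
             [[[[0.485, 0.456, 0.406]]]],       # mean, shape [1,1,1,3]
             [[[[0.229, 0.224, 0.225]]]]],      # std,  shape [1,1,1,3]
        ],
    )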
input_output_quant_dtype: Optional[str]
Input and Output dtypes when doing Full INT8 Quantization.
"int8"(default) or "uint8"
not_use_onnxsim: Optional[bool]
No optimization by onnx-simplifier is performed.
If this option is used, the probability of a conversion error is very high.
not_use_opname_auto_generate: Optional[bool]
Disables automatic generation and assignment of OP names
for old-format ONNX files.
batch_size: Optional[int]
Fixes the dynamic batch size to the specified numeric batch size.
A value of 1 or more must be specified.
overwrite_input_shape: Optional[List[str]]
Overwrite the input shape.
The format is
['i1:dim0,dim1,...,dimN', 'i2:dim0,dim1,...,dimN', 'i3:dim0,dim1,...,dimN']
When there is only one input, for example,
['data:1,3,224,224']
When there are multiple inputs, for example,
['data1:1,3,224,224','data2:1,3,112','data3:5']
A value of 1 or more must be specified.
Numerical values other than dynamic dimensions are ignored.
If specified at the same time as batch_size, batch_size is ignored.
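e.g.
A minimal sketch of fixing a dynamic input shape at conversion time
(the model path and input name are hypothetical):

    from onnx2tf import convert

    model = convert(
        input_onnx_file_path='model.onnx',            # hypothetical path
        overwrite_input_shape=['data:1,3,224,224'],   # 'input_name:dim0,dim1,...'
    )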
no_large_tensor: Optional[bool]
Suppresses constant bloat caused by Tile OP when optimizing models in onnxsim.
See: https://github.com/daquexian/onnx-simplifier/issues/178
output_nms_with_dynamic_tensor: Optional[bool]
Instead of fixing the number of bounding boxes in the NMS output
to the maximum number max_output_boxes_per_class, output a dynamic
tensor sized to the smallest possible number of boxes.
If this option is disabled, NMS output is padded to the number
set in the max_output_boxes_per_class attribute.
e.g.
disable --output_nms_with_dynamic_tensor:
output_tensor_shape: [100, 7]
enable --output_nms_with_dynamic_tensor:
output_tensor_shape: [N, 7]
keep_ncw_or_nchw_or_ncdhw_input_names: Optional[List[str]]
Keeps the NCW, NCHW, or NCDHW input shape for the specified INPUT OP names.
If a nonexistent INPUT OP name is specified, it is ignored.
Valid only for 3D, 4D and 5D input tensors.
e.g.
keep_ncw_or_nchw_or_ncdhw_input_names=['input0','input1','input2']
keep_nwc_or_nhwc_or_ndhwc_input_names: Optional[List[str]]
Keeps the NWC, NHWC, or NDHWC input shape for the specified INPUT OP names.
If a nonexistent INPUT OP name is specified, it is ignored.
If the input OP name is the same as the input OP name specified
in the keep_ncw_or_nchw_or_ncdhw_input_names option, it is ignored.
Valid only for 3D, 4D and 5D input tensors.
e.g.
keep_nwc_or_nhwc_or_ndhwc_input_names=['input0','input1','input2']
keep_shape_absolutely_input_names: Optional[List[str]]
Names of INPUT OPs whose shapes are unconditionally maintained.
If a nonexistent INPUT OP name is specified, it is ignored.
e.g.
keep_shape_absolutely_input_names=['input0','input1','input2']
input_names_to_interrupt_model_conversion: Optional[List[str]]
Input names of ONNX at which to interrupt model conversion.
Interrupts model conversion at the specified input names
and outputs the model partitioned into subgraphs.
e.g.
input_names_to_interrupt_model_conversion=['input0','input1','input2']
output_names_to_interrupt_model_conversion: Optional[List[str]]
Output names of ONNX at which to interrupt model conversion.
Interrupts model conversion at the specified output names
and outputs the model partitioned into subgraphs.
e.g.
output_names_to_interrupt_model_conversion=['output0','output1','output2']
disable_group_convolution: Optional[bool]
Disable GroupConvolution and replace it with SeparableConvolution for
output to saved_model format.
enable_accumulation_type_float16: Optional[bool]
Hint for XNNPACK fp16 inference on a float16 tflite model.
Float16 inference is roughly 2x faster on ARM64 devices with
the ARMv8.2 or later instruction set.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md
enable_batchmatmul_unfold: Optional[bool]
BatchMatMul is separated batch by batch to generate a primitive MatMul.
enable_rnn_unroll: Optional[bool]
Unrolls all symbolic loops of the RNN (LSTM, GRU, RNN) to increase
inference speed. RAM consumption will increase because all tensors
are expanded and embedded in the model.
https://keras.io/api/layers/recurrent_layers/
disable_suppression_flextranspose: Optional[bool]
Disables FlexTranspose generation suppression.
number_of_dimensions_after_flextranspose_compression: Optional[int]
Number of Transpose OP dimensions generated after avoiding FlexTranspose generation.
Specifying 2 also suppresses creation of the Transpose itself.
Default: 6
disable_suppression_flexstridedslice: Optional[bool]
Disables FlexStridedSlice generation suppression.
disable_strict_mode: Optional[bool]
If specified, the conversion speed is greatly accelerated because the strict accuracy
correction process is skipped, but the frequency of transposition errors increases
and accuracy errors are more likely to occur. Strict mode is enabled by default.
As of 2023.05.07, this is a work in progress and is an experimental feature.
Therefore, only some OPs are converted in strict mode for accuracy correction.
number_of_dimensions_after_flexstridedslice_compression: Optional[int]
Number of StridedSlice OP dimensions generated after avoiding FlexStridedSlice generation.
Default: 5
optimization_for_gpu_delegate: Optional[bool]
Replace operations that are not supported by the GPU delegate
with supported ones, as far as possible.
replace_argmax_to_reducemax_and_indices_is_int64: Optional[bool]
Replace ArgMax with a ReduceMax. The returned indices are int64.
Only one of replace_argmax_to_reducemax_and_indices_is_int64 and
replace_argmax_to_reducemax_and_indices_is_float32 and
replace_argmax_to_fused_argmax_and_indices_is_int64 and
replace_argmax_to_fused_argmax_and_indices_is_float32 can be specified.
Default: False
replace_argmax_to_reducemax_and_indices_is_float32: Optional[bool]
Replace ArgMax with a ReduceMax. The returned indices are float32.
Only one of replace_argmax_to_reducemax_and_indices_is_int64 and
replace_argmax_to_reducemax_and_indices_is_float32 and
replace_argmax_to_fused_argmax_and_indices_is_int64 and
replace_argmax_to_fused_argmax_and_indices_is_float32 can be specified.
Default: False
replace_argmax_to_fused_argmax_and_indices_is_int64: Optional[bool]
Replace ArgMax with a ReduceMax. The returned indices are int64.
It improves inference speed at the cost of a small sacrifice in accuracy.
See: https://github.com/tensorflow/models/tree/master/official/projects/edgetpu/vision
Currently, only 4D tensors are supported.
Only one of replace_argmax_to_reducemax_and_indices_is_int64 and
replace_argmax_to_reducemax_and_indices_is_float32 and
replace_argmax_to_fused_argmax_and_indices_is_int64 and
replace_argmax_to_fused_argmax_and_indices_is_float32 can be specified.
Default: False
replace_argmax_to_fused_argmax_and_indices_is_float32: Optional[bool]
Replace ArgMax with a ReduceMax. The returned indices are float32.
It improves inference speed at the cost of a small sacrifice in accuracy.
See: https://github.com/tensorflow/models/tree/master/official/projects/edgetpu/vision
Currently, only 4D tensors are supported.
Only one of replace_argmax_to_reducemax_and_indices_is_int64 and
replace_argmax_to_reducemax_and_indices_is_float32 and
replace_argmax_to_fused_argmax_and_indices_is_int64 and
replace_argmax_to_fused_argmax_and_indices_is_float32 can be specified.
Default: False
fused_argmax_scale_ratio: Optional[float]
For Fused ArgMax.
Scale ratio when generating Fused ArgMax.
0.0 < fused_argmax_scale_ratio <= 1.0
Default: 0.5
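e.g.
A minimal sketch of enabling Fused ArgMax with a custom scale ratio
(the model path is hypothetical):

    from onnx2tf import convert

    model = convert(
        input_onnx_file_path='model.onnx',      # hypothetical path
        replace_argmax_to_fused_argmax_and_indices_is_int64=True,
        fused_argmax_scale_ratio=0.5,
    )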
replace_to_pseudo_operators: List[str]
Replace the listed operators with pseudo operators.
Give the full names of the target operators.
Currently supported operators :
Asin, Acos, Atan, Abs, PReLU, LeakyReLU, Power, GatherND, Neg, HardSwish, Erf, GeLU, MatMulInteger
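e.g.
A minimal sketch of replacing specific operators with their pseudo
implementations (the model path is hypothetical):

    from onnx2tf import convert

    model = convert(
        input_onnx_file_path='model.onnx',      # hypothetical path
        replace_to_pseudo_operators=['Erf', 'GeLU'],
    )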
mvn_epsilon: Optional[float]
For MeanVarianceNormalization.
The number to be added to the variance to avoid division by zero
when normalizing the value.
(input_tensor - mean) / tf.sqrt(variance + mvn_epsilon)
Default: 0.0000000001
param_replacement_file: Optional[str]
Parameter replacement file path. (.json)
check_gpu_delegate_compatibility: Optional[bool]
Run TFLite ModelAnalyzer on the generated Float16 tflite model
to check if the model can be supported by GPU Delegate.
e.g.
"""
=== TFLite ModelAnalyzer ===
Your TFLite model has '1' subgraph(s). In the subgraph description below,
T# represents the Tensor numbers. For example, in Subgraph#0, the RESHAPE op takes
tensor #0 and tensor #6 as input and produces tensor #7 as output.
Subgraph#0 main(T#0) -> [T#17]
Op#0 RESHAPE(T#0, T#6[2, 8, 8, 3, 2, ...]) -> [T#7]
Op#1 SPLIT(T#5[0], T#7) -> [T#8, T#9]
Op#2 RESHAPE(T#8, T#1[8, 8, 3, 2, 2]) -> [T#10]
Op#3 TRANSPOSE(T#10, T#4[0, 3, 1, 4, 2]) -> [T#11]
Op#4 RESHAPE(T#11, T#2[1, 8, 2, 8, 2, ...]) -> [T#12]
Op#5 RESHAPE(T#9, T#1[8, 8, 3, 2, 2]) -> [T#13]
Op#6 TRANSPOSE(T#13, T#4[0, 3, 1, 4, 2]) -> [T#14]
Op#7 RESHAPE(T#14, T#2[1, 8, 2, 8, 2, ...]) -> [T#15]
Op#8 CONCATENATION(T#12, T#15) -> [T#16]
Op#9 RESHAPE(T#16, T#3[2, 16, 16, 3]) -> [T#17]
Tensors of Subgraph#0
T#0(inputs_0) shape:[2, 8, 8, 12], type:FLOAT32
T#1(model/tf.compat.v1.squeeze_2/Squeeze) shape:[5], type:INT32 RO 20 bytes, data:[8, 8, 3, 2, 2]
T#2(model/tf.expand_dims_1/ExpandDims) shape:[6], type:INT32 RO 24 bytes, data:[1, 8, 2, 8, 2, ...]
T#3(model/tf.reshape_1/Reshape/shape) shape:[4], type:INT32 RO 16 bytes, data:[2, 16, 16, 3]
T#4(model/tf.compat.v1.transpose/transpose/perm) shape:[5], type:INT32 RO 20 bytes, data:[0, 3, 1, 4, 2]
T#5(model/tf.concat/concat/axis) shape:[], type:INT32 RO 4 bytes, data:[0]
T#6(model/tf.reshape/Reshape/shape) shape:[6], type:INT32 RO 24 bytes, data:[2, 8, 8, 3, 2, ...]
T#7(model/tf.reshape/Reshape) shape:[2, 8, 8, 3, 2, 2], type:FLOAT32
T#8(model/tf.split/split) shape:[1, 8, 8, 3, 2, 2], type:FLOAT32
T#9(model/tf.split/split1) shape:[1, 8, 8, 3, 2, 2], type:FLOAT32
T#10(model/tf.compat.v1.squeeze_1/Squeeze) shape:[8, 8, 3, 2, 2], type:FLOAT32
T#11(model/tf.compat.v1.transpose/transpose) shape:[8, 2, 8, 2, 3], type:FLOAT32
T#12(model/tf.expand_dims/ExpandDims) shape:[1, 8, 2, 8, 2, 3], type:FLOAT32
T#13(model/tf.compat.v1.squeeze_2/Squeeze1) shape:[8, 8, 3, 2, 2], type:FLOAT32
T#14(model/tf.compat.v1.transpose_1/transpose) shape:[8, 2, 8, 2, 3], type:FLOAT32
T#15(model/tf.expand_dims_1/ExpandDims1) shape:[1, 8, 2, 8, 2, 3], type:FLOAT32
T#16(model/tf.concat/concat) shape:[2, 8, 2, 8, 2, 3], type:FLOAT32
T#17(Identity) shape:[2, 16, 16, 3], type:FLOAT32
Your model looks compatible with GPU delegate with TFLite runtime version 2.10.0.
But it doesn't guarantee that your model works well with GPU delegate.
There could be some runtime incompatibility.
---------------------------------------------------------------
Model size: 2988 bytes
Non-data buffer size: 2757 bytes (92.27 %)
Total data buffer size: 231 bytes (07.73 %)
(Zero value buffers): 4 bytes (00.13 %)
* Buffers of TFLite model are mostly used for constant tensors.
And zero value buffers are buffers filled with zeros.
Non-data buffer areas are used to store operators, subgraphs, etc.
You can find more details from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema.fbs
"""
check_onnx_tf_outputs_elementwise_close: Optional[bool]
Returns "Matches" if the output of onnx and the output of TF are
within acceptable proximity element by element.
Returns "Unmatched" if the output of onnx and the output of TF are
not within acceptable proximity element by element.
If the output of onnx is 1D, it returns "Skipped" and skips the comparison
between the output of onnx and that of TF. This is because when undefined
dimensions are present, a situation often arises where very large index
values are compared, causing OutOfMemory.
Only the output content of the model's final output OP is checked.
check_onnx_tf_outputs_elementwise_close_full: Optional[bool]
Returns "Matches" if the output of onnx and the output of TF are
within acceptable proximity element by element.
Checks the output of every OP in sequence from the beginning,
not only the final output OP of the model.
Returns "Unmatched" if the output of onnx and the output of TF are
not within acceptable proximity element by element.
If the output of onnx is 1D, it returns "Skipped" and skips the comparison
between the output of onnx and that of TF. This is because when undefined
dimensions are present, a situation often arises where very large index
values are compared, causing OutOfMemory.
It is very time consuming because it performs as many inferences as
there are operations.
check_onnx_tf_outputs_sample_data_normalization: Optional[str]
norm: Validate using random data normalized to the range 0.0 to 1.0
denorm: Validate using random data in the range 0.0 to 255.0
If there is a normalization layer at the model's entry point, or
if the model was trained on denormalized data, "denorm" must be specified.
Default: "norm"
check_onnx_tf_outputs_elementwise_close_rtol: Optional[float]
The relative tolerance parameter.
Default: 0.0
check_onnx_tf_outputs_elementwise_close_atol: Optional[float]
The absolute tolerance parameter.
Default: 1e-4
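e.g.
A minimal sketch of converting while validating every OP's output
against ONNX (the model path is hypothetical; this is slow on large
models because each OP is inferred separately):

    from onnx2tf import convert

    model = convert(
        input_onnx_file_path='model.onnx',      # hypothetical path
        check_onnx_tf_outputs_elementwise_close_full=True,
        check_onnx_tf_outputs_elementwise_close_atol=1e-4,
    )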
disable_model_save: Optional[bool]
Does not save the converted model. Intended to save RAM in CI environments.
Default: False
non_verbose: Optional[bool]
Shorthand to specify a verbosity of "error".
Default: False
verbosity: Optional[str]
Change the level of information printed.
Values are "debug", "info", "warn", and "error".
Default: "debug" (for backwards compatability)
Returns
----------
model: tf_keras.Model
The converted model.
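A minimal end-to-end sketch, continuing the REPL session above
(the ONNX file path is hypothetical):

>>> model = convert(
...     input_onnx_file_path='model.onnx',
...     output_folder_path='saved_model',
...     output_signaturedefs=True,
... )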