Quant2Int8MkldnnPass
class paddle.fluid.contrib.slim.quantization.quant2_int8_mkldnn_pass.Quant2Int8MkldnnPass(_ops_to_quantize, _op_ids_to_skip=None, _scope=None, _place=None, _core=None, _debug=False)
Transform a quant model IrGraph into MKL-DNN supported INT8 IrGraph. The pass consists of the following transformations:

- gather scale values from fake quantize/dequantize operators,
- extract the FP32 inference model graph from the quant graph, i.e.:
  - remove fake quantize/dequantize operators,
  - dequantize conv2d and mul's weights,
- optimize the FP32 graph using standard FP32 optimization fuses (e.g. conv2d + bn -> conv2d),
- quantize the optimized FP32 graph using standard INT8v2 quantization passes (cpu_quantize_pass, cpu_quantize_squash_pass).
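The arithmetic behind two of the steps above can be sketched in plain Python. This is illustrative only, not the pass's implementation: the helper names are made up, and the dequantization convention (fp32 ≈ int8 · scale / 127, i.e. symmetric per-tensor quantization) and the per-channel batch-norm folding formula are assumptions about the standard techniques these steps name:

```python
import math


def dequantize(int8_weights, scale, max_range=127.0):
    """Recover approximate FP32 weights from INT8 values and a scale,
    as in the 'dequantize conv2d and mul's weights' step.
    Assumed convention: fp32 ~= int8 * scale / max_range."""
    return [q * scale / max_range for q in int8_weights]


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding conv's weights and
    bias -- the standard conv2d + bn -> conv2d fuse, per output channel:
        w' = w * gamma / sqrt(var + eps)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused_w, fused_b = [], []
    for wc, bc, g, bt, m, v in zip(w, b, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        fused_w.append([x * s for x in wc])  # scale all weights of the channel
        fused_b.append((bc - m) * s + bt)    # shift and rescale the bias
    return fused_w, fused_b
```

For example, `dequantize([127, -127, 0], 2.0)` yields `[2.0, -2.0, 0.0]`, and folding a batch norm with `gamma=2.0`, `var=4.0` rescales the channel's weights by roughly 1 while absorbing `beta` into the bias.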