QuantInt8MkldnnPass

class paddle.fluid.contrib.slim.quantization.quant_int8_mkldnn_pass.QuantInt8MkldnnPass(_scope=None, _place=None)

Convert the IrGraph generated by QuantizationFreezePass into an MKL-DNN supported INT8 IrGraph. The following transformations are done in this pass:


  1. Convert the int8-range weights with float32 data type, which are generated by QuantizationFreezePass, back to float32-range weights with float32 data type by using the corresponding scales. This conversion is needed because the MKL-DNN INT8 conv2d and mul kernels currently only accept float32 weights as input, so weight quantization happens inside the INT8 conv2d and mul kernels (see the sketch after this list).

  2. Create a new conv2d or mul op with the converted weights, link its output to the output of the fake_dequantize_abs_max op, and set the conv2d op's attribute "force_fp32_output" to true.

  3. Transform the fake_quantize_xx ops into quantize ops.

  4. Remove the fake_dequantize_abs_max ops.
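
The following is a minimal, illustrative sketch of the weight conversion in step 1, assuming the standard 8-bit abs_max scheme; the tensor values and the scale below are hypothetical, and the actual pass performs this conversion in place on the graph's persistable weight variables.

```python
import numpy as np

# Hypothetical weights produced by QuantizationFreezePass: float32 values
# lying in the INT8 range [-127, 127], plus a per-tensor abs_max scale.
int8_range_weights = np.array([-127.0, 64.0, 127.0], dtype=np.float32)
scale = 0.5  # hypothetical abs_max scale recorded during quantization

# Recover float32-range weights so that the MKL-DNN INT8 conv2d/mul
# kernels (which only accept float32 weights) can quantize them internally.
float32_range_weights = int8_range_weights * scale / 127.0
print(float32_range_weights)  # approximately [-0.5, 0.252, 0.5]
```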

apply(graph)


Quantize the graph so that it can run MKL-DNN INT8 inference. Depending on the activation quantization type, the pass transforms the fake quantize ops into quantize ops and removes the fake dequantize ops.

Parameters

graph (IrGraph) – the graph to which the pass is applied.
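
A minimal end-to-end sketch of calling apply is shown below. It assumes a frozen, fake-quantized inference model already saved by QuantizationFreezePass in a directory named quantized_model (a hypothetical path), and uses the Fluid IrGraph utilities to wrap and unwrap the Program.

```python
import paddle.fluid as fluid
from paddle.fluid import core
from paddle.fluid.framework import IrGraph
from paddle.fluid.contrib.slim.quantization.quant_int8_mkldnn_pass import QuantInt8MkldnnPass

place = fluid.CPUPlace()
exe = fluid.Executor(place)
scope = fluid.global_scope()

# Load the frozen, fake-quantized inference model produced by
# QuantizationFreezePass (the directory name is hypothetical).
[inference_program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    dirname='quantized_model', executor=exe)

# Wrap the Program in an IrGraph, apply the pass, then convert back
# to a Program that runs MKL-DNN INT8 inference.
graph = IrGraph(core.Graph(inference_program.desc), for_test=True)
mkldnn_pass = QuantInt8MkldnnPass(_scope=scope, _place=place)
mkldnn_pass.apply(graph)
int8_program = graph.to_program()
```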