QuantInt8MkldnnPass
- class paddle.fluid.contrib.slim.quantization.quant_int8_mkldnn_pass.QuantInt8MkldnnPass(_scope=None, _place=None)
Convert a QuantizationFreezePass-generated IrGraph into an MKL-DNN-supported INT8 IrGraph. The following transformations are performed by this pass:
- Convert int8-range weights with float32 data type, which are generated by the QuantizationFreezePass, back to float32-range weights with float32 data type using the corresponding scales (see the sketch after this list). This conversion is needed because the MKL-DNN INT8 conv2d and mul kernels currently only accept float32 weights as input, so weight quantization happens inside the INT8 conv2d and mul kernels themselves.
- Create a new conv2d or mul op with the converted weights, link its output to the fake_dequantize_abs_max op's output, and set the conv2d attribute "force_fp32_output" to true.
- Transform each fake_quantize_xx op into a quantize op.
- Remove the fake_dequantize_abs_max ops.
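The weight conversion in the first step amounts to scaling the stored values back into the float32 range. Below is a minimal sketch of that mapping, assuming an abs_max scale and 8-bit weights; the function name and signature are illustrative, not the pass's internal API.

```python
import numpy as np

def dequantize_weight(w_int8_range, scale, quant_bits=8):
    # Weights arrive stored in the int8 range (e.g. [-127, 127] for
    # 8-bit) but with float32 dtype; map them back to the float32
    # range using the per-tensor scale.
    max_range = (1 << (quant_bits - 1)) - 1  # 127 for 8-bit
    return w_int8_range.astype(np.float32) * scale / max_range

w = np.array([-127.0, 0.0, 64.0, 127.0], dtype=np.float32)
print(dequantize_weight(w, scale=0.5))  # [-0.5  0.  0.2519685  0.5]
```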
- apply(graph)
Quantize the graph for running MKL-DNN INT8 inference. Depending on the activation quantization type, this pass transforms fake quantize ops into quantize ops and removes the fake dequantize ops.
- Parameters
  graph (IrGraph) – the graph to which the pass is applied.
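A minimal sketch of the typical workflow: build an IrGraph from a Program, apply the pass, and convert the result back to a Program. Here an empty Program stands in for the frozen inference Program that QuantizationFreezePass would produce.

```python
import paddle.fluid as fluid
from paddle.fluid import core
from paddle.fluid.framework import IrGraph
from paddle.fluid.contrib.slim.quantization import QuantInt8MkldnnPass

# In practice this would be the frozen inference Program produced
# by QuantizationFreezePass; an empty Program is a stand-in here.
program = fluid.Program()
graph = IrGraph(core.Graph(program.desc), for_test=True)

place = fluid.CPUPlace()  # MKL-DNN kernels run on CPU
mkldnn_pass = QuantInt8MkldnnPass(_scope=fluid.global_scope(), _place=place)
mkldnn_pass.apply(graph)

# The transformed graph can be converted back to a Program for inference.
mkldnn_program = graph.to_program()
```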