WeightQuantization¶
- class paddle.fluid.contrib.slim.quantization.post_training_quantization.WeightQuantization(model_dir, model_filename=None, params_filename=None) [source]
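A minimal construction sketch; the directory and file names are hypothetical, and model_filename/params_filename are only needed when the model was saved in the combined format:

    from paddle.fluid.contrib.slim.quantization.post_training_quantization import WeightQuantization

    # Model saved in separate files: '__model__' plus one file per parameter.
    weight_quant = WeightQuantization(model_dir='fp32_model')

    # Model saved in the combined format: one program file and one params file.
    weight_quant_combined = WeightQuantization(
        model_dir='fp32_model_combined',
        model_filename='model',
        params_filename='params')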
quantize_weight_to_int¶
- quantize_weight_to_int(save_model_dir, save_model_filename=None, save_params_filename=None, quantizable_op_type=['conv2d', 'mul'], weight_bits=8, weight_quantize_type='channel_wise_abs_max', generate_test_model=False, threshold_rate=0.0)
To reduce the model size, this API quantizes the weights of some ops from float32 to int8/16. In the inference stage, the quantized weights are dequantized back to float32.
- Parameters
save_model_dir (str) – The path to save the quantized model.
save_model_filename (str, optional) – The name of the file to save the inference program. If it is None, the default filename '__model__' is used. Default is None.
save_params_filename (str, optional) – The name of the file to save all parameters. If it is None, parameters are saved in separate files; if it is not None, all parameters are saved in a single binary file. Default is None.
quantizable_op_type (list[str], optional) – The list of op types to quantize; each entry should be one of ['conv2d', 'depthwise_conv2d', 'mul']. Default is ['conv2d', 'mul'].
weight_bits (int, optional) – The number of bits for the quantized weights; it should be 8 or 16. Default is 8.
weight_quantize_type (str, optional) – The quantization type for weights, either 'channel_wise_abs_max' or 'abs_max'. 'channel_wise_abs_max' usually gives better accuracy. Default is 'channel_wise_abs_max'.
generate_test_model (bool, optional) – If True, a fake quantized model is also saved, in which the weights are quantized and then dequantized. This fake quantized model can be loaded with PaddlePaddle to test accuracy on GPU or CPU. Default is False.
threshold_rate (float, optional) – This API uses the abs_max method to quantize weights from float32 to int8/16, and the abs max value largely determines the quantization error. When the abs max value is far from the center of the numerical distribution, threshold_rate can be set to a small value between 1e-8 and 1e-6 so that the abs max value used for quantization is optimized. Default is 0.0.
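A minimal usage sketch, assuming a float32 inference model already saved under a hypothetical fp32_model directory:

    from paddle.fluid.contrib.slim.quantization.post_training_quantization import WeightQuantization

    # 'fp32_model' is a hypothetical directory holding the float32 inference model.
    weight_quant = WeightQuantization(model_dir='fp32_model')

    # Quantize the weights of conv2d and mul ops to int8. At inference time the
    # stored int8 weights are dequantized back to float32, so only the saved
    # model shrinks.
    weight_quant.quantize_weight_to_int(
        save_model_dir='int8_model',
        quantizable_op_type=['conv2d', 'mul'],
        weight_bits=8,
        weight_quantize_type='channel_wise_abs_max')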
convert_weight_to_fp16¶
- convert_weight_to_fp16(save_model_dir)
Convert all persistable variables from fp32 to fp16. Note that this API only changes the data type of the variables in the __params__ file; the __model__ file remains unchanged.
- Parameters
save_model_dir (str) – The path to save the fp16 model.
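A minimal sketch for the fp16 conversion, with hypothetical paths:

    from paddle.fluid.contrib.slim.quantization.post_training_quantization import WeightQuantization

    # 'fp32_model' is a hypothetical directory holding the float32 inference model.
    weight_quant = WeightQuantization(model_dir='fp32_model')

    # Cast every persistable variable to float16. Only the parameter file is
    # rewritten; the program ('__model__') file stays unchanged.
    weight_quant.convert_weight_to_fp16(save_model_dir='fp16_model')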