QuantWeightPass

class paddle.fluid.contrib.slim.quantization.quantization_pass.QuantWeightPass(scope, place, bias_correction=False, quant_bits=8, save_int_weight=True) [source]

Quantize weights and remove the weight input of the quantize_linear node. For example, weight -> quant -> dequant -> conv2d will be frozen into weight -> dequant -> conv2d, and the weight will be scaled offline.
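The idea of scaling a weight offline can be illustrated with a minimal NumPy sketch. This is not the Paddle implementation; it assumes per-tensor symmetric quantization, and the helper names (`quantize_weight`, `dequantize_weight`) are hypothetical:

.. code-block:: python

    import numpy as np

    def quantize_weight(weight, quant_bits=8):
        # Offline step: replace the float weight with its integer
        # representation plus a per-tensor scale.
        bnt = (1 << (quant_bits - 1)) - 1              # 127 for 8 bits
        scale = np.max(np.abs(weight)) / bnt           # per-tensor scale
        int_weight = np.clip(np.round(weight / scale), -bnt, bnt).astype(np.int8)
        return int_weight, scale

    def dequantize_weight(int_weight, scale):
        # This corresponds to the dequant node that stays in front of conv2d.
        return int_weight.astype(np.float32) * scale

    w = np.array([0.5, -1.27, 0.03], dtype=np.float32)
    q, s = quantize_weight(w)
    w_hat = dequantize_weight(q, s)  # close to the original weight

After this transform, only the int8 tensor and the dequantize step remain in the graph; the quantize node is gone because its result was computed ahead of time.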

Parameters
  • scope (paddle.Scope) – scope is used to get the weight tensor values.

  • place (paddle.CPUPlace|paddle.CUDAPlace|str) – place is used to restore the weight tensors. If it is a string, it can be cpu or gpu:x, where x is the index of the GPU.

  • bias_correction (bool) – whether to use bias correction for post-training quantization. See https://arxiv.org/abs/1810.05723.

  • quant_bits (int, optional) – quantization bit number for weight. Default is 8.

  • save_int_weight (bool, optional) – whether to save the weight as an integer type. Default is True.

Examples

.. code-block:: python

    # The original graph will be rewritten in place.
    import paddle
    from paddle.fluid.contrib.slim.quantization import QuantWeightPass
    from paddle.fluid.contrib.slim.graph import IrGraph
    from paddle.fluid import core

    # `program` is a paddle.static.Program whose weights have already been
    # wrapped with quantize_linear/dequantize_linear ops.
    graph = IrGraph(core.Graph(program.desc), for_test=False)
    place = paddle.CPUPlace()
    scope = paddle.static.global_scope()
    quant_weight_pass = QuantWeightPass(scope, place)
    quant_weight_pass.apply(graph)