GradScaler¶
class paddle.amp.GradScaler(enable=True, init_loss_scaling=32768.0, incr_ratio=2.0, decr_ratio=0.5, incr_every_n_steps=1000, decr_every_n_nan_or_inf=2, use_dynamic_loss_scaling=True) [source]

GradScaler is used for Auto-Mixed-Precision training in dynamic graph mode. It controls the scaling of the loss and helps avoid numerical overflow. An object of this class provides scale(), unscale_(), minimize(), step(), update(), and get/set methods for its configuration parameters. scale() multiplies the loss by the scale ratio. unscale_() unscales the gradients of the parameters, i.e. multiplies them by 1/(scale ratio). minimize() is similar to optimizer.minimize(): it updates the parameters and also updates the loss scaling; it is equivalent to step() followed by update(). step() is similar to optimizer.step() and only updates the parameters. update() updates the loss scaling. Commonly, GradScaler is used together with paddle.amp.auto_cast to achieve Auto-Mixed-Precision in dynamic graph mode.

Parameters
- enable (bool, optional) – Enable loss scaling or not. Default is True.
- init_loss_scaling (float, optional) – The initial loss scaling factor. Default is 2**15.
- incr_ratio (float, optional) – The multiplier to use when increasing the loss scaling. Default is 2.0.
- decr_ratio (float, optional) – The less-than-one multiplier to use when decreasing the loss scaling. Default is 0.5.
- incr_every_n_steps (int, optional) – Increases the loss scaling every n consecutive steps with finite gradients. Default is 1000.
- decr_every_n_nan_or_inf (int, optional) – Decreases the loss scaling every n accumulated steps with NaN or Inf gradients. Default is 2.
- use_dynamic_loss_scaling (bool, optional) – Whether to use dynamic loss scaling. If False, a fixed loss scaling is used. If True, the loss scaling is updated dynamically. Default is True.

Returns

A GradScaler object.
Examples

import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)         # scale the loss
scaled.backward()                   # do backward
scaler.minimize(optimizer, scaled)  # update parameters
optimizer.clear_grad()
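The description above notes that minimize() is equivalent to step() followed by update(). As an additional, hedged illustration (not part of the original reference, using random tensors as stand-in training data), the following sketch runs a few iterations with the explicit step()/update() pattern:

import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

for iteration in range(5):                # a few iterations on random data
    data = paddle.rand([10, 3, 32, 32])   # stand-in batch; replace with a real DataLoader
    with paddle.amp.auto_cast():
        conv = model(data)
        loss = paddle.mean(conv)
    scaled = scaler.scale(loss)    # scale the loss
    scaled.backward()              # backward on the scaled loss
    scaler.step(optimizer)         # unscale gradients (if needed) and update parameters
    scaler.update()                # adjust the loss scaling for the next iteration
    optimizer.clear_grad()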
            
scale(var)¶

Multiplies a Tensor by the scale factor and returns the scaled output. If this instance of GradScaler is not enabled, the output is returned unmodified.

Parameters
var (Tensor) – The tensor to scale.

Returns

The scaled tensor, or the original tensor if scaling is disabled.
Examples

import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)         # scale the loss
scaled.backward()                   # do backward
scaler.minimize(optimizer, scaled)  # update parameters
optimizer.clear_grad()
minimize(optimizer, *args, **kwargs)¶

This function is similar to optimizer.minimize(): it updates the parameters. If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, the scaled gradients of the parameters are first unscaled and then the parameters are updated. Finally, the loss scaling ratio is updated.

Parameters
- optimizer (Optimizer) – The optimizer used to update parameters.
- args – Arguments, which will be forwarded to optimizer.minimize().
- kwargs – Keyword arguments, which will be forwarded to optimizer.minimize().
 
Examples

import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)         # scale the loss
scaled.backward()                   # do backward
scaler.minimize(optimizer, scaled)  # update parameters
optimizer.clear_grad()
step(optimizer)¶

This function is similar to optimizer.step(): it updates the parameters. If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, the scaled gradients of the parameters are first unscaled and then the parameters are updated.

Parameters
optimizer (Optimizer) – The optimizer used to update parameters.
Examples

# required: gpu
import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)  # scale the loss
scaled.backward()            # do backward
scaler.step(optimizer)       # update parameters
scaler.update()              # update the loss scaling ratio
optimizer.clear_grad()
update()¶

Updates the loss_scaling.

Examples

# required: gpu
import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)  # scale the loss
scaled.backward()            # do backward
scaler.step(optimizer)       # update parameters
scaler.update()              # update the loss scaling ratio
optimizer.clear_grad()
unscale_(optimizer)¶

Unscales the gradients of the parameters, i.e. multiplies them by 1/(loss scaling ratio). If this instance of GradScaler is not enabled, the gradients are returned unmodified.

Parameters
optimizer (Optimizer) – The optimizer used to update parameters.

Returns

The unscaled gradients of the parameters, or the original gradients if scaling is disabled.
Examples

# required: gpu
import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)  # scale the loss
scaled.backward()            # do backward
scaler.unscale_(optimizer)   # unscale the gradients
scaler.step(optimizer)
scaler.update()
optimizer.clear_grad()
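A common reason to call unscale_() explicitly is to inspect or log gradients in their true, unscaled range before the optimizer step. The following is a hedged sketch of that pattern; the per-parameter logging is illustrative and not part of the GradScaler API:

# required: gpu
import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
data = paddle.rand([10, 3, 32, 32])
with paddle.amp.auto_cast():
    conv = model(data)
    loss = paddle.mean(conv)
scaled = scaler.scale(loss)
scaled.backward()
scaler.unscale_(optimizer)            # gradients are now in their true range
for param in model.parameters():
    if param.grad is not None:        # illustrative: log unscaled gradient norms
        print(param.name, float(paddle.linalg.norm(param.grad)))
scaler.step(optimizer)                # step() will not unscale a second time
scaler.update()
optimizer.clear_grad()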
is_enable()¶

Returns whether loss scaling is enabled.

Returns

True if loss scaling is enabled, otherwise False.

Return type

bool
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
enable = scaler.is_enable()
print(enable)  # True
is_use_dynamic_loss_scaling()¶

Returns whether dynamic loss scaling is used.

Returns

False if a fixed loss scaling is used, True if the loss scaling is updated dynamically.

Return type

bool
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
use_dynamic_loss_scaling = scaler.is_use_dynamic_loss_scaling()
print(use_dynamic_loss_scaling)  # True
get_init_loss_scaling()¶

Returns the initial loss scaling factor.

Returns

float: the initial loss scaling factor.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
init_loss_scaling = scaler.get_init_loss_scaling()
print(init_loss_scaling)  # 1024
set_init_loss_scaling(new_init_loss_scaling)¶

Sets the initial loss scaling factor to new_init_loss_scaling.

Parameters

new_init_loss_scaling (float) – The new value of the initial loss scaling factor.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
print(scaler.get_init_loss_scaling())  # 1024
new_init_loss_scaling = 1000
scaler.set_init_loss_scaling(new_init_loss_scaling)
print(scaler.get_init_loss_scaling())  # 1000
get_incr_ratio()¶

Returns the multiplier used when increasing the loss scaling.

Returns

float: the multiplier used when increasing the loss scaling.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
incr_ratio = scaler.get_incr_ratio()
print(incr_ratio)  # 2.0
set_incr_ratio(new_incr_ratio)¶

Sets the multiplier used when increasing the loss scaling to new_incr_ratio; new_incr_ratio should be > 1.0.

Parameters

new_incr_ratio (float) – The new value of the multiplier used when increasing the loss scaling.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
print(scaler.get_incr_ratio())  # 2.0
new_incr_ratio = 3.0
scaler.set_incr_ratio(new_incr_ratio)
print(scaler.get_incr_ratio())  # 3.0
get_decr_ratio()¶

Returns the less-than-one multiplier used when decreasing the loss scaling.

Returns

float: the less-than-one multiplier used when decreasing the loss scaling.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
decr_ratio = scaler.get_decr_ratio()
print(decr_ratio)  # 0.5
set_decr_ratio(new_decr_ratio)¶

Sets the less-than-one multiplier used when decreasing the loss scaling to new_decr_ratio; new_decr_ratio should be < 1.0.

Parameters

new_decr_ratio (float) – The new value of the less-than-one multiplier used when decreasing the loss scaling.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
print(scaler.get_decr_ratio())  # 0.5
new_decr_ratio = 0.1
scaler.set_decr_ratio(new_decr_ratio)
print(scaler.get_decr_ratio())  # 0.1
get_incr_every_n_steps()¶

Returns the number n such that the loss scaling is increased every n consecutive steps with finite gradients.

Returns

int: the number n such that the loss scaling is increased every n consecutive steps with finite gradients.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
incr_every_n_steps = scaler.get_incr_every_n_steps()
print(incr_every_n_steps)  # 1000
set_incr_every_n_steps(new_incr_every_n_steps)¶

Sets the number n, via new_incr_every_n_steps, such that the loss scaling is increased every n consecutive steps with finite gradients.

Parameters

new_incr_every_n_steps (int) – The new value of n; the loss scaling is increased every n consecutive steps with finite gradients.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
print(scaler.get_incr_every_n_steps())  # 1000
new_incr_every_n_steps = 2000
scaler.set_incr_every_n_steps(new_incr_every_n_steps)
print(scaler.get_incr_every_n_steps())  # 2000
get_decr_every_n_nan_or_inf()¶

Returns the number n such that the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Returns

int: the number n such that the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
decr_every_n_nan_or_inf = scaler.get_decr_every_n_nan_or_inf()
print(decr_every_n_nan_or_inf)  # 2
set_decr_every_n_nan_or_inf(new_decr_every_n_nan_or_inf)¶

Sets the number n, via new_decr_every_n_nan_or_inf, such that the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Parameters

new_decr_every_n_nan_or_inf (int) – The new value of n; the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
print(scaler.get_decr_every_n_nan_or_inf())  # 2
new_decr_every_n_nan_or_inf = 3
scaler.set_decr_every_n_nan_or_inf(new_decr_every_n_nan_or_inf)
print(scaler.get_decr_every_n_nan_or_inf())  # 3
state_dict()¶

Returns the state of the scaler as a dict. If this instance is not enabled, returns an empty dict.

Returns

A dict describing the scaler, including:

- scale (Tensor): The loss scaling factor.
- incr_ratio (float): The multiplier to use when increasing the loss scaling.
- decr_ratio (float): The less-than-one multiplier to use when decreasing the loss scaling.
- incr_every_n_steps (int): Increases the loss scaling every n consecutive steps with finite gradients.
- decr_every_n_nan_or_inf (int): Decreases the loss scaling every n accumulated steps with NaN or Inf gradients.
- incr_count (int): The number of recent consecutive unskipped steps.
- decr_count (int): The number of recent consecutive skipped steps.
- use_dynamic_loss_scaling (bool): Whether to use dynamic loss scaling. If False, a fixed loss scaling is used; if True, the loss scaling is updated dynamically. Default is True.
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
scaler_state = scaler.state_dict()
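A typical use of state_dict(), together with load_state_dict() documented below, is checkpointing the scaler alongside the model and optimizer so that AMP training resumes with the same loss scaling. The following is a hedged sketch of that pattern; the file names are illustrative assumptions, not part of the API:

import paddle

model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

# Save model, optimizer, and scaler state together (illustrative file names).
paddle.save(model.state_dict(), "model.pdparams")
paddle.save(optimizer.state_dict(), "optimizer.pdopt")
paddle.save(scaler.state_dict(), "scaler.pdstate")

# Later, restore all three before resuming AMP training.
model.set_state_dict(paddle.load("model.pdparams"))
optimizer.set_state_dict(paddle.load("optimizer.pdopt"))
scaler.load_state_dict(paddle.load("scaler.pdstate"))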
load_state_dict(state_dict)¶

Loads the scaler state.

Parameters

state_dict (dict) – The scaler state. Should be an object returned from a call to GradScaler.state_dict().
Examples

# required: gpu,xpu
import paddle

scaler = paddle.amp.GradScaler(enable=True,
                               init_loss_scaling=1024,
                               incr_ratio=2.0,
                               decr_ratio=0.5,
                               incr_every_n_steps=1000,
                               decr_every_n_nan_or_inf=2,
                               use_dynamic_loss_scaling=True)
scaler_state = scaler.state_dict()
scaler.load_state_dict(scaler_state)