NoamDecay

class paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32')[source]

Applies Noam decay to the initial learning rate.

The algorithm can be described as follows:

\[decayed\_learning\_rate = d_{model}^{-0.5} * min(global\_step^{-0.5}, global\_step * warmup\_steps^{-1.5})\]

Please refer to Attention Is All You Need.
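The schedule above can be sketched in plain Python to show how the two terms interact: during warmup the learning rate grows linearly with the step, then decays proportionally to the inverse square root of the step. The function name `noam_lr` is illustrative, not part of the Paddle API.

```python
def noam_lr(d_model, warmup_steps, global_step):
    """Compute the Noam learning rate for a given step.

    decayed_lr = d_model^{-0.5} * min(global_step^{-0.5},
                                      global_step * warmup_steps^{-1.5})
    """
    return d_model ** -0.5 * min(
        global_step ** -0.5,                  # inverse-sqrt decay phase
        global_step * warmup_steps ** -1.5)   # linear warmup phase
```

The two terms are equal when `global_step == warmup_steps`, which is where the learning rate peaks.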

Parameters
  • d_model (Variable|int) – The dimensionality of the model’s input and output feature vectors. If the type is Variable, it is a tensor with shape [1] whose data type can be int32 or int64. It can also be a Python int.

  • warmup_steps (Variable|int) – The number of warmup steps, a hyperparameter. If the type is Variable, it is a tensor with shape [1] whose data type can be int32 or int64. It can also be a Python int.

  • begin (int, optional) – The begin step, i.e. the initial value of global_step in the formula above. The default value is 1.

  • step (int, optional) – The step size used to update global_step in the formula above. The default value is 1.

  • dtype (str, optional) – The data type used to create the learning rate variable, either ‘float32’ or ‘float64’. The default value is ‘float32’.

Returns

None.

Examples

import paddle.fluid as fluid

warmup_steps = 100
learning_rate = 0.01
with fluid.dygraph.guard():
    # Use NoamDecay as the learning rate of an SGD optimizer.
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.NoamDecay(
            1 / (warmup_steps * (learning_rate ** 2)),
            warmup_steps))