NoamDecay

api_attr: imperative programming (dynamic graph)

class paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32', learning_rate=1.0)[source]

Applies Noam decay to the initial learning rate.

The algorithm can be described as follows.

\[decayed\_learning\_rate = learning\_rate * d_{model}^{-0.5} * min(global\_step^{-0.5}, global\_step * warmup\_steps^{-1.5})\]

Please refer to Attention Is All You Need.
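The schedule rises linearly during warmup and then decays with the inverse square root of the step, peaking at global_step == warmup_steps. A minimal pure-Python sketch of the formula above (the function name noam_lr is illustrative, not part of the API):

```python
def noam_lr(global_step, d_model, warmup_steps, learning_rate=1.0):
    """Noam decay: lr * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    return learning_rate * d_model ** -0.5 * min(
        global_step ** -0.5, global_step * warmup_steps ** -1.5)

# The rate grows during warmup and shrinks afterwards,
# reaching its maximum at global_step == warmup_steps.
rates = [noam_lr(s, d_model=512, warmup_steps=100) for s in (1, 50, 100, 200)]
```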

Parameters
  • d_model (Variable|int) – The dimensionality of the model’s input and output feature vectors. If the type is Variable, it is a tensor with shape [1] and data type int32 or int64; a Python int is also accepted.

  • warmup_steps (Variable|int) – The number of warmup steps, a hyperparameter. If the type is Variable, it is a tensor with shape [1] and data type int32 or int64; a Python int is also accepted.

  • begin (int, optional) – The begin step, i.e. the initial value of global_step in the formula above. The default value is 1.

  • step (int, optional) – The step size used to update global_step in the formula above. The default value is 1.

  • dtype (str, optional) – The data type used to create the learning rate variable; it can be ‘float32’ or ‘float64’. The default value is ‘float32’.

  • learning_rate (Variable|float|int) – The initial learning rate. If the type is Variable, it is a tensor with shape [1] and data type float32 or float64; a Python float or int is also accepted. The default value is 1.0.

Returns

None.

Examples

import paddle.fluid as fluid

warmup_steps = 100
learning_rate = 0.01
with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding([10, 10])
    # Pass NoamDecay as the learning-rate schedule of the optimizer.
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.NoamDecay(
            1 / (warmup_steps * (learning_rate ** 2)),
            warmup_steps),
        parameter_list=emb.parameters())
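Note the d_model argument in the example above: choosing d_model = 1 / (warmup_steps * learning_rate ** 2) makes the schedule peak at exactly learning_rate when global_step reaches warmup_steps, because the peak rate equals d_model^{-0.5} * warmup_steps^{-0.5}. A quick numeric check of this identity (pure Python, no Paddle required):

```python
warmup_steps = 100
learning_rate = 0.01

# d_model chosen so the warmup peak equals the target learning rate
d_model = 1 / (warmup_steps * learning_rate ** 2)

# Peak of the Noam schedule, reached at global_step == warmup_steps.
peak = d_model ** -0.5 * min(warmup_steps ** -0.5,
                             warmup_steps * warmup_steps ** -1.5)
assert abs(peak - learning_rate) < 1e-12
```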