learning_rate_scheduler

cosine_decay

paddle.fluid.layers.cosine_decay(learning_rate, step_each_epoch, epochs)[source]

Applies cosine decay to the learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, the learning rate will be decayed following the cosine decay strategy.

\[decayed\_lr = learning\_rate * 0.5 * (cos(epoch * \frac{\pi}{epochs}) + 1)\]
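
As a quick numeric check, the formula can be evaluated in plain Python (an illustrative sketch only, not the fluid implementation; here epoch is a plain integer, which the operator derives internally from the global step and step_each_epoch):

import math

learning_rate = 0.1
epochs = 120
for epoch in (0, 30, 60, 119):
    # decayed_lr follows the cosine decay formula above
    decayed_lr = learning_rate * 0.5 * (math.cos(epoch * math.pi / epochs) + 1)
    print(epoch, decayed_lr)
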
Parameters
  • learning_rate (Variable|float) – The initial learning rate.

  • step_each_epoch (int) – the number of steps in an epoch.

  • epochs (int) – the number of epochs.

Returns

The decayed learning rate.

Return type

Variable

Examples

import paddle.fluid as fluid
base_lr = 0.1
lr = fluid.layers.cosine_decay(
    learning_rate=base_lr, step_each_epoch=10000, epochs=120)

exponential_decay

paddle.fluid.layers.exponential_decay(learning_rate, decay_steps, decay_rate, staircase=False)[source]

Applies exponential decay to the learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, the learning rate will be decayed by ‘decay_rate’ every ‘decay_steps’ steps.

The decayed learning rate is calculated as follows:

>>> if staircase == True:
>>>     decayed_learning_rate = learning_rate * decay_rate ^ floor(global_step / decay_steps)
>>> else:
>>>     decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
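
Both branches can be reproduced numerically in plain Python (an illustrative sketch only, not the fluid kernel; variable names follow the pseudo-code above):

import math

learning_rate, decay_rate, decay_steps = 0.1, 0.5, 10000
for global_step in (0, 5000, 10000, 20000):
    # staircase: the exponent changes only every decay_steps steps
    staircase_lr = learning_rate * decay_rate ** math.floor(global_step / decay_steps)
    # continuous: the exponent grows smoothly with global_step
    continuous_lr = learning_rate * decay_rate ** (global_step / decay_steps)
    print(global_step, staircase_lr, continuous_lr)
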
Parameters
  • learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float

  • decay_steps (int) – The learning rate decay steps. See the decay computation above.

  • decay_rate (float) – The learning rate decay rate. See the decay computation above.

  • staircase (bool) – If True, decay the learning rate at discrete intervals, which means the learning rate will be decayed by a factor of decay_rate every decay_steps steps. If False, the learning rate will be decayed continuously, following the formula above. Default: False

Returns

The decayed learning rate. The data type is float32.

Return type

Variable

Examples

import paddle.fluid as fluid
base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.exponential_decay(
          learning_rate=base_lr,
          decay_steps=10000,
          decay_rate=0.5,
          staircase=True))

inverse_time_decay

paddle.fluid.layers.inverse_time_decay(learning_rate, decay_steps, decay_rate, staircase=False)[source]

Applies inverse time decay to the initial learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, an inverse decay function will be applied to the initial learning rate.

The decayed learning rate is calculated as follows:

>>> if staircase == True:
>>>     decayed_learning_rate = learning_rate / (1 + decay_rate * floor(global_step / decay_steps))
>>> else:
>>>     decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)
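
The same formulas evaluated in plain Python, for illustration only (not the fluid kernel):

import math

learning_rate, decay_rate, decay_steps = 0.1, 0.5, 10000
for global_step in (0, 5000, 10000, 20000):
    # staircase: the divisor changes only every decay_steps steps
    staircase_lr = learning_rate / (1 + decay_rate * math.floor(global_step / decay_steps))
    # continuous: the divisor grows smoothly with global_step
    continuous_lr = learning_rate / (1 + decay_rate * global_step / decay_steps)
    print(global_step, staircase_lr, continuous_lr)
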
Parameters
  • learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float

  • decay_steps (int) – The learning rate decay steps. See the decay computation above.

  • decay_rate (float) – The learning rate decay rate. See the decay computation above.

  • staircase (bool) – If True, decay the learning rate at discrete intervals, which means the learning rate only changes every decay_steps steps. If False, the learning rate will be decayed continuously, following the formula above. Default: False

Returns

The decayed learning rate. The data type is float32.

Return type

Variable

Examples

import paddle.fluid as fluid
base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.inverse_time_decay(
          learning_rate=base_lr,
          decay_steps=10000,
          decay_rate=0.5,
          staircase=True))

linear_lr_warmup

paddle.fluid.layers.linear_lr_warmup(learning_rate, warmup_steps, start_lr, end_lr)[source]

This operator uses the linear learning rate warm-up strategy to adjust the learning rate before the normal learning rate scheduling takes effect. For more information, please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks

When global_step < warmup_steps, learning rate is updated as:

linear_step = end_lr - start_lr
lr = start_lr + linear_step * (global_step / warmup_steps)

where start_lr is the initial learning rate, and end_lr is the final learning rate;

When global_step >= warmup_steps, learning rate is updated as:

lr = learning_rate

where lr is the learning_rate after warm-up.
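
A minimal plain-Python sketch of the two cases above (illustrative only; learning_rate is treated here as a plain float, and the values mirror the example below):

start_lr, end_lr = 1.0 / 3.0, 0.1
warmup_steps, learning_rate = 50, 0.1
for global_step in (0, 25, 50, 100):
    if global_step < warmup_steps:
        # linear interpolation from start_lr to end_lr during warm-up
        lr = start_lr + (end_lr - start_lr) * (global_step / warmup_steps)
    else:
        # after warm-up, fall back to the scheduled learning_rate
        lr = learning_rate
    print(global_step, lr)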

Parameters
  • learning_rate (Variable|float) – The learning rate after warm-up. It can be a 1-D Tensor or a single value, with data type float32.

  • warmup_steps (int) – The number of warm-up steps.

  • start_lr (float) – Initial learning rate of warm up.

  • end_lr (float) – Final learning rate of warm up.

Returns

Warm-up learning rate with the same data type as learning_rate.

Return type

Variable

Examples

import paddle.fluid as fluid

boundaries = [100, 200]
lr_steps = [0.1, 0.01, 0.001]
learning_rate = fluid.layers.piecewise_decay(boundaries, lr_steps) #case1, 1D-Tensor
#learning_rate = 0.1  #case2, single-value
warmup_steps = 50
start_lr = 1. / 3.
end_lr = 0.1
decayed_lr = fluid.layers.linear_lr_warmup(learning_rate,
    warmup_steps, start_lr, end_lr)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
out, = exe.run(fetch_list=[decayed_lr.name])
print(out)
# case1: [0.33333334]
# case2: [0.33333334]

natural_exp_decay

paddle.fluid.layers.natural_exp_decay(learning_rate, decay_steps, decay_rate, staircase=False)[source]

Applies natural exponential decay to the initial learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, the learning rate will be decayed by a natural exponential function of ‘decay_rate’ every ‘decay_steps’ steps.

The decayed learning rate is calculated as follows:

>>> if not staircase:
>>>     decayed_learning_rate = learning_rate * exp(- decay_rate * (global_step / decay_steps))
>>> else:
>>>     decayed_learning_rate = learning_rate * exp(- decay_rate * floor(global_step / decay_steps))
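
In plain Python, for illustration only (not the fluid kernel):

import math

learning_rate, decay_rate, decay_steps = 0.1, 0.5, 10000
for global_step in (0, 5000, 10000, 20000):
    # continuous: the exponent grows smoothly with global_step
    continuous_lr = learning_rate * math.exp(-decay_rate * (global_step / decay_steps))
    # staircase: the exponent changes only every decay_steps steps
    staircase_lr = learning_rate * math.exp(-decay_rate * math.floor(global_step / decay_steps))
    print(global_step, continuous_lr, staircase_lr)
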
Parameters
  • learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float

  • decay_steps (int) – The learning rate decay steps. See the decay computation above.

  • decay_rate (float) – The learning rate decay rate. See the decay computation above.

  • staircase (bool) – If True, decay the learning rate at discrete intervals, which means the learning rate will be decayed by a factor of exp(-decay_rate) every decay_steps steps. If False, the learning rate will be decayed continuously, following the formula above. Default: False

Returns

The decayed learning rate. The data type is float32.

Examples

import paddle.fluid as fluid
base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.natural_exp_decay(
          learning_rate=base_lr,
          decay_steps=10000,
          decay_rate=0.5,
          staircase=True))

noam_decay

paddle.fluid.layers.noam_decay(d_model, warmup_steps)[source]

Noam decay method. A NumPy implementation of Noam decay is shown below.

import paddle.fluid as fluid
import numpy as np
# set hyper parameters
d_model = 2
current_steps = 20
warmup_steps = 200
# compute
lr_value = np.power(d_model, -0.5) * np.min([
                        np.power(current_steps, -0.5),
                        np.power(warmup_steps, -1.5) * current_steps])

Please refer to Attention Is All You Need.

Parameters
  • d_model (Variable) – The dimensionality of the model's input and output.

  • warmup_steps (Variable) – A hyperparameter that determines the number of warm-up steps.

Returns

The decayed learning rate.

Examples

import paddle.fluid as fluid
warmup_steps = 100
learning_rate = 0.01
lr = fluid.layers.learning_rate_scheduler.noam_decay(
    1 / (warmup_steps * (learning_rate ** 2)),
    warmup_steps)

piecewise_decay

paddle.fluid.layers.piecewise_decay(boundaries, values)[source]

Applies piecewise decay to the initial learning rate.

The algorithm can be described as the code below.

boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
if step < 10000:
    learning_rate = 1.0
elif 10000 <= step < 20000:
    learning_rate = 0.5
else:
    learning_rate = 0.1
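
The same lookup written as a small plain-Python helper (illustrative only; piecewise_lr is a hypothetical name, not part of the fluid API):

def piecewise_lr(step, boundaries, values):
    # values must contain exactly len(boundaries) + 1 entries
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
print(piecewise_lr(5000, boundaries, values))   # 1.0
print(piecewise_lr(15000, boundaries, values))  # 0.5
print(piecewise_lr(30000, boundaries, values))  # 0.1
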
Parameters
  • boundaries – A list of step numbers.

  • values – A list of learning rate values; values[i] is used within the i-th interval defined by boundaries, so it must contain one more element than boundaries.

Returns

The decayed learning rate.

Examples

import paddle.fluid as fluid
boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
optimizer = fluid.optimizer.Momentum(
    momentum=0.9,
    learning_rate=fluid.layers.piecewise_decay(boundaries=boundaries, values=values),
    regularization=fluid.regularizer.L2Decay(1e-4))

polynomial_decay

paddle.fluid.layers.polynomial_decay(learning_rate, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False)[source]

Applies polynomial decay to the initial learning rate.

if cycle:
  decay_steps = decay_steps * ceil(global_step / decay_steps)
else:
  global_step = min(global_step, decay_steps)

decayed_learning_rate = (learning_rate - end_learning_rate) *
    (1 - global_step / decay_steps) ^ power + end_learning_rate
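
The non-cycle branch evaluated in plain Python, for illustration only (not the fluid kernel; the values mirror the example below):

learning_rate, decay_steps = 0.01, 5000
end_learning_rate, power = 0.0, 1.0
for global_step in (0, 2500, 5000, 7500):
    # global_step is clamped to decay_steps, so the rate bottoms out at end_learning_rate
    step = min(global_step, decay_steps)
    decayed_lr = (learning_rate - end_learning_rate) * (1 - step / decay_steps) ** power + end_learning_rate
    print(global_step, decayed_lr)
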
Parameters
  • learning_rate (Variable|float32) – A scalar float32 value or a Variable. This will be the initial learning rate during training.

  • decay_steps (int32) – A Python int32 number.

  • end_learning_rate (float) – A Python float number.

  • power (float) – A Python float number.

  • cycle (bool) – If set true, decay the learning rate every decay_steps.

Returns

The decayed learning rate

Return type

Variable

Examples

import paddle.fluid as fluid
start_lr = 0.01
total_step = 5000
end_lr = 0
lr = fluid.layers.polynomial_decay(
    start_lr, total_step, end_lr, power=1)