learning_rate_scheduler
cosine_decay

paddle.fluid.layers.cosine_decay(learning_rate, step_each_epoch, epochs)
Applies cosine decay to the learning rate.
When training a model, it is often recommended to lower the learning rate as training progresses. This function decays the learning rate following the cosine decay strategy.
\[decayed\_lr = learning\_rate \times 0.5 \times \left(\cos\left(epoch \times \frac{\pi}{epochs}\right) + 1\right)\]
 Parameters
learning_rate (Variable|float) – The initial learning rate.
step_each_epoch (int) – The number of steps in an epoch.
epochs (int) – The number of epochs.
 Returns
The decayed learning rate.
 Return type
Variable
Examples
import paddle.fluid as fluid

base_lr = 0.1
lr = fluid.layers.cosine_decay(
    learning_rate=base_lr, step_each_epoch=10000, epochs=120)
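The formula above can be checked by hand with a minimal pure-Python sketch. The helper name `cosine_decay_value` and its explicit `global_step` argument are illustrative and not part of the Paddle API. The sketch also assumes that `epoch` is the number of completed epochs, i.e. `global_step // step_each_epoch`.

```python
import math


def cosine_decay_value(learning_rate, step_each_epoch, epochs, global_step):
    """Reference value of the cosine decay formula (hypothetical helper)."""
    # Assumption: the current epoch is the number of fully completed epochs.
    epoch = global_step // step_each_epoch
    return learning_rate * 0.5 * (math.cos(epoch * math.pi / epochs) + 1)


# Inspect the schedule at the start, the middle, and the end of training.
for step in (0, 600000, 1200000):
    print(step, cosine_decay_value(0.1, 10000, 120, step))
```

Under these assumptions the rate starts at `learning_rate`, reaches half of it midway through training, and approaches zero at the final epoch.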
exponential_decay

paddle.fluid.layers.exponential_decay(learning_rate, decay_steps, decay_rate, staircase=False)
Applies exponential decay to the learning rate.
When training a model, it is often recommended to lower the learning rate as training progresses. With this function, the learning rate is multiplied by decay_rate every decay_steps steps.
The decayed learning rate is calculated as follows:
if staircase:
    decayed_learning_rate = learning_rate * decay_rate ** floor(global_step / decay_steps)
else:
    decayed_learning_rate = learning_rate * decay_rate ** (global_step / decay_steps)
 Parameters
learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float.
decay_steps (int) – The learning rate decay steps. See the decay computation above.
decay_rate (float) – The learning rate decay rate. See the decay computation above.
staircase (bool) – If True, decay the learning rate at discrete intervals, which means the learning rate is multiplied by decay_rate once every decay_steps steps. If False, the learning rate decays continuously, following the formula above. Default: False.
 Returns
The decayed learning rate. The data type is float32.
 Return type
Variable
Examples
import paddle.fluid as fluid

base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.exponential_decay(
        learning_rate=base_lr,
        decay_steps=10000,
        decay_rate=0.5,
        staircase=True))
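To see how staircase changes the result, the decay computation above can be sketched in plain Python. The helper `exponential_decay_value` and its explicit `global_step` argument are hypothetical and only mirror the formula; they are not part of the Paddle API.

```python
import math


def exponential_decay_value(learning_rate, decay_steps, decay_rate,
                            global_step, staircase=False):
    """Reference value of the exponential decay formula (hypothetical helper)."""
    ratio = global_step / decay_steps
    if staircase:
        # Discrete mode: the exponent only changes once every decay_steps.
        ratio = math.floor(ratio)
    return learning_rate * decay_rate ** ratio


# Halfway between two boundaries the two modes give different values.
print(exponential_decay_value(0.1, 10000, 0.5, 15000, staircase=True))
print(exponential_decay_value(0.1, 10000, 0.5, 15000, staircase=False))
```

With staircase enabled the value stays constant between multiples of decay_steps; without it the value shrinks a little at every step.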
inverse_time_decay

paddle.fluid.layers.inverse_time_decay(learning_rate, decay_steps, decay_rate, staircase=False)
Applies inverse time decay to the initial learning rate.
When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, an inverse decay function will be applied to the initial learning rate.
The decayed learning rate is calculated as follows:
if staircase:
    decayed_learning_rate = learning_rate / (1 + decay_rate * floor(global_step / decay_steps))
else:
    decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)
 Parameters
learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float.
decay_steps (int) – The learning rate decay steps. See the decay computation above.
decay_rate (float) – The learning rate decay rate. See the decay computation above.
staircase (bool) – If True, decay the learning rate at discrete intervals, which means the decayed value changes only once every decay_steps steps. If False, the learning rate decays continuously, following the formula above. Default: False.
 Returns
The decayed learning rate. The data type is float32.
 Return type
Variable
Examples
import paddle.fluid as fluid

base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.inverse_time_decay(
        learning_rate=base_lr,
        decay_steps=10000,
        decay_rate=0.5,
        staircase=True))
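As a rough numerical check of the inverse time decay computation, the sketch below evaluates it in plain Python. The name `inverse_time_decay_value` and the explicit `global_step` argument are assumptions made for illustration and do not exist in Paddle.

```python
import math


def inverse_time_decay_value(learning_rate, decay_steps, decay_rate,
                             global_step, staircase=False):
    """Reference value of the inverse time decay formula (hypothetical helper)."""
    ratio = global_step / decay_steps
    if staircase:
        # Discrete mode: only whole multiples of decay_steps count.
        ratio = math.floor(ratio)
    return learning_rate / (1 + decay_rate * ratio)


print(inverse_time_decay_value(0.1, 10000, 0.5, 15000, staircase=True))
print(inverse_time_decay_value(0.1, 10000, 0.5, 15000, staircase=False))
```

Compared with exponential decay, this schedule falls off more slowly at large step counts because the denominator grows only linearly.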
linear_lr_warmup

paddle.fluid.layers.linear_lr_warmup(learning_rate, warmup_steps, start_lr, end_lr)
This operator uses the linear learning rate warmup strategy to adjust the learning rate before the normal learning rate scheduling begins. For more information, see "Bag of Tricks for Image Classification with Convolutional Neural Networks".
When global_step < warmup_steps, learning rate is updated as:
linear_step = end_lr - start_lr
lr = start_lr + linear_step * (global_step / warmup_steps)
where start_lr is the initial learning rate, and end_lr is the final learning rate;
When global_step >= warmup_steps, learning rate is updated as:
lr = learning_rate
where lr is the learning_rate after warmup.
 Parameters
learning_rate (Variable|float) – Learning rate after warmup. It can be a 1D-Tensor or a single value with data type float32.
warmup_steps (int) – Steps for warm up.
start_lr (float) – Initial learning rate of warm up.
end_lr (float) – Final learning rate of warm up.
 Returns
Warmup learning rate with the same data type as learning_rate.
 Return type
Variable
Examples:
import paddle.fluid as fluid

boundaries = [100, 200]
lr_steps = [0.1, 0.01, 0.001]
learning_rate = fluid.layers.piecewise_decay(boundaries, lr_steps)  # case1, 1D-Tensor
# learning_rate = 0.1  # case2, single-value
warmup_steps = 50
start_lr = 1. / 3.
end_lr = 0.1
decayed_lr = fluid.layers.linear_lr_warmup(
    learning_rate, warmup_steps, start_lr, end_lr)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
out, = exe.run(fetch_list=[decayed_lr.name])
print(out)
# case1: [0.33333334]
# case2: [0.33333334]
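The two warmup regimes described above can be sketched without Paddle. In this illustration `linear_warmup_value` is a hypothetical helper, `learning_rate` is assumed to be a plain float already produced by the regular schedule, and `global_step` is passed in explicitly.

```python
def linear_warmup_value(learning_rate, warmup_steps, start_lr, end_lr,
                        global_step):
    """Reference value of the linear warmup rule (hypothetical helper)."""
    if global_step < warmup_steps:
        # Warmup phase: interpolate linearly from start_lr towards end_lr.
        linear_step = end_lr - start_lr
        return start_lr + linear_step * (global_step / warmup_steps)
    # After warmup: hand control back to the regular schedule value.
    return learning_rate


for step in (0, 25, 50):
    print(step, linear_warmup_value(0.1, 50, 1.0 / 3.0, 0.1, step))
```

At step 0 the sketch returns `start_lr`, which agrees with the value printed by the example above on its first run.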
natural_exp_decay

paddle.fluid.layers.natural_exp_decay(learning_rate, decay_steps, decay_rate, staircase=False)
Applies natural exponential decay to the initial learning rate.
When training a model, it is often recommended to lower the learning rate as training progresses. With this function, the learning rate is multiplied by exp(-decay_rate) every decay_steps steps.
The decayed learning rate is calculated as follows:
if not staircase:
    decayed_learning_rate = learning_rate * exp(-decay_rate * (global_step / decay_steps))
else:
    decayed_learning_rate = learning_rate * exp(-decay_rate * floor(global_step / decay_steps))
 Parameters
learning_rate (Variable|float) – The initial learning rate. It should be a Variable or a float.
decay_steps (int) – The learning rate decay steps. See the decay computation above.
decay_rate (float) – The learning rate decay rate. See the decay computation above.
staircase (bool) – If True, decay the learning rate at discrete intervals, which means the learning rate is decayed by the natural exponential power of decay_rate once every decay_steps steps. If False, the learning rate decays continuously, following the formula above. Default: False.
 Returns
The decayed learning rate. The data type is float32.
Examples
import paddle.fluid as fluid

base_lr = 0.1
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.layers.natural_exp_decay(
        learning_rate=base_lr,
        decay_steps=10000,
        decay_rate=0.5,
        staircase=True))
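A plain-Python sketch of the natural exponential decay computation is given below for checking values by hand. It assumes the exponent is negative, so that the rate shrinks for a positive decay_rate. The helper name `natural_exp_decay_value` and the explicit `global_step` argument are illustrative only.

```python
import math


def natural_exp_decay_value(learning_rate, decay_steps, decay_rate,
                            global_step, staircase=False):
    """Reference value of natural exponential decay (hypothetical helper)."""
    ratio = global_step / decay_steps
    if staircase:
        # Discrete mode: the exponent only changes once every decay_steps.
        ratio = math.floor(ratio)
    # Assumption: negative exponent, so a positive decay_rate lowers the rate.
    return learning_rate * math.exp(-decay_rate * ratio)


print(natural_exp_decay_value(0.1, 10000, 0.5, 15000, staircase=True))
print(natural_exp_decay_value(0.1, 10000, 0.5, 15000, staircase=False))
```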
noam_decay

paddle.fluid.layers.noam_decay(d_model, warmup_steps)
Noam decay method. A NumPy implementation of Noam decay is as follows.
import paddle.fluid as fluid
import numpy as np

# set hyperparameters
d_model = 2
current_steps = 20
warmup_steps = 200

# compute
lr_value = np.power(d_model, -0.5) * np.min([
    np.power(current_steps, -0.5),
    np.power(warmup_steps, -1.5) * current_steps])
For details, see the paper "Attention Is All You Need".
 Parameters
d_model (Variable) – The dimensionality of input and output of model.
warmup_steps (Variable) – A hyperparameter: the number of warmup steps.
 Returns
The decayed learning rate.
Examples
import paddle.fluid as fluid

warmup_steps = 100
learning_rate = 0.01
lr = fluid.layers.learning_rate_scheduler.noam_decay(
    1 / (warmup_steps * (learning_rate ** 2)),
    warmup_steps)
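The shape of the Noam schedule is easier to see with a small standard-library sketch. It assumes the negative exponents used in "Attention Is All You Need" (-0.5 and -1.5) and a step counter that starts at 1. The helper `noam_value` is hypothetical and not a Paddle function.

```python
def noam_value(d_model, warmup_steps, current_step):
    """Reference value of the Noam schedule (hypothetical helper)."""
    # Assumption: steps are counted from 1, so current_step ** -0.5 is defined.
    return d_model ** -0.5 * min(current_step ** -0.5,
                                 warmup_steps ** -1.5 * current_step)


# The rate rises linearly during warmup, peaks at warmup_steps, then decays.
for step in (1, 2000, 4000, 8000):
    print(step, noam_value(512, 4000, step))
```

Both arguments of `min` are equal at `current_step == warmup_steps`, which is where the schedule reaches its maximum.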
piecewise_decay

paddle.fluid.layers.piecewise_decay(boundaries, values)
Applies piecewise decay to the initial learning rate.
The algorithm can be described as the code below.
boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
if step < 10000:
    learning_rate = 1.0
elif 10000 <= step < 20000:
    learning_rate = 0.5
else:
    learning_rate = 0.1
 Parameters
boundaries – A list of step numbers.
values – A list of learning rate values that will be picked during different step boundaries.
 Returns
The decayed learning rate.
Examples
import paddle.fluid as fluid

boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
optimizer = fluid.optimizer.Momentum(
    momentum=0.9,
    learning_rate=fluid.layers.piecewise_decay(boundaries=boundaries, values=values),
    regularization=fluid.regularizer.L2Decay(1e-4))
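The if/elif description above generalizes to any number of boundaries. One possible sketch uses Python's standard `bisect` module; the helper `piecewise_value` and the explicit `step` argument are illustrative and not part of the Paddle API. It assumes `values` has exactly one more entry than `boundaries`.

```python
import bisect


def piecewise_value(boundaries, values, step):
    """Pick the learning rate for a step (hypothetical helper)."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more entry than boundaries")
    # bisect_right counts how many boundaries are <= step, which is the
    # index of the interval the step falls into.
    return values[bisect.bisect_right(boundaries, step)]


boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
for step in (0, 9999, 10000, 19999, 20000):
    print(step, piecewise_value(boundaries, values, step))
```

Using `bisect_right` rather than `bisect_left` keeps each boundary step in the later interval, matching `10000 <= step < 20000` in the description.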
polynomial_decay

paddle.fluid.layers.polynomial_decay(learning_rate, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False)
Applies polynomial decay to the initial learning rate.
if cycle:
    decay_steps = decay_steps * ceil(global_step / decay_steps)
else:
    global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * \
    (1 - global_step / decay_steps) ** power + end_learning_rate
 Parameters
learning_rate (Variable|float32) – A scalar float32 value or a Variable. This will be the initial learning rate during training.
decay_steps (int32) – A Python int32 number.
end_learning_rate (float) – A Python float number.
power (float) – A Python float number.
cycle (bool) – If True, decay_steps is extended to the next multiple once global_step passes it, so the learning rate keeps decaying in cycles. If False, the learning rate stays at end_learning_rate after decay_steps steps.
 Returns
The decayed learning rate
 Return type
Variable
Examples
import paddle.fluid as fluid

start_lr = 0.01
total_step = 5000
end_lr = 0
lr = fluid.layers.polynomial_decay(
    start_lr, total_step, end_lr, power=1)
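The effect of cycle is easiest to see numerically. The sketch below evaluates the polynomial decay computation in plain Python; `polynomial_decay_value` and its explicit `global_step` argument are hypothetical. Treating step 0 as belonging to the first cycle is an assumption added here to avoid a zero divisor.

```python
import math


def polynomial_decay_value(learning_rate, decay_steps, global_step,
                           end_learning_rate=0.0001, power=1.0, cycle=False):
    """Reference value of polynomial decay (hypothetical helper)."""
    if cycle:
        # Stretch the horizon to the next multiple of decay_steps.
        # Assumption: at step 0 the multiplier is 1 to avoid dividing by zero.
        multiplier = max(math.ceil(global_step / decay_steps), 1)
        decay_steps = decay_steps * multiplier
    else:
        # Without cycling, the rate stays at end_learning_rate afterwards.
        global_step = min(global_step, decay_steps)
    fraction_left = 1 - global_step / decay_steps
    return ((learning_rate - end_learning_rate) * fraction_left ** power
            + end_learning_rate)


print(polynomial_decay_value(0.01, 5000, 2500, end_learning_rate=0.0))
print(polynomial_decay_value(0.01, 5000, 7500, end_learning_rate=0.0))
print(polynomial_decay_value(0.01, 5000, 7500, end_learning_rate=0.0, cycle=True))
```

Past `decay_steps`, the non-cycling call stays at `end_learning_rate`, while the cycling call continues along a longer decay curve.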