linear_lr_warmup

paddle.fluid.layers.linear_lr_warmup(learning_rate, warmup_steps, start_lr, end_lr)[source]

This operator uses the linear learning rate warm-up strategy to adjust the learning rate before the regular learning rate schedule takes effect. For more information, please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.

When global_step < warmup_steps, the learning rate is updated as:

linear_step = end_lr - start_lr
lr = start_lr + linear_step * (global_step / warmup_steps)

where start_lr is the initial learning rate and end_lr is the final learning rate of the warm-up.

When global_step >= warmup_steps, the learning rate is updated as:

lr = learning_rate

where lr equals the given learning_rate once the warm-up phase has finished.
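The two cases can be illustrated with a short pure-Python sketch. This is only an illustration of the formulas above, not the Paddle implementation; warmup_lr is a hypothetical helper, and the default argument values simply mirror the example further down.

def warmup_lr(global_step, learning_rate=0.1,
              warmup_steps=50, start_lr=1. / 3., end_lr=0.1):
    # Hypothetical helper mirroring the two formulas above; not part of the API.
    if global_step < warmup_steps:
        linear_step = end_lr - start_lr
        return start_lr + linear_step * (global_step / float(warmup_steps))
    return learning_rate

print(warmup_lr(0))   # 0.3333... == start_lr
print(warmup_lr(25))  # 0.2166..., halfway between start_lr and end_lr
print(warmup_lr(50))  # 0.1 == learning_rate, warm-up is over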

Parameters
  • learning_rate (Variable|float) – The learning rate used after warm-up. It can be a 1D-Tensor or a single float value, with data type float32.

  • warmup_steps (int) – Number of warm-up steps.

  • start_lr (float) – Initial learning rate of the warm-up.

  • end_lr (float) – Final learning rate of the warm-up.

Returns

Warm-up learning rate with the same data type as learning_rate.

Return type

Variable

Examples:

import paddle.fluid as fluid

# Piecewise-decay schedule used as the post-warm-up learning rate.
boundaries = [100, 200]
lr_steps = [0.1, 0.01, 0.001]
learning_rate = fluid.layers.piecewise_decay(boundaries, lr_steps)  # case1, 1D-Tensor
# learning_rate = 0.1  # case2, single value
warmup_steps = 50
start_lr = 1. / 3.
end_lr = 0.1
decayed_lr = fluid.layers.linear_lr_warmup(learning_rate,
                                           warmup_steps, start_lr, end_lr)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
out, = exe.run(fetch_list=[decayed_lr.name])
print(out)
# case1: [0.33333334]
# case2: [0.33333334]
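In both cases the first fetched value equals start_lr (1 / 3 ≈ 0.33333334): the global step counter is still 0 on the first run, so the warm-up branch of the formula applies.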