linear_lr_warmup(learning_rate, warmup_steps, start_lr, end_lr)
This operator applies a linear warm-up strategy that adjusts the learning rate before the normal learning rate schedule takes over. For more information, please refer to "Bag of Tricks for Image Classification with Convolutional Neural Networks".
When global_step < warmup_steps, the learning rate is updated as:
linear_step = end_lr - start_lr
lr = start_lr + linear_step * (global_step / warmup_steps)
where start_lr is the initial learning rate of the warm-up and end_lr is its final learning rate.
When global_step >= warmup_steps, the learning rate is updated as:
lr = learning_rate
where learning_rate is the learning rate used after warm-up.
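The two cases above can be sketched in plain Python. This is only an illustration of the formula, not Paddle's implementation: linear_warmup_lr is a hypothetical helper name, and it assumes learning_rate is a single float rather than a scheduled 1-D Tensor.

```python
def linear_warmup_lr(global_step, learning_rate, warmup_steps, start_lr, end_lr):
    """Pure-Python sketch of the warm-up rule (hypothetical helper, not a Paddle API)."""
    if global_step < warmup_steps:
        # Warm-up phase: interpolate linearly from start_lr towards end_lr.
        linear_step = end_lr - start_lr
        return start_lr + linear_step * (global_step / warmup_steps)
    # After warm-up: hand over to the regular learning rate.
    return learning_rate


# Assumed sample values: 50 warm-up steps going from 1/3 to 0.1.
for step in (0, 25, 49, 50, 120):
    print(step, linear_warmup_lr(step, 0.1, 50, 1.0 / 3.0, 0.1))
```

At global_step = 0 the function returns start_lr unchanged, and from global_step = warmup_steps onward it returns learning_rate. If learning_rate is itself a schedule, the value of that schedule at the current step would take the place of the float used here.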
learning_rate (Variable|float) – Learning rate after warm-up. It can be a 1-D Tensor or a single value of data type float32.
warmup_steps (int) – Number of warm-up steps.
start_lr (float) – Initial learning rate of the warm-up.
end_lr (float) – Final learning rate of the warm-up.
Returns the warm-up learning rate, with the same data type as learning_rate.
- Return type
    import paddle.fluid as fluid

    boundaries = [100, 200]
    lr_steps = [0.1, 0.01, 0.001]
    learning_rate = fluid.layers.piecewise_decay(boundaries, lr_steps)  # case1, 1D-Tensor
    # learning_rate = 0.1  # case2, single-value
    warmup_steps = 50
    start_lr = 1. / 3.
    end_lr = 0.1
    decayed_lr = fluid.layers.linear_lr_warmup(learning_rate, warmup_steps, start_lr, end_lr)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    out, = exe.run(fetch_list=[decayed_lr.name])
    print(out)
    # case1: [0.33333334]
    # case2: [0.33333334]