ReduceOnPlateau

class paddle.optimizer.lr.ReduceOnPlateau(learning_rate, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, epsilon=1e-08, verbose=False)

Reduce the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2 to 10 once performance stops improving.

The metric is the value passed to step(). A Tensor metric must be 1-D with shape [1]. When the metric stops descending for a patience number of epochs, the learning rate is reduced to learning_rate * factor. (mode can also be set to 'max'; in that case, the learning rate is reduced when the metric stops ascending for a patience number of epochs.)

In addition, after each reduction, the scheduler waits a cooldown number of epochs before resuming normal operation.
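The interplay of mode, patience, factor, and cooldown can be sketched in plain Python. This is a simplified model for illustration, not Paddle's implementation: the function name simulate_plateau is hypothetical, thresholds and min_lr are left out, and the exact point at which a reduction fires (here, once the stall count exceeds patience) is an assumption.

```python
def simulate_plateau(metrics, lr=1.0, mode="min", factor=0.5, patience=2, cooldown=0):
    """Return the learning rate in effect after each metric is observed.

    Hypothetical helper for illustration only; it ignores threshold,
    min_lr and epsilon.
    """
    best = None          # best metric seen so far
    bad_epochs = 0       # consecutive observations without improvement
    cooldown_left = 0    # observations left before stalls are counted again
    history = []
    for value in metrics:
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best

        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1

        if cooldown_left > 0:
            # Stalls observed during cooldown are not counted.
            cooldown_left -= 1
            bad_epochs = 0

        if bad_epochs > patience:
            lr = lr * factor
            cooldown_left = cooldown
            bad_epochs = 0
        history.append(lr)
    return history


# A loss that stalls at 0.9 triggers one reduction once patience is exceeded.
print(simulate_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.8], patience=2))
```

With cooldown=2 and patience=0, a constant metric would produce a reduction, then two observations during which stalls are ignored, then the next reduction.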

Parameters
  • learning_rate (float) – The initial learning rate. It is a Python float.

  • mode (str, optional) – 'min' or 'max'. In 'min' mode (the usual choice), the learning rate is reduced when the loss stops descending. In 'max' mode, the learning rate is reduced when the loss stops ascending. Default: 'min'.

  • factor (float, optional) – The ratio by which the learning rate is reduced: new_lr = origin_lr * factor. It should be less than 1.0. Default: 0.1.

  • patience (int, optional) – When the loss does not improve for this number of epochs, the learning rate is reduced. Default: 10.

  • threshold (float, optional) – threshold and threshold_mode together determine the minimum change of loss that counts as an improvement, so that tiny changes of loss are ignored. Default: 1e-4.

  • threshold_mode (str, optional) – 'rel' or 'abs'. In 'rel' mode, the minimum change of loss is last_loss * threshold, where last_loss is the loss in the last epoch. In 'abs' mode, the minimum change of loss is threshold. Default: 'rel'.

  • cooldown (int, optional) – The number of epochs to wait before resuming normal operation. Default: 0.

  • min_lr (float, optional) – The lower bound of the learning rate after reduction. Default: 0.

  • epsilon (float, optional) – Minimal decay applied to lr. If the difference between new and old lr is smaller than epsilon, the update is ignored. Default: 1e-8.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.
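As a rough illustration of how threshold, threshold_mode, min_lr, and epsilon could interact, the sketch below follows the parameter descriptions above. The helper names is_improvement and reduced_lr are hypothetical, and the comparison details (strict inequalities, a non-negative reference loss) are assumptions rather than Paddle's actual code.

```python
def is_improvement(current, reference, mode="min", threshold=1e-4, threshold_mode="rel"):
    """Decide whether `current` beats `reference` by more than the minimum change."""
    # 'rel': minimum change is reference * threshold; 'abs': it is threshold itself.
    # Assumes a non-negative reference value in 'rel' mode.
    min_change = reference * threshold if threshold_mode == "rel" else threshold
    if mode == "min":
        return current < reference - min_change
    return current > reference + min_change


def reduced_lr(old_lr, factor=0.1, min_lr=0.0, epsilon=1e-8):
    """Apply one reduction, clamped at min_lr; skip updates smaller than epsilon."""
    new_lr = max(old_lr * factor, min_lr)
    if old_lr - new_lr > epsilon:
        return new_lr
    return old_lr  # the change is too small, so the update is ignored


# In 'rel' mode with threshold=1e-4, a drop from 1.0 to 0.99995 is too small to count.
print(is_improvement(0.99995, 1.0))   # tiny change, ignored
print(is_improvement(0.9998, 1.0))    # large enough to count
print(reduced_lr(1e-3, factor=0.1, min_lr=5e-4))  # clamped at min_lr
```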

Returns

ReduceOnPlateau instance to schedule learning rate.

Examples

import paddle
import numpy as np

# train on default dynamic graph mode
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
for epoch in range(20):
    for batch_id in range(5):
        x = paddle.uniform([10, 10])
        out = linear(x)
        loss = paddle.mean(out)
        loss.backward()
        sgd.step()
        sgd.clear_gradients()
        scheduler.step(loss)    # If you update the learning rate each step
    # scheduler.step(loss)      # If you update the learning rate each epoch

# train on static graph mode
paddle.enable_static()
main_prog = paddle.static.Program()
start_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, start_prog):
    x = paddle.static.data(name='x', shape=[None, 4, 5])
    y = paddle.static.data(name='y', shape=[None, 4, 5])
    z = paddle.static.nn.fc(x, 100)
    loss = paddle.mean(z)
    scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
    sgd = paddle.optimizer.SGD(learning_rate=scheduler)
    sgd.minimize(loss)

exe = paddle.static.Executor()
exe.run(start_prog)
for epoch in range(20):
    for batch_id in range(5):
        out = exe.run(
            main_prog,
            feed={
                'x': np.random.randn(3, 4, 5).astype('float32'),
                'y': np.random.randn(3, 4, 5).astype('float32')
            },
            fetch_list=loss.name)
        scheduler.step(out[0])    # If you update the learning rate each step
    # scheduler.step(out[0])      # If you update the learning rate each epoch

state_keys()

Intended for subclasses that override LRScheduler (the base class). By default, last_epoch and last_lr are saved, via self.keys = ['last_epoch', 'last_lr'].

last_epoch is the current epoch number, and last_lr is the current learning rate.

If you want to change the default behavior, provide a custom implementation of _state_keys() that redefines self.keys.

step(metrics, epoch=None)

step should be called after optimizer.step(). It updates the learning rate in the optimizer according to metrics. The new learning rate takes effect in the next epoch.

Parameters
  • metrics (Tensor|numpy.ndarray|float) – The value monitored to determine whether the learning rate should be reduced. If it stops descending for a patience number of epochs, the learning rate is reduced. If it is a Tensor or numpy.ndarray, its shape must be [1].

  • epoch (int, optional) – The current epoch. Default: None, in which case the epoch auto-increments from last_epoch=-1.

Returns

None

Examples

Please refer to the ReduceOnPlateau example above.

get_lr()

Subclasses that override LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception is raised.
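The override contract described for get_lr() can be illustrated with a small stand-alone sketch. BaseScheduler and HalvingScheduler are hypothetical classes written for this example; they only mimic the pattern and are not Paddle's LRScheduler.

```python
class BaseScheduler:
    """Hypothetical base class mimicking the get_lr() contract (not Paddle code)."""

    def __init__(self, learning_rate):
        self.base_lr = learning_rate
        self.last_lr = learning_rate
        self.last_epoch = -1

    def get_lr(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError

    def step(self):
        self.last_epoch += 1
        self.last_lr = self.get_lr()


class HalvingScheduler(BaseScheduler):
    """Example subclass: halves the learning rate every epoch."""

    def get_lr(self):
        return self.base_lr * 0.5 ** self.last_epoch


scheduler = HalvingScheduler(1.0)
for _ in range(3):
    scheduler.step()
print(scheduler.last_lr)
```

Calling step() on the base class directly would raise NotImplementedError, because get_lr() has no default behavior.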

set_dict(state_dict)

Loads the scheduler's state.

set_state_dict(state_dict)

Loads the scheduler's state.

state_dict()

Returns the state of the scheduler as a dict.

It is a subset of self.__dict__.
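To make the "subset of self.__dict__" idea concrete, here is a minimal sketch of how a keys list could drive saving and restoring state. TinyScheduler is a hypothetical stand-in written for this example; its method bodies are assumptions and do not reproduce Paddle's implementation.

```python
class TinyScheduler:
    """Hypothetical scheduler showing a keys-driven state dict (not Paddle code)."""

    def __init__(self, learning_rate):
        self.base_lr = learning_rate
        self.last_lr = learning_rate
        self.last_epoch = -1
        self.state_keys()

    def state_keys(self):
        # Only these attributes are saved and restored.
        self.keys = ["last_epoch", "last_lr"]

    def step(self):
        self.last_epoch += 1
        self.last_lr = self.base_lr * 0.5 ** self.last_epoch

    def state_dict(self):
        # A subset of self.__dict__, selected by self.keys.
        self.state_keys()
        return {k: self.__dict__[k] for k in self.keys if k in self.__dict__}

    def set_state_dict(self, state_dict):
        self.state_keys()
        for k in self.keys:
            if k in state_dict:
                self.__dict__[k] = state_dict[k]


trained = TinyScheduler(1.0)
trained.step()
trained.step()
checkpoint = trained.state_dict()

resumed = TinyScheduler(1.0)
resumed.set_state_dict(checkpoint)
print(checkpoint, resumed.last_lr)
```

Attributes outside the keys list, such as base_lr here, are not part of the saved state and come from the constructor when a scheduler is recreated.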