OneCycleLR

class paddle.optimizer.lr.OneCycleLR(max_learning_rate, total_steps, divide_factor=25.0, end_learning_rate=0.0001, phase_pct=0.3, anneal_strategy='cos', three_phase=False, last_epoch=-1, verbose=False) [source]

Sets the learning rate according to the one cycle learning rate scheduler. The scheduler adjusts the learning rate from an initial learning rate up to the maximum learning rate, and then from that maximum learning rate down to a minimum learning rate that is much smaller than the initial learning rate.

It has been proposed in Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.

Please note that the default behaviour of this scheduler follows the fastai implementation of one cycle, which claims that “unpublished work has shown even better results by using only two phases”. If you want the behaviour of this scheduler to be consistent with the paper, please set three_phase=True.

Also note that the learning rate should be updated at each step: call step() once per optimization step (i.e. once per batch), not once per epoch.

Parameters
  • max_learning_rate (float) – The maximum learning rate. It is a python float number. The initial learning rate is derived from it via divide_factor (see that parameter below).

  • total_steps (int) – Number of total training steps.

  • divide_factor (float, optional) – The initial learning rate is determined by initial_learning_rate = max_learning_rate / divide_factor. Default: 25.0.

  • end_learning_rate (float, optional) – The minimum learning rate reached during training; it should be much smaller than the initial learning rate. Default: 0.0001.

  • phase_pct (float, optional) – The percentage of total steps used to increase the learning rate. Default: 0.3.

  • anneal_strategy (str, optional) – The strategy for adjusting the learning rate: 'cos' for cosine annealing, 'linear' for linear annealing. Default: 'cos'.

  • three_phase (bool, optional) –

    Whether to use a three-phase schedule (its effect is illustrated in the sketch after this parameter list). Default: False.

    If True:

    1. The learning rate first increases from the initial learning rate to the maximum learning rate.

    2. It then decreases back to the initial learning rate, taking the same number of steps as the first phase.

    3. Finally, it decreases to the minimum learning rate, which is much smaller than the initial learning rate.

    If False:

    1. The learning rate increases to the maximum learning rate.

    2. It then decreases directly to the minimum learning rate.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to resume training. Default: -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.
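
The following minimal sketch (illustrative only, not part of the API above) reads the values the scheduler itself produces to show how divide_factor, phase_pct and three_phase shape the schedule; the quantities in the comments are approximate.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100,
...                                            divide_factor=25.0, phase_pct=0.3,
...                                            three_phase=False)
>>> lrs = []
>>> for step in range(100):
...     lrs.append(scheduler.get_lr())   # learning rate used at this step
...     scheduler.step()                 # advance the schedule by one step
>>> print(lrs[0])      # roughly max_learning_rate / divide_factor = 0.04
>>> print(max(lrs))    # roughly max_learning_rate = 1.0, reached near step phase_pct * total_steps = 30
>>> print(lrs[-1])     # close to end_learning_rate

Setting three_phase=True instead inserts a symmetric decrease back to the initial learning rate before the final annealing to end_learning_rate.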

Returns

OneCycleLR instance to schedule learning rate.

Examples

>>> # Example 1: train on default dynamic graph mode
>>> import paddle
>>> import numpy as np

>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100, verbose=True)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(5):
...     for batch_id in range(20):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         sgd.step()
...         sgd.clear_gradients()
...         scheduler.step()        # You should update learning rate each step

>>> # Example 2: train on static graph mode
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
...
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(5):
...     for batch_id in range(20):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...         scheduler.step()    # You should update learning rate each step
...
get_lr()

get_lr

Subclasses that inherit from LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception will be thrown.
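
As a concrete illustration of that contract, here is a minimal sketch of a custom scheduler; the class name and the halving rule are invented for this example, and it assumes the base class stores the starting value as self.base_lr (as the built-in schedulers do).

>>> import paddle

>>> class HalvingDecay(paddle.optimizer.lr.LRScheduler):
...     # hypothetical rule: halve the base learning rate every 10 epochs
...     def get_lr(self):
...         return self.base_lr * (0.5 ** (self.last_epoch // 10))

>>> linear = paddle.nn.Linear(10, 10)
>>> sgd = paddle.optimizer.SGD(learning_rate=HalvingDecay(learning_rate=0.1),
...                            parameters=linear.parameters())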

set_dict(state_dict)

set_dict

Loads the scheduler's state.

set_state_dict(state_dict)

set_state_dict

Loads the scheduler's state.

state_dict()

state_dict

Returns the state of the scheduler as a dict.

It is a subset of self.__dict__.
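
A minimal checkpointing sketch that uses state_dict together with set_state_dict (the file name is just an example):

>>> import paddle

>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> # ... train for a while, calling scheduler.step() after each optimizer step ...
>>> paddle.save(scheduler.state_dict(), 'scheduler.pdparams')

>>> # later: rebuild the scheduler with the same arguments and restore its progress
>>> resumed = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100)
>>> resumed.set_state_dict(paddle.load('scheduler.pdparams'))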

state_keys()

state_keys

For subclasses that inherit from LRScheduler (the base class): by default, last_epoch and last_lr are saved via self.keys = ['last_epoch', 'last_lr'].

last_epoch is the current epoch number, and last_lr is the current learning rate.

If you want to change the default behavior, provide a custom implementation of _state_keys() that redefines self.keys, as in the sketch below.
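
The sketch below shows the pattern; the scheduler class and its num_calls attribute are hypothetical and only serve to illustrate redefining self.keys.

>>> import paddle

>>> class CountingDecay(paddle.optimizer.lr.LRScheduler):
...     def __init__(self, learning_rate=0.1, last_epoch=-1, verbose=False):
...         self.num_calls = 0                    # hypothetical extra state to persist
...         super().__init__(learning_rate, last_epoch, verbose)
...
...     def get_lr(self):
...         self.num_calls += 1
...         return self.base_lr
...
...     def _state_keys(self):
...         # save the extra attribute alongside the defaults
...         self.keys = ['last_epoch', 'last_lr', 'num_calls']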

step(epoch=None)

step

step() should be called after optimizer.step(). It updates the learning rate in the optimizer according to the current epoch. The new learning rate takes effect at the next call to optimizer.step().

Parameters

epoch (int, optional) – Specify the current epoch. Default: None, in which case it auto-increments from last_epoch=-1.

Returns

None

Examples

>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=scheduler, epsilon=1e-06, rho=0.95,
...                             parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()
>>> scheduler.step()    # update the learning rate after each optimizer step