CosineAnnealingWarmRestarts¶
- class paddle.optimizer.lr.CosineAnnealingWarmRestarts ( learning_rate, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False ) [source]
-
Set the learning rate of each parameter group using a cosine annealing schedule, where η_max is set to the initial learning rate, T_cur is the number of epochs since the last restart, and T_i is the number of epochs between two warm restarts in SGDR:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)

When T_cur = T_i, set η_t = η_min. When T_cur = 0 after a restart, set η_t = η_max.
It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.
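To make the restart schedule concrete, the following is a minimal sketch in plain Python (independent of the scheduler class) that evaluates the formula above for hypothetical values T_0=2, T_mult=2, an initial learning rate of 0.5, and eta_min=0; it shows the learning rate resetting to η_max at every restart while the period T_i doubles.

import math

eta_max, eta_min = 0.5, 0.0   # hypothetical initial and minimum learning rates
T_0, T_mult = 2, 2            # hypothetical first restart period and growth factor

T_i, T_cur = T_0, 0
for epoch in range(15):
    if T_cur == T_i:          # warm restart: reset T_cur and enlarge the period
        T_cur = 0
        T_i = T_i * T_mult
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))
    print(f"epoch {epoch:2d}: T_i={T_i:2d}, T_cur={T_cur:2d}, lr={eta_t:.4f}")
    T_cur += 1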
- Parameters
-
learning_rate (float) – Initial learning rate.
T_0 (int) – Number of iterations for the first restart.
T_mult (int, optional) – A factor by which T_i increases after a restart. Default: 1.
eta_min (float, optional) – Minimum learning rate. Default: 0.
last_epoch (int, optional) – The index of the last epoch. Default: -1, which means the initial learning rate.
verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.
- Returns
-
A CosineAnnealingWarmRestarts instance to schedule the learning rate.
Examples
>>> import paddle
>>> import numpy as np
>>> # train on default dynamic graph mode
>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate=0.5, T_0=1, T_mult=2, verbose=True)
>>> adam = paddle.optimizer.Adam(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(10):
...     for batch_id in range(10):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         adam.step()
...         adam.clear_grad()
...     scheduler.step(epoch)    # You should update learning rate each step
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate=0.5, T_0=1, T_mult=2, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(10):
...     for batch_id in range(10):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...     scheduler.step(epoch)    # You should update learning rate each step
get_lr¶
- get_lr ( )
-
For subclasses that inherit from LRScheduler (the base class), users should provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be thrown.
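As an illustration of this pattern, here is a sketch of a custom scheduler that subclasses paddle.optimizer.lr.LRScheduler and provides its own get_lr(); the scheduler name and its halving rule are hypothetical, and base_lr / last_epoch are assumed to be the attributes maintained by the base class.

import paddle

class HalveEveryNEpochs(paddle.optimizer.lr.LRScheduler):
    # Hypothetical scheduler: halve the learning rate every n epochs.

    def __init__(self, learning_rate, n, last_epoch=-1, verbose=False):
        self.n = n
        super().__init__(learning_rate, last_epoch, verbose)

    def get_lr(self):
        # base_lr and last_epoch are maintained by the LRScheduler base class.
        return self.base_lr * (0.5 ** (self.last_epoch // self.n))

scheduler = HalveEveryNEpochs(learning_rate=0.5, n=3)
for _ in range(7):
    print(scheduler.get_lr())
    scheduler.step()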
set_dict¶
- set_dict ( state_dict )
-
Loads the scheduler's state.
set_state_dict¶
- set_state_dict ( state_dict )
-
Loads the scheduler's state.
state_dict¶
- state_dict ( )
-
Returns the state of the scheduler as a dict. It is a subset of self.__dict__.
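A small checkpointing sketch of how state_dict() and set_state_dict() are typically used together (the training loop is omitted, and the exact keys in the returned dict depend on the scheduler; by default they are last_epoch and last_lr).

import paddle

linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate=0.5, T_0=1, T_mult=2)
adam = paddle.optimizer.Adam(learning_rate=scheduler, parameters=linear.parameters())

# ... train for some epochs, calling adam.step() and scheduler.step() ...
scheduler.step()

# Save the scheduler state as a plain Python dict.
state = scheduler.state_dict()

# Later, e.g. when resuming training, restore it into a fresh scheduler.
resumed = paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate=0.5, T_0=1, T_mult=2)
resumed.set_state_dict(state)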
state_keys¶
- state_keys ( )
-
For subclasses that inherit from LRScheduler (the base class): by default, last_epoch and last_lr are saved via self.keys = ['last_epoch', 'last_lr']. last_epoch is the current epoch number, and last_lr is the current learning rate. If you want to change the default behavior, you should provide a custom implementation of _state_keys() to redefine self.keys, as sketched below.
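A sketch of that override, assuming the usual pattern of subclassing paddle.optimizer.lr.LRScheduler; the scheduler below and its extra warmup_epochs attribute are hypothetical.

import paddle

class WarmupThenConstant(paddle.optimizer.lr.LRScheduler):
    # Hypothetical scheduler whose extra attribute should be checkpointed.

    def __init__(self, learning_rate, warmup_epochs, last_epoch=-1, verbose=False):
        self.warmup_epochs = warmup_epochs
        super().__init__(learning_rate, last_epoch, verbose)

    def get_lr(self):
        # Linear warm-up, then a constant learning rate.
        if self.last_epoch < self.warmup_epochs:
            return self.base_lr * (self.last_epoch + 1) / self.warmup_epochs
        return self.base_lr

    def _state_keys(self):
        # Redefine self.keys so state_dict()/set_state_dict() also carry warmup_epochs.
        self.keys = ['last_epoch', 'last_lr', 'warmup_epochs']

scheduler = WarmupThenConstant(learning_rate=0.1, warmup_epochs=5)
print(scheduler.state_dict())   # expected keys: last_epoch, last_lr, warmup_epochs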
step¶
- step ( epoch=None )
-
step should be called after optimizer.step(). It will update the learning rate in the optimizer. The new learning rate will take effect on the next epoch.
- Parameters
-
epoch (int|None, optional) – Specify the current epoch. Default: None, in which case the epoch auto-increments from last_epoch=-1.
- Returns
-
None
Examples
Please refer to the usage examples of the current LRScheduler above.
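In addition to those examples, here is a compact sketch of the call order described above; the model and data are placeholders.

import paddle

linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate=0.5, T_0=1, T_mult=2)
sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())

for epoch in range(3):
    loss = paddle.mean(linear(paddle.uniform([4, 10])))
    loss.backward()
    sgd.step()          # update the parameters first ...
    sgd.clear_grad()
    scheduler.step()    # ... then the learning rate; epoch=None auto-increments last_epoch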