paddle.static.append_backward(loss, parameter_list=None, no_grad_set=None, callbacks=None, checkpoints=None, distop_context=None) [source]

Static Graph

This function appends the backward part to main_program.

A complete neural network training is made up of forward and backward propagation. However, when we configure a network, we only need to specify its forward part. This function uses the chain rule to automatically generate the backward part according to the forward part.

In most cases, users do not need to invoke this function manually. It will be automatically invoked by the optimizer’s minimize function.

Parameters

  • loss (Tensor) – The loss Tensor of the network.

  • parameter_list (list[Tensor|str]|tuple[Tensor|str], optional) – List/Tuple of Parameters or Parameter.names that need to be updated by optimizers. If it is None, all parameters will be updated. Default: None.

  • no_grad_set (set[Tensor|str], optional) – Set of Tensors or Tensor.names in the Block 0 whose gradients should be ignored. All Tensors with stop_gradient=True from all blocks will be automatically added into this set. If this parameter is not None, the Tensors or Tensor.names in this set will be added to the default set. Default: None.

  • callbacks (list[callable object]|tuple[callable object], optional) – List/Tuple of callback functions. The callbacks are used for doing custom jobs while the backward part is being built. Every callable object in the list is invoked once each time a new gradient operator is added into the program. The callable object must accept two input parameters: block and context. The block is the Block to which the new gradient operator will be added. The context is a map whose keys are gradient Tensor names and whose values are the corresponding original Tensors. In addition, the context has one special key-value pair: the key is the string __current_op_desc__ and the value is the op_desc of the gradient operator that has just triggered the callable object. Default: None.
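A minimal callback sketch following the signature described above. The function name log_grad_ops is this example's own choice, not part of the Paddle API; it simply logs the type of each gradient operator as it is appended:

```python
def log_grad_ops(block, context):
    # block: the Block the new gradient operator is being added to.
    # context: maps gradient Tensor names to their original Tensors,
    # plus the special key '__current_op_desc__' holding the op_desc
    # of the gradient operator that just triggered this callback.
    op_desc = context.get('__current_op_desc__')
    if op_desc is not None:
        print('appended gradient op:', op_desc.type())
```

It would then be passed as, e.g., `paddle.static.append_backward(loss=avg_loss, callbacks=[log_grad_ops])`.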


Returns

Pairs of parameters and their corresponding gradients. The key is the parameter and the value is the gradient Tensor.

Return type

list of tuple ( Tensor , Tensor )


Raises

AssertionError – If loss is not an instance of Tensor.


Examples

>>> import paddle
>>> import paddle.nn.functional as F

>>> paddle.enable_static()

>>> x = paddle.static.data(name='x', shape=[None, 13], dtype='int64')
>>> y = paddle.static.data(name='y', shape=[None, 1], dtype='float32')
>>> x_emb = paddle.static.nn.embedding(x, size=[100, 256])
>>> y_predict = paddle.static.nn.fc(x=x_emb, size=1, activation=None, name='my_fc')
>>> loss = F.square_error_cost(input=y_predict, label=y)
>>> avg_loss = paddle.mean(loss)

>>> # Get all weights in main_program, excluding biases.
>>> all_weights = [param for param in paddle.static.default_main_program().block(0).all_parameters() if 'w_' in param.name]
>>> all_weights_name = [w.name for w in all_weights]

>>> # All param_grads that need updating are returned when parameter_list is left as the default None.
>>> p_g_list1 = paddle.static.append_backward(loss=avg_loss)
>>> # output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD), (my_fc.b_0, my_fc.b_0@GRAD)]

>>> # Return the param_grads corresponding to parameter_list, which can be a list of parameters (Tensor).
>>> p_g_list2 = paddle.static.append_backward(loss=avg_loss, parameter_list=all_weights)
>>> # output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

>>> # parameter_list can also be a list of parameter names (str).
>>> p_g_list3 = paddle.static.append_backward(loss=avg_loss, parameter_list=all_weights_name)
>>> # output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

>>> # no_grad_set can be a set of Tensors, meaning gradients will be cut off at these Tensors.
>>> p_g_list4 = paddle.static.append_backward(loss=avg_loss, no_grad_set=set([x_emb]))
>>> # output: [(my_fc.w_0, my_fc.w_0@GRAD), (my_fc.b_0, my_fc.b_0@GRAD)]

>>> # no_grad_set can be a set of Tensor.name strings when the Tensor is created inside layers and can't be specified explicitly.
>>> p_g_list5 = paddle.static.append_backward(loss=avg_loss, no_grad_set=set(['my_fc.b_0']))
>>> # output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

>>> # Returns [] because all param_grads are filtered out by no_grad_set.
>>> p_g_list6 = paddle.static.append_backward(loss=avg_loss, parameter_list=all_weights, no_grad_set=set(all_weights))
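The returned list of (parameter, gradient) pairs can be consumed directly, for example to build a parameter-to-gradient lookup. A pure-Python sketch (the string tuples below are stand-ins for the (Tensor, Tensor) pairs that append_backward actually returns):

```python
# Stand-in for the pairs returned by append_backward; real entries
# would be Tensor objects, not name strings.
p_g_list = [
    ('embedding_0.w_0', 'embedding_0.w_0@GRAD'),
    ('my_fc.w_0', 'my_fc.w_0@GRAD'),
    ('my_fc.b_0', 'my_fc.b_0@GRAD'),
]

# Build a parameter -> gradient lookup from the pairs.
grad_of = {param: grad for param, grad in p_g_list}
print(grad_of['my_fc.w_0'])  # my_fc.w_0@GRAD
```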