declarative programming (static graph)

paddle.fluid.backward.append_backward(loss, parameter_list=None, no_grad_set=None, callbacks=None, checkpoints=None)

This function appends the backward part to main_program.

Training a neural network consists of forward and backward propagation. However, when we configure a network, we only need to specify its forward part. This function uses the chain rule to generate the backward part automatically from the forward part.
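The idea of walking the forward operators in reverse and appending one gradient operator for each can be sketched in plain Python. This is a toy model under stated assumptions, not Paddle's implementation: `Op`, `append_backward_sketch`, and the operator names are hypothetical, and real gradient operators also consume forward inputs and outputs. Only the `@GRAD` suffix follows the naming visible in the example outputs on this page.

```python
from collections import namedtuple

# Hypothetical stand-in for an operator: a type plus input/output variable names.
Op = namedtuple("Op", ["type", "inputs", "outputs"])

GRAD_SUFFIX = "@GRAD"


def append_backward_sketch(ops, loss_name, no_grad=frozenset()):
    """Return the forward ops followed by gradient ops built in reverse order."""
    has_grad = {loss_name}  # variables whose gradient is available so far
    grad_ops = []
    for op in reversed(ops):
        # Skip operators whose outputs do not influence the loss.
        if not any(out in has_grad for out in op.outputs):
            continue
        # Gradients are not propagated to variables listed in no_grad.
        wanted = [v for v in op.inputs if v not in no_grad]
        if not wanted:
            continue
        grad_ops.append(Op(
            type=op.type + "_grad",
            inputs=[out + GRAD_SUFFIX for out in op.outputs if out in has_grad],
            outputs=[v + GRAD_SUFFIX for v in wanted],
        ))
        has_grad.update(wanted)
    return list(ops) + grad_ops


# Forward part only: loss = mean(x * w + b)
forward = [
    Op("mul", ["x", "w"], ["xw"]),
    Op("add", ["xw", "b"], ["y"]),
    Op("mean", ["y"], ["loss"]),
]
program = append_backward_sketch(forward, "loss", no_grad={"x"})
for op in program[len(forward):]:
    print(op.type, op.inputs, "->", op.outputs)
```

The gradient operators appear in the opposite order to the forward operators, which is the chain rule applied from the loss back toward the inputs. Because `x` is in `no_grad`, no `x@GRAD` is produced.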

In most cases, users do not need to invoke this function manually. It will be automatically invoked by the optimizer’s minimize function.

Parameters

  • loss (Variable) – The loss variable of the network.

  • parameter_list (list[Variable|str], optional) – List of Parameters or Parameter.names that need to be updated by optimizers. If it is None, all parameters will be updated. Default: None.

  • no_grad_set (set[Variable|str], optional) – Set of Variables or Variable.names in Block 0 whose gradients should be ignored. All variables with stop_gradient=True from all blocks are automatically added to a default set. If this parameter is not None, the Variables or Variable.names it contains are added to that default set. Default: None.

  • callbacks (list[callable object], optional) – List of callback functions, used to run custom logic while the backward part is being built. Every callable in the list is invoked once each time a new gradient operator is added to the program. Each callable must accept two input parameters: ‘block’ and ‘context’. The ‘block’ is the Block to which the new gradient operator will be added. The ‘context’ is a map whose keys are gradient variable names and whose values are the corresponding original Variables. In addition, the ‘context’ has one special key-value pair: the key is the string ‘__current_op_desc__’ and the value is the op_desc of the gradient operator that has just triggered the callable. Default: None.
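The callback contract described for callbacks can be illustrated without Paddle. The sketch below assumes only what the parameter description states: the callable receives ‘block’ and ‘context’, and ‘context’ carries the special key ‘__current_op_desc__’. The helper name `make_grad_op_recorder` and the stand-in values passed to it are hypothetical.

```python
CURRENT_OP_KEY = "__current_op_desc__"


def make_grad_op_recorder():
    """Build a callback together with the list it appends to."""
    seen = []

    def record_grad_op(block, context):
        # The special key holds the op_desc of the gradient operator just added.
        op_desc = context[CURRENT_OP_KEY]
        # Every other key is a gradient variable name.
        grad_names = sorted(k for k in context if k != CURRENT_OP_KEY)
        seen.append((op_desc, grad_names))

    return record_grad_op, seen


# Exercise the callback with plain stand-ins (not real Paddle objects).
callback, seen = make_grad_op_recorder()
fake_context = {
    "my_fc.w_0@GRAD": "my_fc.w_0",   # gradient name -> original variable
    CURRENT_OP_KEY: "mul_grad",      # stand-in for an op_desc
}
callback(block=None, context=fake_context)
```

In a real program such a callable would presumably be passed as `callbacks=[callback]`; what an actual op_desc exposes is not covered here, so the sketch treats it as an opaque value.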


Returns

Pairs of parameters and their corresponding gradients. In each pair, the first element is the parameter and the second is its gradient variable.

Return type

list of tuple (Variable, Variable)
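Because the result is a list of pairs rather than a map, looking up a gradient by parameter name takes one extra step. A minimal sketch, assuming each element exposes a `name` attribute as the parameter descriptions suggest; `Var` is a hypothetical stand-in for Variable and `grads_by_param_name` is an illustrative helper, not part of the API.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only the attribute used below.
Var = namedtuple("Var", ["name"])


def grads_by_param_name(param_grads):
    """Map each parameter name to the name of its gradient variable."""
    return {param.name: grad.name for param, grad in param_grads}


pairs = [
    (Var("embedding_0.w_0"), Var("embedding_0.w_0@GRAD")),
    (Var("my_fc.w_0"), Var("my_fc.w_0@GRAD")),
]
lookup = grads_by_param_name(pairs)
```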


Raises

AssertionError – If loss is not an instance of Variable.


Examples

import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 13], dtype='int64')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
x_emb = fluid.embedding(x, size=[100, 256])
y_predict = fluid.layers.fc(input=x_emb, size=1, act=None, name='my_fc')
loss = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(loss)

# Get all weights in main_program, excluding biases.
all_weights = [param for param in fluid.default_main_program().block(0).all_parameters() if 'w_' in param.name]
all_weights_name = [w.name for w in all_weights]

# Returns all param_grads to be updated when parameter_list is left at its default, None.
p_g_list1 = fluid.backward.append_backward(loss=avg_loss)
# output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD), (my_fc.b_0, my_fc.b_0@GRAD)]

# Returns the param_grads corresponding to parameter_list, which can be a list of params (Variable).
p_g_list2 = fluid.backward.append_backward(loss=avg_loss, parameter_list=all_weights)
# output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

# parameter_list can also be a list of parameter names (str).
p_g_list3 = fluid.backward.append_backward(loss=avg_loss, parameter_list=all_weights_name)
# output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

# no_grad_set can be a set of Variables; gradients are cut off at these Variables.
p_g_list4 = fluid.backward.append_backward(loss=avg_loss, no_grad_set=set([x_emb]))
# output: [(my_fc.w_0, my_fc.w_0@GRAD), (my_fc.b_0, my_fc.b_0@GRAD)]

# no_grad_set can be a set of Variable.name strings when the Variable is created inside layers and can't be specified explicitly.
p_g_list5 = fluid.backward.append_backward(loss=avg_loss, no_grad_set=set(['my_fc.b_0']))
# output: [(embedding_0.w_0, embedding_0.w_0@GRAD), (my_fc.w_0, my_fc.w_0@GRAD)]

# Returns [] because all param_grads are filtered out by no_grad_set.
p_g_list6 = fluid.backward.append_backward(loss=avg_loss, parameter_list=all_weights, no_grad_set=set(all_weights))