grad

paddle.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None) [source]

Note

This API is ONLY available in imperative (dynamic graph) mode.

This API computes the sum of gradients of outputs with respect to each Tensor in inputs.

Parameters
  • outputs (Tensor|list(Tensor)|tuple(Tensor)) – the output Tensor, or list/tuple of Tensors, of the graph whose gradients will be computed.

  • inputs (Tensor|list(Tensor)|tuple(Tensor)) – the input Tensor, or list/tuple of Tensors, of the graph with respect to which gradients are computed. The returned values of this API are the gradients of inputs.

  • grad_outputs (Tensor|list(Tensor|None)|tuple(Tensor|None), optional) – initial gradient values of outputs. If grad_outputs is None, the initial gradient values of outputs would be Tensors filled with 1; if grad_outputs is not None, it must have the same length as outputs, and in this case, the initial gradient value of the i-th outputs would be: (1) a Tensor filled with 1 when the i-th element of grad_outputs is None; (2) the i-th element of grad_outputs when the i-th element of grad_outputs is a Tensor. Default None.

  • retain_graph (bool, optional) – whether to retain the forward graph that was used to compute the gradient. When it is True, the graph is retained, so users can compute gradients twice from the same graph (see the sketch after the first example below). When it is False, the graph is freed after the gradient computation. Default None, which means it takes the value of create_graph.

  • create_graph (bool, optional) – whether to create the gradient graphs of the computing process. When it is True, the gradient computation itself is recorded so that higher order derivatives can be computed; when it is False, the gradient graphs of the computing process are discarded. Default False.

  • only_inputs (bool, optional) – whether to only compute the gradients of inputs. If it is False, the gradients of all remaining leaf Tensors in the graph would also be computed and accumulated. If it is True, only the gradients of inputs would be computed. Default True. Note that only_inputs=False is still under development and not supported yet.

  • allow_unused (bool, optional) – whether to raise an error or return None if some Tensors of inputs are unreachable in the graph. If some Tensors of inputs are unreachable in the graph (i.e., their gradients are None), an error is raised when allow_unused=False, and None is returned as their gradients when allow_unused=True (see the sketch at the end of the examples below). Default False.

  • no_grad_vars (Tensor|list(Tensor)|tuple(Tensor)|set(Tensor), optional) – the Tensors whose gradients do not need to be computed. Default None.

Returns

a list of Tensors whose length is the same as the number of Tensors in inputs. The i-th returned Tensor is the sum of gradients of outputs with respect to the i-th element of inputs.

Return type

list

Examples

>>> import paddle

>>> def test_dygraph_grad(create_graph):
...     x = paddle.ones(shape=[1], dtype='float32')
...     x.stop_gradient = False
...     y = x * x
...
...     # Since y = x * x, dx = 2 * x
...     dx = paddle.grad(
...             outputs=[y],
...             inputs=[x],
...             create_graph=create_graph,
...             retain_graph=True)[0]
...
...     z = y + dx
...
...     # If create_graph = False, the gradient of dx
...     # would not be backpropagated. Therefore,
...     # z = x * x + dx, and x.gradient() = 2 * x = 2.0
...
...     # If create_graph = True, the gradient of dx
...     # would be backpropagated. Therefore,
...     # z = x * x + dx = x * x + 2 * x, and
...     # x.gradient() = 2 * x + 2 = 4.0
...
...     z.backward()
...     return x.gradient()
...
>>> print(test_dygraph_grad(create_graph=False))
[2.]
>>> print(test_dygraph_grad(create_graph=True))
[4.]
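
The first example passes retain_graph=True, which is what allows paddle.grad to run without freeing the forward graph before z.backward(). As a minimal sketch of that behavior (illustrative, not part of the original example set): with retain_graph=True the forward graph is kept alive, so paddle.grad can be called a second time on the same graph.

>>> # A minimal sketch: retain_graph=True keeps the forward graph alive,
>>> # so gradients can be computed twice from the same graph.
>>> import paddle

>>> x = paddle.to_tensor(2.0)
>>> x.stop_gradient = False
>>> y = x * x

>>> # The first call retains the graph; the second call can therefore reuse it.
>>> dx1 = paddle.grad(outputs=[y], inputs=[x], retain_graph=True)[0]
>>> dx2 = paddle.grad(outputs=[y], inputs=[x])[0]
>>> print(dx1.numpy(), dx2.numpy())
4.0 4.0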
>>> import paddle

>>> def test_dygraph_grad(grad_outputs=None):
...     x = paddle.to_tensor(2.0)
...     x.stop_gradient = False
...
...     y1 = x * x
...     y2 = x * 3
...
...     # If grad_outputs=None, dy1 = [1], dy2 = [1].
...     # If grad_outputs=[g1, g2], then:
...     #    - dy1 = [1] if g1 is None else g1
...     #    - dy2 = [1] if g2 is None else g2
...
...     # Since y1 = x * x, dx = 2 * x * dy1.
...     # Since y2 = x * 3, dx = 3 * dy2.
...     # Therefore, the final result would be:
...     # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
...
...     dx = paddle.grad(
...         outputs=[y1, y2],
...         inputs=[x],
...         grad_outputs=grad_outputs)[0]
...
...     return dx.numpy()
...
>>> grad_value = paddle.to_tensor(4.0)
>>> # dy1 = [1], dy2 = [1]
>>> print(test_dygraph_grad(None))
7.0

>>> # dy1 = [1], dy2 = [4]
>>> print(test_dygraph_grad([None, grad_value]))
16.0

>>> # dy1 = [4], dy2 = [1]
>>> print(test_dygraph_grad([grad_value, None]))
19.0

>>> # dy1 = [3], dy2 = [4]
>>> grad_y1 = paddle.to_tensor(3.0)
>>> print(test_dygraph_grad([grad_y1, grad_value]))
24.0
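
As a minimal sketch of allow_unused (illustrative, not part of the original example set): when some Tensor in inputs does not participate in computing outputs, allow_unused=True returns None for its gradient instead of raising an error.

>>> # A minimal sketch: w is not used to compute y, so it is unreachable
>>> # in the graph. With allow_unused=True its gradient comes back as None
>>> # instead of an error being raised.
>>> import paddle

>>> x = paddle.to_tensor(2.0)
>>> x.stop_gradient = False
>>> w = paddle.to_tensor(3.0)
>>> w.stop_gradient = False

>>> y = x * x  # w does not participate

>>> dx, dw = paddle.grad(outputs=[y], inputs=[x, w], allow_unused=True)
>>> print(dx.numpy())
4.0
>>> print(dw)
None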