# Softmax

class paddle.nn.Softmax(axis=-1, name=None) [source]

Softmax Activation.

This operator implements the softmax layer. The calculation process is as follows:

1. The dimension axis of x will be permuted to the last.

2. Then x will be logically flattened to a 2-D matrix. The matrix’s second dimension (row length) is the same as dimension axis of x, and the first dimension (column length) is the product of all the other dimensions of x. For each row of the matrix, the softmax operator squashes the K-dimensional vector of arbitrary real values (where K is the row length, i.e. the size of x’s dimension axis) to a K-dimensional vector of real values in the range [0, 1] that sum to 1.

3. After the softmax operation is completed, the inverse operations of steps 1 and 2 are performed to restore the two-dimensional matrix to the original shape of x.
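The three steps above can be sketched as a small NumPy function. This is an illustrative reimplementation of the described algorithm, not Paddle's actual kernel:

```python
import numpy as np

def softmax_via_flatten(x, axis=-1):
    # Step 1: permute the softmax axis to the last position.
    moved = np.moveaxis(x, axis, -1)
    # Step 2: flatten to a 2-D matrix of shape [N/K, K], where K is the
    # size of the softmax axis, then normalize each row. Subtracting the
    # row max before exponentiating is the usual numerical-stability trick.
    k = moved.shape[-1]
    flat = moved.reshape(-1, k)
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    rows = e / e.sum(axis=1, keepdims=True)
    # Step 3: undo the flatten and the permutation.
    return np.moveaxis(rows.reshape(moved.shape), -1, axis)
```

Each row of the intermediate matrix sums to 1, and the final array has the same shape as the input.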

For each entry of the K-dimensional input vector, it computes the exponential of that entry and the sum of the exponentials of all entries along the same dimension. The output of the softmax operator is the ratio of the entry’s exponential to that sum.

For each row $i$ and each column $j$ in the matrix, we have:

$Softmax[i, j] = \frac{\exp(x[i, j])}{\sum_j \exp(x[i, j])}$
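Applying this formula by hand to one row of Case 1 below, x = [2.0, 3.0, 4.0, 5.0]:

```python
import math

row = [2.0, 3.0, 4.0, 5.0]
exps = [math.exp(v) for v in row]   # [e^2, e^3, e^4, e^5]
total = sum(exps)
out = [e / total for e in exps]
# Each entry is exp(x[j]) divided by the sum of exponentials over the row,
# so the outputs lie in [0, 1] and sum to 1.
```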

Example:

Case 1:
  Input:
    x.shape = [2, 3, 4]
    x.data = [[[2.0, 3.0, 4.0, 5.0],
               [3.0, 4.0, 5.0, 6.0],
               [7.0, 8.0, 8.0, 9.0]],
              [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [6.0, 7.0, 8.0, 9.0]]]

  Attrs:
    axis = -1

  Output:
    out.shape = [2, 3, 4]
    out.data = [[[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.07232949, 0.19661193, 0.19661193, 0.53444665]],
                [[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426]]]

Case 2:
  Input:
    x.shape = [2, 3, 4]
    x.data = [[[2.0, 3.0, 4.0, 5.0],
               [3.0, 4.0, 5.0, 6.0],
               [7.0, 8.0, 8.0, 9.0]],
              [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [6.0, 7.0, 8.0, 9.0]]]

  Attrs:
    axis = 1

  Output:
    out.shape = [2, 3, 4]
    out.data = [[[0.00657326, 0.00657326, 0.01714783, 0.01714783],
                 [0.01786798, 0.01786798, 0.04661262, 0.04661262],
                 [0.97555875, 0.97555875, 0.93623955, 0.93623955]],
                [[0.00490169, 0.00490169, 0.00490169, 0.00490169],
                 [0.26762315, 0.26762315, 0.26762315, 0.26762315],
                 [0.72747516, 0.72747516, 0.72747516, 0.72747516]]]
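Case 2 can be reproduced with a short NumPy computation that normalizes over axis 1 directly (illustrative only, not Paddle code):

```python
import numpy as np

x = np.array([[[2.0, 3.0, 4.0, 5.0],
               [3.0, 4.0, 5.0, 6.0],
               [7.0, 8.0, 8.0, 9.0]],
              [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [6.0, 7.0, 8.0, 9.0]]])
e = np.exp(x)
out = e / e.sum(axis=1, keepdims=True)  # normalize each column over axis 1
# out[0, 0, 0] ≈ 0.00657326, and out sums to 1 along axis 1.
```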

Parameters
• axis (int, optional) – The axis along which to perform softmax calculations. It should be in range [-D, D), where D is the number of dimensions of x. If axis < 0, it works the same way as axis + D. Default is -1.

• name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
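The negative-axis rule can be checked with a small NumPy sketch (an illustrative softmax, not Paddle's implementation):

```python
import numpy as np

def softmax(x, axis):
    # Shift by the max along `axis` for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
# For a 3-D input (D = 3), axis=-1 behaves the same as axis = -1 + 3 = 2.
same = np.allclose(softmax(x, axis=-1), softmax(x, axis=2))
```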

Shape:
• input: Tensor with any shape.

• output: Tensor with the same shape as input.

Examples

>>> import paddle

>>> x = paddle.to_tensor([[[2.0, 3.0, 4.0, 5.0],
...                        [3.0, 4.0, 5.0, 6.0],
...                        [7.0, 8.0, 8.0, 9.0]],
...                       [[1.0, 2.0, 3.0, 4.0],
...                        [5.0, 6.0, 7.0, 8.0],
...                        [6.0, 7.0, 8.0, 9.0]]], dtype='float32')
>>> m = paddle.nn.Softmax()
>>> out = m(x)
>>> print(out)
Tensor(shape=[2, 3, 4], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[[0.03205860, 0.08714432, 0.23688284, 0.64391428],
         [0.03205860, 0.08714432, 0.23688284, 0.64391428],
         [0.07232949, 0.19661194, 0.19661194, 0.53444666]],
        [[0.03205860, 0.08714432, 0.23688284, 0.64391428],
         [0.03205860, 0.08714432, 0.23688284, 0.64391428],
         [0.03205860, 0.08714432, 0.23688284, 0.64391428]]])

forward(x)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
• *inputs (tuple) – unpacked tuple arguments

• **kwargs (dict) – unpacked dict arguments

extra_repr()

Returns the extra representation of this layer. You can provide a custom implementation in your own layer to customize how it is printed.