paddle.fluid.layers.detection.sigmoid_focal_loss(x, label, fg_num, gamma=2.0, alpha=0.25) [source]






Sigmoid Focal Loss Operator.

Focal Loss is used to address the foreground-background class imbalance that exists during the training phase of many computer vision tasks. This OP computes the sigmoid value for each element in the input tensor x, after which the focal loss is measured between the sigmoid value and the target label.

The focal loss is given as follows:

\[\begin{split}\mathop{loss_{i,j}}\limits_{i\in[0,\,N-1],\,j\in[0,\,C-1]}=\left\{ \begin{array}{rcl} - \frac{1}{fg\_num} * \alpha * {(1 - \sigma(x_{i,j}))}^{\gamma} * \log(\sigma(x_{i,j})) & & {(j+1) = label_{i,0}} \\ - \frac{1}{fg\_num} * (1 - \alpha) * {\sigma(x_{i,j})}^{\gamma} * \log(1 - \sigma(x_{i,j})) & & {(j+1) \neq label_{i,0}} \end{array} \right.\end{split}\]

We know that

\[\begin{split}\sigma(x_j) = \frac{1}{1 + \exp(-x_j)}\end{split}\]
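To make the piecewise definition above concrete, here is a small NumPy sketch of the loss (an illustrative reference only, not the actual Paddle kernel; the helper name sigmoid_focal_loss_np is ours):

```python
import numpy as np

def sigmoid_focal_loss_np(x, label, fg_num, gamma=2.0, alpha=0.25):
    """NumPy sketch of the formula above (illustrative, not the Paddle kernel).

    x:      (N, C) float array of raw logits.
    label:  (N, 1) int array; 0 = background, values in [1, C] = foreground class.
    fg_num: number of foreground samples (scalar).
    """
    sigma = 1.0 / (1.0 + np.exp(-x))  # element-wise sigmoid
    N, C = x.shape
    # (j + 1) == label[i, 0] marks the target-class column of each sample
    pos_mask = (np.arange(1, C + 1)[None, :] == label)  # shape (N, C)
    pos_loss = -alpha * (1 - sigma) ** gamma * np.log(sigma)
    neg_loss = -(1 - alpha) * sigma ** gamma * np.log(1 - sigma)
    return np.where(pos_mask, pos_loss, neg_loss) / fg_num

x = np.array([[2.0, -1.0, 0.5]])
label = np.array([[1]])  # sample 0 belongs to class 1
loss = sigmoid_focal_loss_np(x, label, fg_num=1)
print(loss.shape)  # (1, 3): one loss value per (sample, class) element
```

Note that the loss is element-wise: every (sample, class) pair contributes one term, with the positive branch used only in the column matching the sample's label.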
Parameters

  • x (Variable) – A 2-D tensor with shape \([N, C]\) representing the predicted categories of all samples. \(N\) is the number of samples responsible for optimization in a mini-batch: for object detection, samples are anchor boxes and \(N\) is the total number of positive and negative samples in a mini-batch; for image classification, samples are images and \(N\) is the number of images in a mini-batch. \(C\) is the number of classes (Note: excluding background). The data type of x is float32 or float64.

  • label (Variable) – A 2-D tensor with shape \([N, 1]\) representing the target labels for classification. \(N\) is the number of samples responsible for optimization in a mini-batch; each sample has one target category. The values for positive samples are in the range \([1, C]\), and the value for negative samples is 0. The data type of label is int32.

  • fg_num (Variable) – A 1-D tensor with shape [1] represents the number of positive samples in a mini-batch, which should be obtained before this OP. The data type of fg_num is int32.

  • gamma (int|float) – Hyper-parameter to balance the easy and hard examples. Default is 2.0.

  • alpha (int|float) – Hyper-parameter to balance the positive and negative examples. Default is 0.25.
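As a quick illustration of the label convention and of how fg_num can be obtained before calling this OP (shown here with plain NumPy; in a real program it would typically be computed in-graph, as in the training example below):

```python
import numpy as np

# label follows the convention above: 0 = background, values in [1, C] = foreground class.
label = np.array([[0], [3], [1], [0], [7]], dtype='int32')

# fg_num is simply the number of entries >= 1, as a 1-D int32 tensor of shape [1].
fg_num = np.array([int((label >= 1).sum())], dtype='int32')
print(fg_num)  # [3]
```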


Returns

A 2-D tensor with shape \([N, C]\), which is the focal loss of each element in the input tensor x.

Return type

Variable(the data type is float32 or float64)


Examples

import numpy as np
import paddle.fluid as fluid

num_classes = 10  # exclude background
image_width = 16
image_height = 16
batch_size = 32
max_iter = 20

def gen_train_data():
    x_data = np.random.uniform(0, 255, (batch_size, 3, image_height,
                                        image_width)).astype('float64')
    label_data = np.random.randint(0, num_classes,
                                   (batch_size, 1)).astype('int32')
    return {"x": x_data, "label": label_data}

def get_focal_loss(pred, label, fg_num, num_classes):
    pred = fluid.layers.reshape(pred, [-1, num_classes])
    label = fluid.layers.reshape(label, [-1, 1])
    label.stop_gradient = True
    loss = fluid.layers.sigmoid_focal_loss(
        pred, label, fg_num, gamma=2.0, alpha=0.25)
    loss = fluid.layers.reduce_sum(loss)
    return loss

def build_model(mode='train'):
    x = fluid.data(name="x", shape=[-1, 3, -1, -1], dtype='float64')
    output = fluid.layers.pool2d(input=x, pool_type='avg', global_pooling=True)
    output = fluid.layers.fc(
        input=output,
        # Notice: size is set to the number of target classes (excluding background)
        # because sigmoid activation will be done in the sigmoid_focal_loss op.
        size=num_classes)
    if mode == 'train':
        label = fluid.data(name="label", shape=[-1, 1], dtype='int32')
        # Obtain the fg_num needed by the sigmoid_focal_loss op:
        # 0 in label represents background, >=1 in label represents foreground,
        # find the elements in label which are greater than or equal to 1, then
        # count the number of these elements.
        data = fluid.layers.fill_constant(shape=[1], value=1, dtype='int32')
        fg_label = fluid.layers.greater_equal(label, data)
        fg_label = fluid.layers.cast(fg_label, dtype='int32')
        fg_num = fluid.layers.reduce_sum(fg_label)
        fg_num.stop_gradient = True
        avg_loss = get_focal_loss(output, label, fg_num, num_classes)
        return avg_loss
    else:
        # During the evaluating or testing phase,
        # the output of the final fc layer should be connected to a sigmoid layer.
        pred = fluid.layers.sigmoid(output)
        return pred

loss = build_model('train')
moment_optimizer = fluid.optimizer.MomentumOptimizer(
    learning_rate=0.001, momentum=0.9)
moment_optimizer.minimize(loss)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for i in range(max_iter):
    outs = exe.run(feed=gen_train_data(), fetch_list=[loss.name])