# fluid.dygraph

## BackwardStrategy

class paddle.fluid.dygraph.BackwardStrategy

BackwardStrategy describes how to run the backward process. It currently has one option:

1. sort_sum_gradient, which sums the gradients in the reverse order of the trace.

Examples

import numpy as np
import paddle.fluid as fluid

x = np.ones([2, 2], np.float32)
with fluid.dygraph.guard():
    inputs2 = []
    for _ in range(10):
        inputs2.append(fluid.dygraph.base.to_variable(x))
    ret2 = fluid.layers.sums(inputs2)
    loss2 = fluid.layers.reduce_sum(ret2)
    backward_strategy = fluid.dygraph.BackwardStrategy()
    # Sum the gradients in the reverse order of the trace.
    backward_strategy.sort_sum_gradient = True
    loss2.backward(backward_strategy)


## BatchNorm

class paddle.fluid.dygraph.BatchNorm(name_scope, num_channels, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, dtype='float32', data_layout='NCHW', in_place=False, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False, fuse_with_relu=False, use_global_stats=False, trainable_statistics=False)

Batch Normalization Layer

Can be used as a normalizer function for conv2d and fully_connected operations. The required data format for this layer is one of the following:

1. NHWC [batch, in_height, in_width, in_channels]

2. NCHW [batch, in_channels, in_height, in_width]

Refer to Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift for more details.

$$input$$ is the input features over a mini-batch.

$\begin{split}\mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i \qquad &//\ \ mini-batch\ mean \\ \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \ \mu_{\beta})^2 \qquad &//\ mini-batch\ variance \\ \hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\ \sigma_{\beta}^{2} + \epsilon}} \qquad &//\ normalize \\ y_i &\gets \gamma \hat{x_i} + \beta \qquad &//\ scale\ and\ shift\end{split}$

When use_global_stats = True, $$\mu_{\beta}$$ and $$\sigma_{\beta}^{2}$$ are not the statistics of one mini-batch. They are global (or running) statistics, usually obtained from a pre-trained model. Training and testing (or inference) then behave identically:

$\begin{split}\hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\ \sigma_{\beta}^{2} + \epsilon}} \\ y_i &\gets \gamma \hat{x_i} + \beta\end{split}$
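
The training-time equations above can be traced by hand on a tiny mini-batch. The following is a minimal pure-Python sketch for a single channel, not Paddle's implementation; the helper name `batch_norm_1d` and the default `gamma` and `beta` values are illustrative assumptions.

```python
import math

def batch_norm_1d(xs, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize one channel of a mini-batch (hypothetical helper, not a Paddle API)."""
    m = len(xs)
    mean = sum(xs) / m                               # mini-batch mean
    var = sum((x - mean) ** 2 for x in xs) / m       # mini-batch (biased) variance
    inv_std = 1.0 / math.sqrt(var + epsilon)
    # Normalize, then scale by gamma and shift by beta.
    ys = [gamma * (x - mean) * inv_std + beta for x in xs]
    return ys, mean, var

ys, mean, var = batch_norm_1d([1.0, 2.0, 3.0, 4.0])
```

With `gamma=1` and `beta=0`, the output has approximately zero mean and unit variance; `epsilon` keeps the division stable when the variance is close to zero.
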
Parameters
• name_scope (str) – The name of this class.

• act (str|None) – Activation type, linear|relu|prelu|…

• is_test (bool) – A flag indicating whether it is in test phase or not. Default: False

• momentum (float) – The value used for the moving_mean and moving_var computation. The updated formula is: $$moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)$$ $$moving\_var = moving\_var * momentum + new\_var * (1. - momentum)$$ Default is 0.9.

• epsilon (float) – A value added to the denominator for numerical stability. Default is 1e-5.

• param_attr (ParamAttr|None) – The parameter attribute for Parameter scale of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.

• bias_attr (ParamAttr|None) – The parameter attribute for the bias of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• data_layout (string) – NCHW|NHWC. Default: NCHW

• in_place (bool) – Make the input and output of batch norm reuse memory. Default: False

• moving_mean_name (string|None) – The name of the moving_mean, which stores the global mean. Default: None

• moving_variance_name (string|None) – The name of the moving_variance, which stores the global variance. Default: None

• do_model_average_for_mean_and_var (bool) – Whether to do model averaging for mean and variance. Default: False

• fuse_with_relu (bool) – If True, this OP performs relu after batch norm. Default: False

• use_global_stats (bool) – Whether to use the global mean and variance. In inference or test mode, setting use_global_stats to True is equivalent to setting is_test to True. In train mode, when use_global_stats is True, the global mean and variance are also used during training. Default: False

• trainable_statistics (bool) – Whether to calculate mean and variance in eval mode. In eval mode, when trainable_statistics is True, mean and variance are calculated from the current batch statistics. Default: False
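
The moving-statistics update described for the momentum parameter can be traced numerically. This is a small sketch of the update formula only; `update_moving_stat` is a hypothetical helper name, and the starting value of 0.0 is chosen for illustration.

```python
def update_moving_stat(moving, new, momentum=0.9):
    """One running-statistic update (hypothetical helper, not a Paddle API)."""
    return moving * momentum + new * (1.0 - momentum)

moving_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    # Each step moves the running value 10% of the way towards the batch value.
    moving_mean = update_moving_stat(moving_mean, batch_mean)
```

A momentum close to 1 makes the running statistics change slowly, so many batches are needed before they reflect the data.
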

Returns

A tensor variable which is the result after applying batch normalization on the input.

Return type

Variable

Examples

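import numpy as np
import paddle.fluid as fluid

# Illustrative input: a batch of 3 samples with 32 features each.
x = np.random.random((3, 32)).astype('float32')
with fluid.dygraph.guard():
    x = fluid.dygraph.base.to_variable(x)
    fc = fluid.dygraph.FC('fc', size=200, param_attr='fc1.w')
    hidden1 = fc(x)  # shape [3, 200]
    # num_channels must match the channel dimension of hidden1.
    batch_norm = fluid.dygraph.BatchNorm("batch_norm", 200)
    hidden2 = batch_norm(hidden1)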

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## BilinearTensorProduct

class paddle.fluid.dygraph.BilinearTensorProduct(name_scope, size, name=None, act=None, param_attr=None, bias_attr=None)

This layer performs bilinear tensor product on two inputs. For example:

$out_{i} = x * W_{i} * {y^\mathrm{T}}, i=0,1,...,size-1$
In this formula:
• $$x$$: the first input contains M elements, shape is [batch_size, M].

• $$y$$: the second input contains N elements, shape is [batch_size, N].

• $$W_{i}$$: the i-th learned weight, shape is [M, N]

• $$out_{i}$$: the i-th element of out, shape is [batch_size, size].

• $$y^\mathrm{T}$$: the transpose of $$y$$.
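
To make the shapes in the formula concrete, here is a minimal pure-Python sketch that evaluates the product for each sample in the batch and each weight matrix. It omits the bias and activation, and the helper name `bilinear_tensor_product` and the weight values are illustrative assumptions, not part of the Paddle API.

```python
def bilinear_tensor_product(x, y, weights):
    """out[b][i] = x[b] * W_i * y[b]^T for each sample b (hypothetical helper)."""
    out = []
    for xb, yb in zip(x, y):          # iterate over the batch
        row = []
        for w in weights:             # one [M, N] matrix per output unit
            row.append(sum(xb[m] * w[m][n] * yb[n]
                           for m in range(len(xb))
                           for n in range(len(yb))))
        out.append(row)
    return out

x = [[1.0, 2.0]]                      # shape [batch_size=1, M=2]
y = [[3.0, 4.0, 5.0]]                 # shape [batch_size=1, N=3]
weights = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # W_0, shape [M, N]
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # W_1, shape [M, N]
]
out = bilinear_tensor_product(x, y, weights)   # shape [batch_size, size=2]
```
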

Parameters
• name_scope (str) – The name of this class.

• size (int) – The dimension of this layer.

• act (str) – Activation to be applied to the output of this layer. Default: None.

• name (str) – The name of this layer. Default: None.

• param_attr (ParamAttr) – The parameter attribute for the learnable parameters/weights of this layer. Default: None.

• bias_attr (ParamAttr) – The parameter attribute for the bias of this layer. If it is set to False, no bias will be added to the output units. If it is set to None, the bias is initialized to zero. Default: None.

Returns

A 2-D Tensor of shape [batch_size, size].

Return type

Variable

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    layer1 = numpy.random.random((5, 5)).astype('float32')
    layer2 = numpy.random.random((5, 4)).astype('float32')
    bilinearTensorProduct = fluid.dygraph.nn.BilinearTensorProduct(
        'BilinearTensorProduct', size=1000)
    ret = bilinearTensorProduct(fluid.dygraph.base.to_variable(layer1),
                                fluid.dygraph.base.to_variable(layer2))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## Conv2D

class paddle.fluid.dygraph.Conv2D(name_scope, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, dtype='float32')

The convolution2D layer calculates the output based on the input, the filter, and the strides, paddings, dilations and groups parameters. Input and Output are in NCHW format, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature. Filter is in MCHW format, where M is the number of output image channels, C is the number of input image channels, H is the height of the filter, and W is the width of the filter. If groups is greater than 1, C equals the number of input image channels divided by groups. Please refer to UFLDL’s convolution <http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/> for more details. If a bias attribute and an activation type are provided, the bias is added to the output of the convolution, and the corresponding activation function is applied to the final result.

For each input $$X$$, the equation is:

$Out = \sigma (W \ast X + b)$

Where:

• $$X$$: Input value, a tensor with NCHW format.

• $$W$$: Filter value, a tensor with MCHW format.

• $$\ast$$: Convolution operation.

• $$b$$: Bias value, a 2-D tensor with shape [M, 1].

• $$\sigma$$: Activation function.

• $$Out$$: Output value, the shape of $$Out$$ and $$X$$ may be different.

Example

• Input:

Input shape: $$(N, C_{in}, H_{in}, W_{in})$$

Filter shape: $$(C_{out}, C_{in}, H_f, W_f)$$

• Output:

Output shape: $$(N, C_{out}, H_{out}, W_{out})$$

Where

$\begin{split}H_{out}&= \frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (H_f - 1) + 1))}{strides[0]} + 1 \\ W_{out}&= \frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (W_f - 1) + 1))}{strides[1]} + 1\end{split}$
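
The output-size formula above can be checked with a few lines of Python. This is a sketch under the assumption that the division is rounded down (floor division); the helper names `_pair` and `conv2d_output_size` are hypothetical and not part of Paddle.

```python
def _pair(value):
    """Expand an int to (value, value); pass tuples and lists through."""
    return (value, value) if isinstance(value, int) else tuple(value)

def conv2d_output_size(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Return (H_out, W_out) from the formula above, assuming floor division."""
    in_size, filter_size = _pair(in_size), _pair(filter_size)
    stride, padding, dilation = _pair(stride), _pair(padding), _pair(dilation)
    return tuple(
        (i + 2 * p - (d * (f - 1) + 1)) // s + 1
        for i, f, s, p, d in zip(in_size, filter_size, stride, padding, dilation)
    )

# A 32 x 32 input with a 3 x 3 filter and the default stride, padding and dilation.
out_hw = conv2d_output_size((32, 32), 3)
```

For a 32 x 32 input and a 3 x 3 filter with default settings, each spatial dimension shrinks by `filter_size - 1`, giving 30 x 30.
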
Parameters
• name_scope (str) – The name for this class.

• num_filters (int) – The number of filters. It is the same as the number of output image channels.

• filter_size (int|tuple|None) – The filter size. If filter_size is a tuple, it must contain two integers, (filter_size_H, filter_size_W). Otherwise, the filter will be a square.

• stride (int|tuple) – The stride size. If stride is a tuple, it must contain two integers, (stride_H, stride_W). Otherwise, the stride_H = stride_W = stride. Default: stride = 1.

• dilation (int|tuple) – The dilation size. If dilation is a tuple, it must contain two integers, (dilation_H, dilation_W). Otherwise, the dilation_H = dilation_W = dilation. Default: dilation = 1.

• groups (int) – The groups number of the Conv2d Layer. According to grouped convolution in Alex Krizhevsky’s Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels. Default: groups=1.

• param_attr (ParamAttr|None) – The parameter attribute for learnable parameters/weights of conv2d. If it is set to None or one attribute of ParamAttr, conv2d will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with $$Normal(0.0, std)$$, and the $$std$$ is $$(\frac{2.0 }{filter\_elem\_num})^{0.5}$$. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of conv2d. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, conv2d will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• use_cudnn (bool) – Use cudnn kernel or not, it is valid only when the cudnn library is installed. Default: True

• act (str) – Activation type, if it is set to None, activation is not appended. Default: None

Raises

ValueError – If the shapes of input, filter_size, stride, padding and groups mismatch.

Examples

import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import Conv2D
from paddle.fluid.dygraph.base import to_variable

data = np.random.uniform(-1, 1, [10, 3, 32, 32]).astype('float32')
with fluid.dygraph.guard():
    conv2d = Conv2D("conv2d", 2, 3)  # num_filters=2, filter_size=3
    data = to_variable(data)
    conv = conv2d(data)

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## Conv2DTranspose

class paddle.fluid.dygraph.Conv2DTranspose(name_scope, num_filters, output_size=None, filter_size=None, padding=0, stride=1, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None)

Convolution2D transpose layer

The convolution2D transpose layer calculates the output based on the input, the filter, and the dilations, strides and paddings. Input and Output are in NCHW format, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature. The dilations, strides and paddings parameters each have two elements, which represent height and width, respectively. For details of the convolution transpose layer, please refer to the following explanation and the references therein. If a bias attribute and an activation type are provided, the bias is added to the output of the convolution, and the corresponding activation function is applied to the final result.

For each input $$X$$, the equation is:

$Out = \sigma (W \ast X + b)$

Where:

• $$X$$: Input value, a tensor with NCHW format.

• $$W$$: Filter value, a tensor with MCHW format.

• $$\ast$$: Convolution operation.

• $$b$$: Bias value, a 2-D tensor with shape [M, 1].

• $$\sigma$$: Activation function.

• $$Out$$: Output value, the shape of $$Out$$ and $$X$$ may be different.

Example

• Input:

Input shape: $$(N, C_{in}, H_{in}, W_{in})$$

Filter shape: $$(C_{in}, C_{out}, H_f, W_f)$$

• Output:

Output shape: $$(N, C_{out}, H_{out}, W_{out})$$

Where

$\begin{split}H^\prime_{out} &= (H_{in} - 1) * strides[0] - 2 * paddings[0] + dilations[0] * (H_f - 1) + 1 \\ W^\prime_{out} &= (W_{in} - 1) * strides[1] - 2 * paddings[1] + dilations[1] * (W_f - 1) + 1 \\ H_{out} &\in [ H^\prime_{out}, H^\prime_{out} + strides[0] ) \\ W_{out} &\in [ W^\prime_{out}, W^\prime_{out} + strides[1] )\end{split}$
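
The formula gives a half-open range of valid output sizes for each spatial axis rather than a single value. The sketch below computes that range for one axis; `conv2d_transpose_output_range` is a hypothetical helper name, not a Paddle function.

```python
def conv2d_transpose_output_range(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Return (low, high) so that valid output sizes lie in [low, high) for one axis."""
    low = (in_size - 1) * stride - 2 * padding + dilation * (filter_size - 1) + 1
    return low, low + stride

# One axis of a 32-pixel input with a 3-pixel filter.
range_stride1 = conv2d_transpose_output_range(32, 3)
range_stride2 = conv2d_transpose_output_range(32, 3, stride=2)
```

With `stride=1` the range contains exactly one size. With larger strides several output sizes are valid, which is why the layer accepts an explicit `output_size`.
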
Parameters
• name_scope (str) – The name of this class.

• num_filters (int) – The number of filters. It is the same as the number of output image channels.

• output_size (int|tuple|None) – The output image size. If output_size is a tuple, it must contain two integers, (image_H, image_W). If None, filter_size, padding, and stride are used to calculate output_size. If output_size and filter_size are both specified, they should follow the formula above. Default: None.

• filter_size (int|tuple|None) – The filter size. If filter_size is a tuple, it must contain two integers, (filter_size_H, filter_size_W). Otherwise, the filter will be a square. If None, output_size is used to calculate filter_size. Default: None.

• stride (int|tuple) – The stride size. If stride is a tuple, it must contain two integers, (stride_H, stride_W). Otherwise, the stride_H = stride_W = stride. Default: stride = 1.

• dilation (int|tuple) – The dilation size. If dilation is a tuple, it must contain two integers, (dilation_H, dilation_W). Otherwise, the dilation_H = dilation_W = dilation. Default: dilation = 1.

• groups (int) – The groups number of the Conv2d transpose layer. Inspired by grouped convolution in Alex Krizhevsky’s Deep CNN paper, in which when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels. Default: groups = 1.

• param_attr (ParamAttr|None) – The parameter attribute for learnable parameters/weights of conv2d_transpose. If it is set to None or one attribute of ParamAttr, conv2d_transpose will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of conv2d_transpose. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, conv2d_transpose will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• use_cudnn (bool) – Use cudnn kernel or not, it is valid only when the cudnn library is installed. Default: True.

• act (str) – Activation type, if it is set to None, activation is not appended. Default: None.

Returns

The tensor variable storing the convolution transpose result.

Return type

Variable

Raises

ValueError – If the shapes of input, filter_size, stride, padding and groups mismatch.

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    # The input must be 4-D in NCHW format; the batch size of 5 is illustrative.
    data = numpy.random.random((5, 3, 32, 32)).astype('float32')
    conv2DTranspose = fluid.dygraph.nn.Conv2DTranspose(
        'Conv2DTranspose', num_filters=2, filter_size=3)
    ret = conv2DTranspose(fluid.dygraph.base.to_variable(data))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## Conv3D

class paddle.fluid.dygraph.Conv3D(name_scope, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None)

Convolution3D Layer

The convolution3D layer calculates the output based on the input, the filter, and the strides, paddings, dilations and groups parameters. Input and Output are in NCDHW format, where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature. Convolution3D is similar to Convolution2D but adds one dimension (depth). If a bias attribute and an activation type are provided, the bias is added to the output of the convolution, and the corresponding activation function is applied to the final result.

For each input $$X$$, the equation is:

$Out = \sigma (W \ast X + b)$

In the above equation:

• $$X$$: Input value, a tensor with NCDHW format.

• $$W$$: Filter value, a tensor with MCDHW format.

• $$\ast$$: Convolution operation.

• $$b$$: Bias value, a 2-D tensor with shape [M, 1].

• $$\sigma$$: Activation function.

• $$Out$$: Output value, the shape of $$Out$$ and $$X$$ may be different.

Example

• Input:

Input shape: $$(N, C_{in}, D_{in}, H_{in}, W_{in})$$

Filter shape: $$(C_{out}, C_{in}, D_f, H_f, W_f)$$

• Output:

Output shape: $$(N, C_{out}, D_{out}, H_{out}, W_{out})$$

Where

$\begin{split}D_{out}&= \frac{(D_{in} + 2 * paddings[0] - (dilations[0] * (D_f - 1) + 1))}{strides[0]} + 1 \\ H_{out}&= \frac{(H_{in} + 2 * paddings[1] - (dilations[1] * (H_f - 1) + 1))}{strides[1]} + 1 \\ W_{out}&= \frac{(W_{in} + 2 * paddings[2] - (dilations[2] * (W_f - 1) + 1))}{strides[2]} + 1\end{split}$
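
The same per-axis formula applies to depth, height and width. As a rough check, the sketch below applies it to the spatial shape (12, 32, 32) with a filter size of 3, assuming floor division; `conv_output_dim` is a hypothetical helper name.

```python
def conv_output_dim(in_dim, filter_dim, stride=1, padding=0, dilation=1):
    """Output size along one axis, assuming the division is rounded down."""
    return (in_dim + 2 * padding - (dilation * (filter_dim - 1) + 1)) // stride + 1

in_dhw = (12, 32, 32)   # (D_in, H_in, W_in)
out_dhw = tuple(conv_output_dim(size, 3) for size in in_dhw)
```
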
Parameters
• name_scope (str) – The name for this class.

• num_filters (int) – The number of filters. It is the same as the number of output image channels.

• filter_size (int|tuple|None) – The filter size. If filter_size is a tuple, it must contain three integers, (filter_size_D, filter_size_H, filter_size_W). Otherwise, the filter will be a cube.

• stride (int|tuple) – The stride size. If stride is a tuple, it must contain three integers, (stride_D, stride_H, stride_W). Otherwise, the stride_D = stride_H = stride_W = stride. Default: stride = 1.

• dilation (int|tuple) – The dilation size. If dilation is a tuple, it must contain three integers, (dilation_D, dilation_H, dilation_W). Otherwise, the dilation_D = dilation_H = dilation_W = dilation. Default: dilation = 1.

• groups (int) – The groups number of the Conv3d Layer. According to grouped convolution in Alex Krizhevsky’s Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels. Default: groups=1

• param_attr (ParamAttr|None) – The parameter attribute for learnable parameters/weights of conv3d. If it is set to None or one attribute of ParamAttr, conv3d will create ParamAttr as param_attr. If it is set to None, the parameter is initialized with $$Normal(0.0, std)$$, and the $$std$$ is $$(\frac{2.0 }{filter\_elem\_num})^{0.5}$$. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of conv3d. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, conv3d will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• use_cudnn (bool) – Use cudnn kernel or not, it is valid only when the cudnn library is installed. Default: True

• act (str) – Activation type, if it is set to None, activation is not appended. Default: None.

Returns

The tensor variable storing the convolution and non-linearity activation result.

Return type

Variable

Raises

ValueError – If the shapes of input, filter_size, stride, padding and groups mismatch.

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    data = numpy.random.random((5, 3, 12, 32, 32)).astype('float32')
    conv3d = fluid.dygraph.nn.Conv3D(
        'Conv3D', num_filters=2, filter_size=3, act="relu")
    ret = conv3d(fluid.dygraph.base.to_variable(data))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## Conv3DTranspose

class paddle.fluid.dygraph.Conv3DTranspose(name_scope, num_filters, output_size=None, filter_size=None, padding=0, stride=1, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None)

Convolution3D transpose layer

The convolution3D transpose layer calculates the output based on the input, the filter, and the dilations, strides and paddings. Input and Output are in NCDHW format, where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature. The dilations, strides and paddings parameters each have three elements, which represent depth, height and width, respectively. For details of the convolution transpose layer, please refer to the following explanation and the references therein. If a bias attribute and an activation type are provided, the bias is added to the output of the convolution, and the corresponding activation function is applied to the final result.

For each input $$X$$, the equation is:

$Out = \sigma (W \ast X + b)$

In the above equation:

• $$X$$: Input value, a tensor with NCDHW format.

• $$W$$: Filter value, a tensor with MCDHW format.

• $$\ast$$: Convolution operation.

• $$b$$: Bias value, a 2-D tensor with shape [M, 1].

• $$\sigma$$: Activation function.

• $$Out$$: Output value, the shape of $$Out$$ and $$X$$ may be different.

Example

• Input:

Input shape: $$(N, C_{in}, D_{in}, H_{in}, W_{in})$$

Filter shape: $$(C_{in}, C_{out}, D_f, H_f, W_f)$$

• Output:

Output shape: $$(N, C_{out}, D_{out}, H_{out}, W_{out})$$

Where

$\begin{split}D_{out} &= (D_{in} - 1) * strides[0] - 2 * paddings[0] + dilations[0] * (D_f - 1) + 1 \\ H_{out} &= (H_{in} - 1) * strides[1] - 2 * paddings[1] + dilations[1] * (H_f - 1) + 1 \\ W_{out} &= (W_{in} - 1) * strides[2] - 2 * paddings[2] + dilations[2] * (W_f - 1) + 1\end{split}$
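
As a quick check of the formula, the sketch below evaluates it for the spatial shape (12, 32, 32) with a filter size of 12 and the default stride, padding and dilation. `conv_transpose_output_dim` is a hypothetical helper name, not a Paddle function.

```python
def conv_transpose_output_dim(in_dim, filter_dim, stride=1, padding=0, dilation=1):
    """Output size along one axis of a transposed convolution, per the formula above."""
    return (in_dim - 1) * stride - 2 * padding + dilation * (filter_dim - 1) + 1

in_dhw = (12, 32, 32)   # (D_in, H_in, W_in)
out_dhw = tuple(conv_transpose_output_dim(size, 12) for size in in_dhw)
```

Unlike a regular convolution, the transposed layer enlarges each spatial dimension: with stride 1 and no padding, each axis grows by `filter_size - 1`.
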
Parameters
• name_scope (str) – The name for this class.

• num_filters (int) – The number of filters. It is the same as the number of output image channels.

• output_size (int|tuple|None) – The output image size. If output_size is a tuple, it must contain three integers, (image_D, image_H, image_W). This parameter only works when filter_size is None.

• filter_size (int|tuple|None) – The filter size. If filter_size is a tuple, it must contain three integers, (filter_size_D, filter_size_H, filter_size_W). Otherwise, the filter will be a cube. If None, output_size is used to calculate filter_size.

• stride (int|tuple) – The stride size. If stride is a tuple, it must contain three integers, (stride_D, stride_H, stride_W). Otherwise, the stride_D = stride_H = stride_W = stride. Default: stride = 1.

• dilation (int|tuple) – The dilation size. If dilation is a tuple, it must contain three integers, (dilation_D, dilation_H, dilation_W). Otherwise, the dilation_D = dilation_H = dilation_W = dilation. Default: dilation = 1.

• groups (int) – The groups number of the Conv3d transpose layer. Inspired by grouped convolution in Alex Krizhevsky’s Deep CNN paper, in which when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels. Default: groups=1

• param_attr (ParamAttr|None) – The parameter attribute for learnable parameters/weights of conv3d_transpose. If it is set to None or one attribute of ParamAttr, conv3d_transpose will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of conv3d_transpose. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, conv3d_transpose will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• use_cudnn (bool) – Use cudnn kernel or not, it is valid only when the cudnn library is installed. Default: True

• act (str) – Activation type, if it is set to None, activation is not appended. Default: None.

• name (str|None) – A name for this layer(optional). If set None, the layer will be named automatically.

Returns

The tensor variable storing the convolution transpose result.

Return type

Variable

Raises

ValueError – If the shapes of input, filter_size, stride, padding and groups mismatch.

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    data = numpy.random.random((5, 3, 12, 32, 32)).astype('float32')

    conv3dTranspose = fluid.dygraph.nn.Conv3DTranspose(
        'Conv3DTranspose',
        num_filters=12,
        filter_size=12,
        use_cudnn=False)
    ret = conv3dTranspose(fluid.dygraph.base.to_variable(data))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## CosineDecay

class paddle.fluid.dygraph.CosineDecay(learning_rate, step_each_epoch, epochs, begin=0, step=1, dtype='float32')

Applies cosine decay to the learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. With this class, the learning rate is decayed following the cosine decay strategy.

$decayed\_lr = learning\_rate * 0.5 * (math.cos(epoch * \frac{math.pi}{epochs}) + 1)$
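
Reading the formula as the cosine applied to `epoch * math.pi / epochs`, the schedule can be sketched in plain Python. The function `cosine_decay` is a hypothetical stand-in, and deriving `epoch` as the number of completed epochs (`global_step // step_each_epoch`) is an assumption about how the step counter maps to epochs.

```python
import math

def cosine_decay(learning_rate, step_each_epoch, epochs, global_step):
    """Cosine-decayed learning rate at a given global step (illustrative sketch)."""
    epoch = global_step // step_each_epoch   # assumed: number of completed epochs
    return learning_rate * 0.5 * (math.cos(epoch * math.pi / epochs) + 1)

lr_start = cosine_decay(0.1, 10000, 120, global_step=0)
lr_middle = cosine_decay(0.1, 10000, 120, global_step=60 * 10000)
lr_end = cosine_decay(0.1, 10000, 120, global_step=120 * 10000)
```

The rate starts at `learning_rate`, reaches half of it midway through training, and approaches zero at the final epoch.
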
Parameters
• learning_rate (Variable|float) – The initial learning rate.

• step_each_epoch (int) – The number of steps in an epoch.

• epochs (int) – The number of epochs.

• begin (int) – The begin step (default is 0).

• step (int) – The step size (default is 1).

• dtype (str) – The dtype used to create learning rate (default is ‘float32’).

Examples

import paddle.fluid as fluid

base_lr = 0.1
with fluid.dygraph.guard():
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.CosineDecay(
            base_lr, 10000, 120))


## Embedding

class paddle.fluid.dygraph.Embedding(name_scope, size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')

Embedding Layer

This layer is used to look up embeddings of IDs, provided by input, in a lookup table. The result of this lookup is the embedding of each ID in the input. All the input variables are passed in as local variables to the LayerHelper constructor.

Parameters
• name_scope (str) – The name of this class.

• size (tuple|list) – The shape of the look up table parameter. It should have two elements which indicate the size of the dictionary of embeddings and the size of each embedding vector respectively.

• is_sparse (bool) – The flag indicating whether to use sparse update. Default: False

• is_distributed (bool) – Whether to run lookup table from remote parameter server. Default: False.

• padding_idx (int|long|None) – If None, it has no effect on the lookup. Otherwise, the given padding_idx indicates that the output is padded with zeros whenever the lookup encounters it in the input. If $$padding\_idx < 0$$, the padding_idx to use in the lookup is $$size[0] + padding\_idx$$. Default: None.

• param_attr (ParamAttr) – Parameters for this layer. Default: None.

• dtype (np.dtype|core.VarDesc.VarType|str) – The type of data : float32, float_16, int etc. Default: ‘float32’.
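
The effect of padding_idx can be illustrated with a plain-Python lookup table. This is a sketch of the described behaviour only, not Paddle's implementation; `embedding_lookup` is a hypothetical helper, and it assumes a negative padding_idx wraps around to `size[0] + padding_idx`.

```python
def embedding_lookup(table, ids, padding_idx=None):
    """Return one row of `table` per id; the padding id maps to zeros (hypothetical helper)."""
    if padding_idx is not None and padding_idx < 0:
        padding_idx = len(table) + padding_idx   # assumed wrap-around: size[0] + padding_idx
    width = len(table[0])
    return [[0.0] * width if i == padding_idx else list(table[i]) for i in ids]

table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # size = [3, 2]
rows = embedding_lookup(table, [0, 2, 1], padding_idx=-1)
```

Here `padding_idx=-1` resolves to index 2, so the lookup of ID 2 returns a zero vector while the other IDs return their table rows.
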

Returns

The tensor variable storing the embeddings of the supplied inputs.

Return type

Variable

Examples

import paddle.fluid as fluid
import numpy as np

inp_word = np.array([[[1]]]).astype('int64')
dict_size = 20
with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding(
        name_scope='embedding',
        size=[dict_size, 32],
        param_attr='emb.w',
        is_sparse=False)
    static_rlt3 = emb(fluid.dygraph.base.to_variable(inp_word))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## enabled

paddle.fluid.dygraph.enabled()

## ExponentialDecay

class paddle.fluid.dygraph.ExponentialDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')

Applies exponential decay to the learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, the learning rate will be decayed by ‘decay_rate’ every ‘decay_steps’ steps.

if staircase == True:
    decayed_learning_rate = learning_rate * decay_rate ^ floor(global_step / decay_steps)
else:
    decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
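
The pseudocode above uses `^` for exponentiation. A runnable plain-Python sketch of the same computation follows; `exponential_decay` is a hypothetical function name used only for illustration.

```python
import math

def exponential_decay(learning_rate, decay_steps, decay_rate, global_step, staircase=False):
    """Exponentially decayed learning rate at a given global step (illustrative sketch)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)   # decay only at discrete intervals
    return learning_rate * decay_rate ** exponent

lr_smooth = exponential_decay(0.1, 10000, 0.5, global_step=15000)
lr_stair = exponential_decay(0.1, 10000, 0.5, global_step=15000, staircase=True)
```

At step 15000 the staircase variant has applied one full decay (0.1 to 0.05), while the smooth variant has already decayed further because its exponent is 1.5.
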

Parameters
• learning_rate (Variable|float) – The initial learning rate.

• decay_steps (int) – See the decay computation above.

• decay_rate (float) – The decay rate. See the decay computation above.

• staircase (Boolean) – If True, decay the learning rate at discrete intervals. Default: False

• begin (int) – The begin step (default is 0)

• step (int) – The step size (default is 1)

• dtype (str) – The dtype used to create learning rate (default is ‘float32’)

Examples

import paddle.fluid as fluid

base_lr = 0.1
with fluid.dygraph.guard():
    sgd_optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.ExponentialDecay(
            learning_rate=base_lr,
            decay_steps=10000,
            decay_rate=0.5,
            staircase=True))


## FC

class paddle.fluid.dygraph.FC(name_scope, size, num_flatten_dims=1, param_attr=None, bias_attr=None, act=None, is_test=False, dtype='float32')

Fully Connected Layer

This function creates a fully connected layer in the network. It can take one or multiple tensors as its inputs(input can be a list of Variable, see Args in detail). It creates a variable called weights for each input tensor, which represents a fully connected weight matrix from each input unit to each output unit. The fully connected layer multiplies each input tensor with its corresponding weight to produce an output Tensor with shape [M, size], where M is batch size. If multiple input tensors are given, the results of multiple output tensors with shape [M, size] will be summed up. If bias_attr is not None, a bias variable will be created and added to the output. Finally, if activation is not None, it will be applied to the output as well.

When the input is a single tensor:

$Out = Act({XW + b})$

When the input consists of multiple tensors:

$Out = Act({\sum_{i=0}^{N-1}X_iW_i + b})$

In the above equation:

• $$N$$: Number of inputs. N equals len(input) if input is a list of Variable.

• $$X_i$$: The i-th input tensor.

• $$W_i$$: The i-th weight matrix, corresponding to the i-th input tensor.

• $$b$$: The bias parameter created by this layer (if needed).

• $$Act$$: The activation function.

• $$Out$$: The output tensor.

See below for an example.

Given:
    data_1.data = [[[0.1, 0.2],
                    [0.3, 0.4]]]
    data_1.shape = (1, 2, 2) # 1 is batch_size

    data_2.data = [[[0.1, 0.2, 0.3]]]
    data_2.shape = (1, 1, 3)

    out = fluid.layers.fc(input=[data_1, data_2], size=2)

Then:
    out.data = [[0.18669507, 0.1893476]]
    out.shape = (1, 2)
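
The numbers in `out.data` above depend on the layer's learned weights, so they cannot be reproduced by hand. The sketch below only illustrates the multi-input formula with fixed, made-up weights and no activation; the helpers `matmul` and `fc` and all weight values are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def fc(inputs, weights, bias):
    """Out = sum_i(X_i W_i) + b, without an activation (hypothetical helper)."""
    rows, size = len(inputs[0]), len(bias)
    out = [[bias[j] for j in range(size)] for _ in range(rows)]
    for x, w in zip(inputs, weights):
        prod = matmul(x, w)
        for r in range(rows):
            for j in range(size):
                out[r][j] += prod[r][j]
    return out

x1 = [[0.1, 0.2, 0.3, 0.4]]      # data_1 flattened from (1, 2, 2) to [1, 4]
x2 = [[0.1, 0.2, 0.3]]           # data_2 flattened from (1, 1, 3) to [1, 3]
w1 = [[1.0, 0.0]] * 4            # made-up [4, 2] weights for the first input
w2 = [[0.0, 1.0]] * 3            # made-up [3, 2] weights for the second input
out = fc([x1, x2], [w1, w2], bias=[0.0, 0.0])   # shape (1, 2)
```

Each input gets its own weight matrix, and the per-input products are summed before the bias is added, which is why the output shape is (1, 2) regardless of the number of inputs.
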

Parameters
• name_scope (str) – The name of this class.

• size (int) – The number of output units in this layer.

• num_flatten_dims (int) – The fc layer can accept an input tensor with more than two dimensions. If this happens, the multidimensional tensor will first be flattened into a 2-dimensional matrix. The parameter num_flatten_dims determines how the input tensor is flattened: the first num_flatten_dims (inclusive, index starts from 1) dimensions will be flattened to form the first dimension of the final matrix (height of the matrix), and the remaining rank(X) - num_flatten_dims dimensions are flattened to form the second dimension of the final matrix (width of the matrix). For example, suppose X is a 5-dimensional tensor with a shape [2, 3, 4, 5, 6], and num_flatten_dims = 3. Then, the flattened matrix will have a shape [2 x 3 x 4, 5 x 6] = [24, 30]. Default: 1

• param_attr (ParamAttr|list of ParamAttr|None) – The parameter attribute for learnable parameters/weights of this layer.

• bias_attr (ParamAttr|list of ParamAttr, default None) – The parameter attribute for the bias of this layer. If it is set to False, no bias will be added to the output units. If it is set to None, the bias is initialized zero. Default: None.

• act (str|None) – Activation to be applied to the output of this layer.

• is_test (bool) – A flag indicating whether execution is in test phase. Default: False

• dtype (str) – Dtype used for weight

Raises

ValueError – If rank of the input tensor is less than 2.
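The flattening rule for num_flatten_dims can be checked with a short NumPy sketch. `flatten_for_fc` is a hypothetical helper written for illustration; it only mirrors the shape arithmetic described above.

```python
import numpy as np

def flatten_for_fc(x, num_flatten_dims=1):
    """Collapse x into a 2-D matrix: leading dims form the height, the rest the width."""
    if x.ndim < 2:
        raise ValueError("rank of the input tensor must be at least 2")
    height = int(np.prod(x.shape[:num_flatten_dims]))
    width = int(np.prod(x.shape[num_flatten_dims:]))
    return x.reshape(height, width)

x = np.zeros((2, 3, 4, 5, 6), dtype="float32")
flat = flatten_for_fc(x, num_flatten_dims=3)
print(flat.shape)  # (24, 30)
```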

Examples

import paddle.fluid as fluid
from paddle.fluid.dygraph import FC
from paddle.fluid.dygraph.base import to_variable
import numpy as np

data = np.random.uniform(-1, 1, [30, 10, 32]).astype('float32')
with fluid.dygraph.guard():
    fc = FC("fc", 64, num_flatten_dims=2)
    data = to_variable(data)
    conv = fc(data)

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## GroupNorm¶

class paddle.fluid.dygraph.GroupNorm(name_scope, groups, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, data_layout='NCHW')[source]

Group Normalization Layer

Refer to Group Normalization.

Parameters
• name_scope (str) – The name of this class.

• groups (int) – The number of groups into which the channels are divided.

• epsilon (float) – The small value added to the variance to prevent division by zero. Default: 1e-05.

• param_attr (ParamAttr|None) – The parameter attribute for the learnable scale $$g$$. If it is set to False, no scale will be added to the output units. If it is set to None, the scale is initialized to one. Default: None.

• bias_attr (ParamAttr|None) – The parameter attribute for the learnable bias $$b$$. If it is set to False, no bias will be added to the output units. If it is set to None, the bias is initialized to zero. Default: None.

• act (str) – Activation to be applied to the output of group normalization.

• data_layout (str) – The data layout of the input. Only NCHW is supported.

Returns

A tensor variable which is the result after applying group normalization on the input.

Return type

Variable
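The computation itself is not spelled out above, so here is a minimal NumPy sketch of group normalization as it is commonly defined: statistics are taken over each group of channels together with the spatial dimensions. `group_norm`, `gamma` and `beta` are illustrative names, not part of this API.

```python
import numpy as np

def group_norm(x, groups, gamma=None, beta=None, epsilon=1e-5):
    """Normalize an NCHW array over each group of channels (illustrative only)."""
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("the number of channels must be divisible by groups")
    g = x.reshape(n, groups, c // groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    y = ((g - mean) / np.sqrt(var + epsilon)).reshape(n, c, h, w)
    if gamma is not None:          # per-channel scale
        y = y * gamma.reshape(1, c, 1, 1)
    if beta is not None:           # per-channel bias
        y = y + beta.reshape(1, c, 1, 1)
    return y

x = np.random.default_rng(0).standard_normal((2, 8, 4, 4))
y = group_norm(x, groups=4)
print(y.shape)  # (2, 8, 4, 4)
```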

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    x = numpy.random.random((8, 32, 32)).astype('float32')
    groupNorm = fluid.dygraph.nn.GroupNorm('GroupNorm', groups=4)
    ret = groupNorm(fluid.dygraph.base.to_variable(x))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## GRUUnit¶

class paddle.fluid.dygraph.GRUUnit(name_scope, size, param_attr=None, bias_attr=None, activation='tanh', gate_activation='sigmoid', origin_mode=False, dtype='float32')[source]

GRU unit layer

If origin_mode is True, the equations of a GRU step follow the paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation <https://arxiv.org/pdf/1406.1078.pdf>:

\begin{align}\begin{aligned}u_t & = actGate(xu_{t} + W_u h_{t-1} + b_u)\\r_t & = actGate(xr_{t} + W_r h_{t-1} + b_r)\\m_t & = actNode(xm_t + W_c dot(r_t, h_{t-1}) + b_m)\\h_t & = dot(u_t, h_{t-1}) + dot((1-u_t), m_t)\end{aligned}\end{align}

If origin_mode is False, the equations of a GRU step follow the paper Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling:

\begin{align}\begin{aligned}u_t & = actGate(xu_{t} + W_u h_{t-1} + b_u)\\r_t & = actGate(xr_{t} + W_r h_{t-1} + b_r)\\m_t & = actNode(xm_t + W_c dot(r_t, h_{t-1}) + b_m)\\h_t & = dot((1-u_t), h_{t-1}) + dot(u_t, m_t)\end{aligned}\end{align}

The inputs of the GRU unit include $$z_t$$ and $$h_{t-1}$$. In terms of the equations above, $$z_t$$ is split into 3 parts: $$xu_t$$, $$xr_t$$ and $$xm_t$$. This means that, in order to implement a full GRU unit operator for an input, a fully connected layer has to be applied first, such that $$z_t = W_{fc}x_t$$.

The terms $$u_t$$ and $$r_t$$ represent the update and reset gates of the GRU cell. Unlike LSTM, GRU has one fewer gate. However, there is an intermediate candidate hidden output, denoted by $$m_t$$. This layer has three outputs: $$h_t$$, $$dot(r_t, h_{t-1})$$, and the concatenation of $$u_t$$, $$r_t$$ and $$m_t$$.
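One GRU step, including the difference between the two origin_mode settings, can be sketched in NumPy as follows. This is an illustration under assumptions, not the operator's implementation: `gru_unit_step` is a hypothetical helper, the weight is assumed to be a single [D, 3D] array sliced column-wise into the update, reset and candidate parts, the default sigmoid and tanh activations are hard-coded, and $$dot$$ is taken to be element-wise multiplication.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_unit_step(z_t, h_prev, weight, bias=None, origin_mode=False):
    """One GRU step. z_t: [B, 3D] holding (xu, xr, xm); h_prev: [B, D]; weight: [D, 3D]."""
    d = h_prev.shape[1]
    if bias is not None:                       # bias: [1, 3D]
        z_t = z_t + bias
    xu, xr, xm = z_t[:, :d], z_t[:, d:2 * d], z_t[:, 2 * d:]
    w_u, w_r, w_c = weight[:, :d], weight[:, d:2 * d], weight[:, 2 * d:]
    u = sigmoid(xu + h_prev @ w_u)             # update gate
    r = sigmoid(xr + h_prev @ w_r)             # reset gate
    reset_hidden = r * h_prev                  # dot(r_t, h_{t-1})
    m = np.tanh(xm + reset_hidden @ w_c)       # candidate hidden state
    if origin_mode:
        h = u * h_prev + (1.0 - u) * m
    else:
        h = (1.0 - u) * h_prev + u * m
    gate = np.concatenate([u, r, m], axis=1)   # concatenation of u_t, r_t, m_t
    return h, reset_hidden, gate

rng = np.random.default_rng(0)
B, D = 4, 5
z = rng.standard_normal((B, 3 * D))
h_prev = rng.standard_normal((B, D))
w = rng.standard_normal((D, 3 * D))
h, reset_hidden, gate = gru_unit_step(z, h_prev, w)
print(h.shape, reset_hidden.shape, gate.shape)  # (4, 5) (4, 5) (4, 15)
```

The two modes differ only in which of $$h_{t-1}$$ and $$m_t$$ is weighted by $$u_t$$; the gates themselves are computed identically.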

Parameters
• name_scope (str) – The name of this class.

• size (int) – The input dimension value.

• param_attr (ParamAttr|None) –

The parameter attribute for the learnable hidden-hidden weight matrix. Note:

• The shape of the weight matrix is $$(D \times 3D)$$, where $$D$$ is the hidden size.

• All elements in the weight matrix can be divided into two parts. The first part holds the weights of the update gate and reset gate with shape $$(D \times 2D)$$, and the second part holds the weights for the candidate hidden state with shape $$(D \times D)$$.

If it is set to None or one attribute of ParamAttr, gru_unit will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of GRU. Note that the bias with shape $$(1 \times 3D)$$ concatenates the biases of the update gate, reset gate and candidate calculations. If it is set to False, no bias will be applied to the update gate, reset gate and candidate calculations. If it is set to None or one attribute of ParamAttr, gru_unit will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.

• activation (str) – The activation type for cell (actNode). Default: ‘tanh’

• gate_activation (str) – The activation type for gates (actGate). Default: ‘sigmoid’

• dtype (str) – The dtype of the layers. Default: ‘float32’

Returns

The hidden value, reset-hidden value and gate values.

Return type

tuple

Examples

import paddle.fluid as fluid
import numpy

lod = [[2, 4, 3]]
D = 5
T = sum(lod[0])

# The input holds the three projected parts (xu, xr, xm), so its width is 3 * D.
x = numpy.random.rand(T, 3 * D).astype('float32')
hidden_input = numpy.random.rand(T, D).astype('float32')
with fluid.dygraph.guard():
    gru = fluid.dygraph.GRUUnit('gru', size=D * 3)
    dy_ret = gru(
        fluid.dygraph.base.to_variable(x),
        fluid.dygraph.base.to_variable(hidden_input))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## guard¶

paddle.fluid.dygraph.guard(place=None)[source]

This context manager creates a dygraph context in which dygraph code runs.

Parameters

place (fluid.CPUPlace|fluid.CUDAPlace|None) – Place to run

Returns

None

Examples

import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    inp = np.ones([3, 32, 32], dtype='float32')
    t = fluid.dygraph.base.to_variable(inp)
    fc1 = fluid.FC('fc1', size=4, bias_attr=False, num_flatten_dims=1)
    fc2 = fluid.FC('fc2', size=4)
    ret = fc1(t)
    dy_ret = fc2(ret)


## InverseTimeDecay¶

class paddle.fluid.dygraph.InverseTimeDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')[source]

Applies inverse time decay to the initial learning rate.

When training a model, it is often recommended to lower the learning rate as the training progresses. By using this function, an inverse decay function will be applied to the initial learning rate.

>>> if staircase:
>>>     decayed_learning_rate = learning_rate / (1 + decay_rate * floor(global_step / decay_steps))
>>> else:
>>>     decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)

Parameters
• learning_rate (Variable|float) – The initial learning rate.

• decay_steps (int) – See the decay computation above.

• decay_rate (float) – The decay rate. See the decay computation above.

• staircase (bool) – If True, decay the learning rate at discrete intervals. Default: False

• begin (int) – The begin step (default is 0)

• step (int) – The step size (default is 1)

• dtype (str) – The dtype used to create learning rate (default is ‘float32’)
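The decay computation above can be evaluated directly in plain Python. `inverse_time_decay` is a hypothetical stand-alone function that takes global_step as an explicit argument; it is a sketch of the formula, not the scheduler class.

```python
import math

def inverse_time_decay(learning_rate, global_step, decay_steps, decay_rate,
                       staircase=False):
    """Evaluate the inverse time decay formula for a single step."""
    ratio = global_step / decay_steps
    if staircase:
        ratio = math.floor(ratio)   # decay only at multiples of decay_steps
    return learning_rate / (1 + decay_rate * ratio)

smooth = inverse_time_decay(0.1, 15000, 10000, 0.5)                  # 0.1 / 1.75
stepped = inverse_time_decay(0.1, 15000, 10000, 0.5, staircase=True)  # 0.1 / 1.5
```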

Examples

import paddle.fluid as fluid
base_lr = 0.1
with fluid.dygraph.guard():
    sgd_optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.InverseTimeDecay(
            learning_rate=base_lr,
            decay_steps=10000,
            decay_rate=0.5,
            staircase=True))


## Layer¶

class paddle.fluid.dygraph.Layer(name_scope, dtype=VarType.FP32)[source]

Layers composed of operators.

Parameters
• name_scope – prefix name used by the layer to name parameters. If the prefix is “my_model/layer_1”, a parameter name in MyLayer can be “my_model/layer_1/MyLayer/w_n”, where w is the parameter base name and n is a unique, auto-generated suffix.

• dtype – data type for the variables in the layer.

full_name()

Full name of this layer.

The full name is composed of name_scope + “/” + MyLayer.__class__.__name__

Returns the full name of this layer.

create_parameter(attr, shape, dtype, is_bias=False, default_initializer=None)

Creates a parameter for this layer.

Parameters
• attr – [ParamAttr] the parameter attribute for this parameter.

• shape – shape of the parameter.

• dtype – data type of this parameter.

• is_bias – whether this is a bias parameter.

• default_initializer – the default initializer for this parameter.

Returns the created parameter Variable.

create_variable(name=None, persistable=None, dtype=None, type=VarType.LOD_TENSOR)

Creates a Variable for this layer.

Parameters
• name – name of the variable.

• persistable – whether to make this variable persistable.

• dtype – data type of the data in the variable.

• type – type of the variable.

Returns the created Variable.

parameters(include_sublayers=True)

Returns a list of Parameters from the current layer and its sublayers.

Parameters

include_sublayers – If true, also include the parameters from sublayers.

Returns a list of Parameters.

sublayers(include_sublayers=True)

Returns a list of sublayers.

Parameters

include_sublayers – If true, also include the layers from sublayers.

Returns a list of sublayers.

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

add_sublayer(name, sublayer)

Adds a sublayer. The added sublayer can be accessed as self.name.

Parameters
• name – name of this sublayer.

• sublayer – an instance of Layer.

Returns

the sublayer passed in.

add_parameter(name, parameter)

Adds a parameter. The added parameter can be accessed as self.name.

Parameters
• name – name of this parameter.

• parameter – an instance of Parameter.

Returns

the parameter passed in.

## LayerNorm¶

class paddle.fluid.dygraph.LayerNorm(name_scope, scale=True, shift=True, begin_norm_axis=1, epsilon=1e-05, param_attr=None, bias_attr=None, act=None)[source]

Assume feature vectors exist on dimensions begin_norm_axis … rank(input) and calculate the moment statistics along these dimensions for each feature vector a with size H, then normalize each feature vector using the corresponding statistics. After that, apply learnable gain and bias on the normalized tensor to scale and shift if scale and shift are set.

Refer to Layer Normalization

The formula is as follows:

\begin{align}\begin{aligned}\mu & = \frac{1}{H}\sum_{i=1}^{H} a_i\\\sigma & = \sqrt{\frac{1}{H}\sum_{i=1}^{H}(a_i - \mu)^2}\\h & = f(\frac{g}{\sigma}(a - \mu) + b)\end{aligned}\end{align}

• $$a$$: the vector representation of the summed inputs to the neurons in that layer.

• $$H$$: the number of hidden units in a layer.

• $$g$$: the trainable scale parameter.

• $$b$$: the trainable bias parameter.
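The formula can be sketched in NumPy as below. This is an illustration under assumptions rather than the layer's implementation: `layer_norm` is a hypothetical helper, epsilon is assumed to be added to the variance inside the square root, and the activation $$f$$ is omitted.

```python
import numpy as np

def layer_norm(x, begin_norm_axis=1, gain=None, bias=None, epsilon=1e-5):
    """Normalize x over the axes begin_norm_axis .. rank(x) - 1."""
    axes = tuple(range(begin_norm_axis, x.ndim))
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    y = (x - mu) / np.sqrt(var + epsilon)
    if gain is not None:   # gain g, shaped like x.shape[begin_norm_axis:]
        y = y * gain
    if bias is not None:   # bias b, shaped like x.shape[begin_norm_axis:]
        y = y + bias
    return y

x = np.random.default_rng(0).random((3, 32, 32)).astype("float32")
y = layer_norm(x, begin_norm_axis=1)
print(y.shape)  # (3, 32, 32)
```

With begin_norm_axis=1, each of the 3 samples is normalized over its own 32 x 32 values, so $$H$$ = 1024 here.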

Parameters
• name_scope (str) – The name of this class.

• scale (bool) – Whether to learn the adaptive gain $$g$$ after normalization. Default: True.

• shift (bool) – Whether to learn the adaptive bias $$b$$ after normalization. Default: True.

• begin_norm_axis (int) – The normalization will be performed along dimensions from begin_norm_axis to rank(input). Default: 1.

• epsilon (float) – The small value added to the variance to prevent division by zero. Default: 1e-05.

• param_attr (ParamAttr|None) – The parameter attribute for the learnable gain $$g$$. If scale is False, param_attr is omitted. If scale is True and param_attr is None, a default ParamAttr would be added as scale. The param_attr is initialized as 1 if it is added. Default: None.

• bias_attr (ParamAttr|None) – The parameter attribute for the learnable bias $$b$$. If shift is False, bias_attr is omitted. If shift is True and bias_attr is None, a default ParamAttr would be added as bias. The bias_attr is initialized as 0 if it is added. Default: None.

• act (str) – Activation to be applied to the output of layer normalization. Default: None.

Returns

Result after normalization

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    x = numpy.random.random((3, 32, 32)).astype('float32')
    layerNorm = fluid.dygraph.nn.LayerNorm(
        'LayerNorm', begin_norm_axis=1)
    ret = layerNorm(fluid.dygraph.base.to_variable(x))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## load_persistables¶

paddle.fluid.dygraph.load_persistables(dirname='save_dir')[source]

This function tries to load persistable variables from the folder dirname.

Use dirname to specify the folder where the persistable variables were saved.

Parameters

dirname (str) – The directory path. default is save_dir

Returns

The parameter dict restored from file, and the optimizer dict.

Return type

dict

Examples

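my_layer = layer(fluid.Layer)
sgd = SGDOptimizer(learning_rate=1e-3)
# Assumes the parameter dict and the optimizer dict are returned together.
param_dict, optimizer_dict = fluid.dygraph.load_persistables('save_dir')
param_1 = param_dict['PtbModel_0.w_1']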


## NaturalExpDecay¶

class paddle.fluid.dygraph.NaturalExpDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')[source]

Applies natural exponential decay to the initial learning rate.

if not staircase:
    decayed_learning_rate = learning_rate * exp(- decay_rate * (global_step / decay_steps))
else:
    decayed_learning_rate = learning_rate * exp(- decay_rate * floor(global_step / decay_steps))

Parameters
• learning_rate – A scalar float32 value or a Variable. This will be the initial learning rate during training

• decay_steps – A Python int32 number.

• decay_rate – A Python float number.

• staircase – Boolean. If set true, decay the learning rate every decay_steps.

• begin – A Python ‘int32’ number, the begin step (Default is 0)

• step – A Python ‘int32’ number, the step size (Default is 1)

• dtype – A Python ‘str’, the dtype used to create learning rate variable (Default is ‘float32’)
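A plain-Python sketch of the schedule follows. It assumes, as the staircase description ("decay the learning rate every decay_steps") suggests, that the ratio global_step / decay_steps is floored when staircase is set. `natural_exp_decay` is a hypothetical stand-alone function, not the scheduler class.

```python
import math

def natural_exp_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Evaluate natural exponential decay for a single step."""
    ratio = global_step / decay_steps
    if staircase:
        ratio = math.floor(ratio)   # assumption: hold the rate between multiples
    return learning_rate * math.exp(-decay_rate * ratio)

smooth = natural_exp_decay(0.1, 15000, 10000, 0.5)                  # 0.1 * exp(-0.75)
stepped = natural_exp_decay(0.1, 15000, 10000, 0.5, staircase=True)  # 0.1 * exp(-0.5)
```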

Examples

import paddle.fluid as fluid
base_lr = 0.1
with fluid.dygraph.guard():
    sgd_optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.NaturalExpDecay(
            learning_rate=base_lr,
            decay_steps=10000,
            decay_rate=0.5,
            staircase=True))


## NCE¶

class paddle.fluid.dygraph.NCE(name_scope, num_total_classes, sample_weight=None, param_attr=None, bias_attr=None, num_neg_samples=None, sampler='uniform', custom_dist=None, seed=0, is_sparse=False)[source]

Compute and return the noise-contrastive estimation training loss. See Noise-contrastive estimation: A new estimation principle for unnormalized statistical models

By default this operator uses a uniform distribution for sampling.

Parameters
• name_scope (str) – The name of this class.

• num_total_classes (int) – Total number of classes in all samples

• param_attr (ParamAttr|None) – The parameter attribute for learnable parameters/weights of nce. If it is set to None or one attribute of ParamAttr, nce will create ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.

• bias_attr (ParamAttr|bool|None) – The parameter attribute for the bias of nce. If it is set to False, no bias will be added to the output units. If it is set to None or one attribute of ParamAttr, nce will create ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized zero. Default: None.

• num_neg_samples (int) – The number of negative classes. The default value is 10.

• sampler (str) – The sampler used to sample classes from the negative classes. It can be ‘uniform’, ‘log_uniform’ or ‘custom_dist’. Default: ‘uniform’.

• custom_dist (float[]|None) – A float[] with size=num_total_classes. It is used when sampler is set to ‘custom_dist’. custom_dist[i] is the probability of the i-th class being sampled. Default: None.

• seed (int) – The seed used in sampler. Default: 0.

• is_sparse (bool) – The flag indicating whether to use sparse update, the weight@GRAD and bias@GRAD will be changed to SelectedRows. Default: False.

Returns

The output nce loss.

Return type

Variable

Examples

import numpy as np
import paddle.fluid as fluid

window_size = 5
dict_size = 20
label_word = int(window_size // 2) + 1
inp_word = np.array([[[1]], [[2]], [[3]], [[4]], [[5]]]).astype('int64')
nid_freq_arr = np.random.dirichlet(np.ones(20) * 1000).astype('float32')

with fluid.dygraph.guard():
    words = []
    for i in range(window_size):
        words.append(fluid.dygraph.base.to_variable(inp_word[i]))

    emb = fluid.Embedding(
        'embedding',
        size=[dict_size, 32],
        param_attr='emb.w',
        is_sparse=False)

    embs3 = []
    for i in range(window_size):
        if i == label_word:
            continue

        emb_rlt = emb(words[i])
        embs3.append(emb_rlt)

    embs3 = fluid.layers.concat(input=embs3, axis=1)
    nce = fluid.NCE('nce',
                    num_total_classes=dict_size,
                    num_neg_samples=2,
                    sampler="custom_dist",
                    custom_dist=nid_freq_arr.tolist(),
                    seed=1,
                    param_attr='nce.w',
                    bias_attr='nce.b')

    nce_loss3 = nce(embs3, words[label_word])

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## no_grad¶

paddle.fluid.dygraph.no_grad(func)

## NoamDecay¶

class paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32')[source]

Noam decay method. A NumPy implementation of Noam decay is as follows.

import numpy as np
# set hyper parameters
d_model = 2
current_steps = 20
warmup_steps = 200
# compute
lr_value = np.power(d_model, -0.5) * np.min([
np.power(current_steps, -0.5),
np.power(warmup_steps, -1.5) * current_steps])


Refer to the paper Attention Is All You Need.

Parameters
• d_model (Variable) – The dimensionality of input and output of model.

• warmup_steps (Variable) – A hyperparameter: the number of warm-up steps.

• begin (int) – The begin step (default is 1)

• step (int) – The step size (default is 1)

• dtype (str) – The dtype used to create learning rate (default is ‘float32’)

Examples

import paddle.fluid as fluid
warmup_steps = 100
learning_rate = 0.01
with fluid.dygraph.guard():
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.NoamDecay(
            1 / (warmup_steps * (learning_rate ** 2)),
            warmup_steps))


## PiecewiseDecay¶

class paddle.fluid.dygraph.PiecewiseDecay(boundaries, values, begin, step=1, dtype='float32')[source]

Piecewise decay scheduler.

The algorithm can be described as the code below.

boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
if step < 10000:
    learning_rate = 1.0
elif 10000 <= step < 20000:
    learning_rate = 0.5
else:
    learning_rate = 0.1

Parameters
• boundaries – A list of step numbers.

• values – A list of learning rate values that will be picked during different step boundaries.

• begin – The begin step used to initialize self.step_num

• step – The step size used when calculating the new step_num (Default is 1)

• dtype – The dtype used to create the learning rate variable
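The if/elif chain above generalizes to any number of boundaries. A minimal sketch using the standard library's `bisect` module; `piecewise_decay` is a hypothetical helper, not the scheduler class.

```python
from bisect import bisect_right

def piecewise_decay(step, boundaries, values):
    """Pick the learning rate for `step`; boundaries must be sorted ascending."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    # bisect_right counts how many boundaries are <= step.
    return values[bisect_right(boundaries, step)]

boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
rates = [piecewise_decay(s, boundaries, values) for s in (0, 9999, 10000, 19999, 20000)]
print(rates)  # [1.0, 1.0, 0.5, 0.5, 0.1]
```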

Examples

import paddle.fluid as fluid
boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]
with fluid.dygraph.guard():
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.PiecewiseDecay(boundaries, values, 0))


## PolynomialDecay¶

class paddle.fluid.dygraph.PolynomialDecay(learning_rate, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False, begin=0, step=1, dtype='float32')[source]

Applies polynomial decay to the initial learning rate.

if cycle:
    decay_steps = decay_steps * ceil(global_step / decay_steps)
else:
    global_step = min(global_step, decay_steps)
decayed_learning_rate = ((learning_rate - end_learning_rate) *
                         (1 - global_step / decay_steps) ** power + end_learning_rate)

Parameters
• learning_rate (Variable|float32) – A scalar float32 value or a Variable. This will be the initial learning rate during training.

• decay_steps (int32) – A Python int32 number.

• end_learning_rate (float) – A Python float number.

• power (float) – A Python float number.

• cycle (bool) – If set true, decay the learning rate every decay_steps.

• begin (int) – The begin step (default is 0)

• step (int) – The step size (default is 1)

• dtype (str) – The dtype used to create learning rate (default is ‘float32’)
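The effect of cycle is easiest to see by evaluating the schedule for a few steps. The sketch below is a hypothetical stand-alone function, not the scheduler class; the guard for global_step = 0 in the cycle branch is an assumption added here so that decay_steps is never multiplied by zero.

```python
import math

def polynomial_decay(learning_rate, global_step, decay_steps,
                     end_learning_rate=0.0001, power=1.0, cycle=False):
    """Evaluate polynomial decay for a single step."""
    if cycle:
        # Stretch the horizon to the next multiple of decay_steps.
        # max(1, ...) is an assumed guard for global_step == 0.
        decay_steps = decay_steps * max(1, math.ceil(global_step / decay_steps))
    else:
        global_step = min(global_step, decay_steps)
    frac = 1 - global_step / decay_steps
    return (learning_rate - end_learning_rate) * frac ** power + end_learning_rate

halfway = polynomial_decay(0.01, 2500, 5000, end_learning_rate=0.0)           # 0.005
clamped = polynomial_decay(0.01, 10000, 5000, end_learning_rate=0.0)          # 0.0
cycled = polynomial_decay(0.01, 7500, 5000, end_learning_rate=0.0, cycle=True)  # 0.0025
```

Without cycle the rate stays at end_learning_rate once decay_steps is reached; with cycle the horizon is extended, so the rate rises again after each multiple of decay_steps and decays anew.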

Examples

import paddle.fluid as fluid
start_lr = 0.01
total_step = 5000
end_lr = 0
with fluid.dygraph.guard():
    optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.dygraph.PolynomialDecay(
            start_lr, total_step, end_lr, power=1.0))


## Pool2D¶

class paddle.fluid.dygraph.Pool2D(name_scope, pool_size=-1, pool_type='max', pool_stride=1, pool_padding=0, global_pooling=False, use_cudnn=True, ceil_mode=False, exclusive=True, dtype=VarType.FP32)[source]

The pooling2d operation calculates the output based on the input, pooling_type and the ksize, strides and paddings parameters. Input(X) and output(Out) are in NCHW format, where N is the batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature. The parameters ksize, strides and paddings each have two elements, which represent height and width, respectively. The input(X) size and output(Out) size may be different.

Parameters
• name_scope (str) – The name of this class.

• pool_size (int|list|tuple) – The pool kernel size. If it is a tuple or list, it must contain two integers, (pool_size_Height, pool_size_Width). Otherwise, the pool kernel is a square with side length pool_size. Default: -1

• pool_type (str) – The pooling type, can be “max” for max-pooling and “avg” for average-pooling. Default: max

• pool_stride (int|list|tuple) – The pool stride size. If it is a tuple or list, it must contain two integers, (pool_stride_Height, pool_stride_Width). Otherwise, the same stride is used for height and width. Default: 1

• pool_padding (int|list|tuple) – The pool padding size. If it is a tuple or list, it must contain two integers, (pool_padding_on_Height, pool_padding_on_Width). Otherwise, the same padding is used for height and width. Default: 0

• global_pooling (bool) – Whether to use the global pooling. If global_pooling = true, kernel size and paddings will be ignored. Default: False

• use_cudnn (bool) – Whether to use the cuDNN kernel; this requires cuDNN to be installed. Default: True

• ceil_mode (bool) – Whether to use the ceil function to calculate the output height and width. If it is set to False, the floor function is used. Default: False

• exclusive (bool) – Whether to exclude padding points in average pooling mode. Default: True

Returns

The pooling result.

Return type

Variable

Raises
• ValueError – If ‘pool_type’ is neither “max” nor “avg”

• ValueError – If ‘global_pooling’ is False and ‘pool_size’ is -1

• ValueError – If ‘use_cudnn’ is not a bool value.
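The text above does not spell out how pool_size, pool_stride, pool_padding and ceil_mode determine the output size. A minimal sketch, assuming the usual pooling arithmetic along one spatial dimension; `pool_output_size` is a hypothetical helper, not part of this API.

```python
import math

def pool_output_size(input_size, pool_size, pool_stride=1, pool_padding=0,
                     ceil_mode=False):
    """Output length along one spatial dimension (height or width)."""
    span = input_size - pool_size + 2 * pool_padding
    if ceil_mode:
        steps = math.ceil(span / pool_stride)
    else:
        steps = span // pool_stride
    return steps + 1

print(pool_output_size(32, pool_size=2, pool_stride=1))                  # 31
print(pool_output_size(5, pool_size=2, pool_stride=2))                   # 2
print(pool_output_size(5, pool_size=2, pool_stride=2, ceil_mode=True))   # 3
```

The floor and ceil variants differ only when the stride does not evenly divide the remaining span, as in the last two calls.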

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    # Pool2D expects a 4-D input in NCHW format.
    data = numpy.random.random((3, 32, 32, 5)).astype('float32')

    pool2d = fluid.dygraph.Pool2D("pool2d", pool_size=2,
                                  pool_type='max',
                                  pool_stride=1,
                                  global_pooling=False)
    pool2d_res = pool2d(fluid.dygraph.base.to_variable(data))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## PRelu¶

class paddle.fluid.dygraph.PRelu(name_scope, mode, param_attr=None)[source]

Equation:

$y = \max(0, x) + \alpha * \min(0, x)$

Parameters
• name_scope (str) – The name of this class.

• mode (str) – The mode for weight sharing. It supports all, channel and element. all: all elements share the same weight; channel: elements in a channel share the same weight; element: each element has its own weight.

• param_attr (ParamAttr|None) – The parameter attribute for the learnable weight (alpha).

Returns

The output tensor with the same shape as input.

Return type

Variable
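The equation and the three sharing modes can be illustrated with NumPy broadcasting. The alpha shapes below are assumptions chosen to express each mode for an NCHW input, not necessarily the shapes the layer allocates, and `prelu` is a hypothetical helper.

```python
import numpy as np

def prelu(x, alpha):
    """y = max(0, x) + alpha * min(0, x); alpha broadcasts against x."""
    return np.maximum(0, x) + alpha * np.minimum(0, x)

x = np.random.default_rng(0).standard_normal((5, 3, 4, 4)).astype("float32")

alpha_all = np.float32(0.25)                                # 'all': one shared weight
alpha_channel = np.full((1, 3, 1, 1), 0.25, dtype="float32")  # 'channel': one per channel
alpha_element = np.full((1, 3, 4, 4), 0.25, dtype="float32")  # 'element': one per element

y_all = prelu(x, alpha_all)
y_channel = prelu(x, alpha_channel)
y_element = prelu(x, alpha_element)
print(y_all.shape)  # (5, 3, 4, 4)
```

Because every alpha is set to the same constant here, the three modes give identical outputs; they differ only in how many independent weights can be learned.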

Examples

import paddle.fluid as fluid
import numpy as np

inp_np = np.ones([5, 200, 100, 100]).astype('float32')
with fluid.dygraph.guard():
    mode = 'channel'
    prelu = fluid.PRelu(
        'prelu',
        mode=mode,
        param_attr=fluid.ParamAttr(initializer=fluid.initializer.Constant(1.0)))
    dy_rlt = prelu(fluid.dygraph.base.to_variable(inp_np))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## prepare_context¶

paddle.fluid.dygraph.prepare_context(strategy=None)[source]

## save_persistables¶

paddle.fluid.dygraph.save_persistables(model_dict, dirname='save_dir', optimizers=None)[source]

This function collects all variables in layer.parameters from the given layer, together with the optimizer’s learning rate decay, and then tries to save these variables to the folder dirname.

Use dirname to specify the folder in which the persistable variables are saved.

Parameters
• model_dict (dict of Parameters) – The parameters to be saved. If it is None, nothing will be done.

• dirname (str) – The directory path.

• optimizers (fluid.Optimizer|list(fluid.Optimizer)|None) – The optimizers to be saved

Returns

None

Examples

ptb_model = PtbModel(
hidden_size=hidden_size,
vocab_size=vocab_size,
num_layers=num_layers,
num_steps=num_steps,
init_scale=init_scale)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
x_data = np.arange(12).reshape(4, 3).astype('int64')
y_data = np.arange(1, 13).reshape(4, 3).astype('int64')
x_data = x_data.reshape((-1, num_steps, 1))
y_data = y_data.reshape((-1, 1))
init_hidden_data = np.zeros(
(num_layers, batch_size, hidden_size), dtype='float32')
init_cell_data = np.zeros(
(num_layers, batch_size, hidden_size), dtype='float32')
x = to_variable(x_data)
y = to_variable(y_data)
init_hidden = to_variable(init_hidden_data)
init_cell = to_variable(init_cell_data)
dy_loss, last_hidden, last_cell = ptb_model(x, y, init_hidden,
init_cell)
dy_loss.backward()
sgd.minimize(dy_loss)
fluid.dygraph.save_persistables(ptb_model.state_dict(), dirname=param_path, optimizers=sgd)


## SpectralNorm¶

class paddle.fluid.dygraph.SpectralNorm(name_scope, dim=0, power_iters=1, eps=1e-12, name=None)[source]

Spectral Normalization Layer

This layer calculates the spectral normalization value of the weight parameters of fc, conv1d, conv2d and conv3d layers, which should be 2-D, 3-D, 4-D and 5-D Parameters respectively. The calculation is as follows.

Step 1: Generate a vector U of shape [H] and a vector V of shape [W], where H is the size of the dim-th dimension of the input weights and W is the product of the remaining dimensions.

Step 2: power_iters should be a positive integer. Perform the following calculations with U and V for power_iters rounds.

\begin{align}\begin{aligned}\mathbf{v} := \frac{\mathbf{W}^{T} \mathbf{u}}{\|\mathbf{W}^{T} \mathbf{u}\|_2}\\\mathbf{u} := \frac{\mathbf{W} \mathbf{v}}{\|\mathbf{W} \mathbf{v}\|_2}\end{aligned}\end{align}

Step 3: Calculate $$\sigma(\mathbf{W})$$ and normalize weight values.

\begin{align}\begin{aligned}\sigma(\mathbf{W}) = \mathbf{u}^{T} \mathbf{W} \mathbf{v}\\\mathbf{W} = \frac{\mathbf{W}}{\sigma(\mathbf{W})}\end{aligned}\end{align}

Refer to Spectral Normalization.
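The three steps can be sketched in NumPy as follows. This is an illustration under assumptions, not the layer's implementation: `spectral_norm` is a hypothetical helper, U and V are initialized from a seeded normal distribution, eps is added to the norms in the denominators, and the update of u is taken as W v, which is what the shapes [H] and [W] of U and V require.

```python
import numpy as np

def spectral_norm(weight, dim=0, power_iters=1, eps=1e-12, seed=0):
    """Divide `weight` by a power-iteration estimate of its largest singular value."""
    # Move `dim` to the front and flatten the rest into a matrix of shape [H, W].
    perm = [dim] + [i for i in range(weight.ndim) if i != dim]
    mat = weight.transpose(perm).reshape(weight.shape[dim], -1)

    rng = np.random.default_rng(seed)
    u = rng.standard_normal(mat.shape[0])   # Step 1: U in shape [H]
    v = rng.standard_normal(mat.shape[1])   #         V in shape [W]

    for _ in range(power_iters):            # Step 2: power iteration
        v = mat.T @ u
        v = v / (np.linalg.norm(v) + eps)
        u = mat @ v
        u = u / (np.linalg.norm(u) + eps)

    sigma = u @ mat @ v                      # Step 3: sigma(W) = u^T W v
    return weight / sigma

w = np.random.default_rng(1).standard_normal((2, 8, 3, 3))
w_sn = spectral_norm(w, dim=1, power_iters=2)
print(w_sn.shape)  # (2, 8, 3, 3)
```

With only a few iterations sigma is an approximation; more iterations bring the largest singular value of the normalized matrix closer to 1.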

Parameters
• name_scope (str) – The name of this class.

• dim (int) – The index of dimension which should be permuted to the first before reshaping Input(Weight) to matrix, it should be set as 0 if Input(Weight) is the weight of fc layer, and should be set as 1 if Input(Weight) is the weight of conv layer. Default: 0.

• power_iters (int) – The number of power iterations to calculate spectral norm. Default: 1.

• eps (float) – The epsilon for numerical stability in calculating norms. Default: 1e-12.

• name (str) – The name of this layer. It is optional.

Returns

A tensor variable of weight parameters after spectral normalization.

Return type

Variable

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    x = numpy.random.random((2, 8, 32, 32)).astype('float32')
    spectralNorm = fluid.dygraph.nn.SpectralNorm('SpectralNorm', dim=1, power_iters=2)
    ret = spectralNorm(fluid.dygraph.base.to_variable(x))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]

## start_gperf_profiler¶

paddle.fluid.dygraph.start_gperf_profiler()[source]

## stop_gperf_profiler¶

paddle.fluid.dygraph.stop_gperf_profiler()[source]

## to_variable¶

paddle.fluid.dygraph.to_variable(value, block=None, name=None)[source]

This function creates a variable from a NumPy ndarray.

Parameters
• value (ndarray) – the NumPy value to be converted

• block (fluid.Block|None) – which block this variable will be in

• name (str|None) – Name of the Variable

Returns

The variable created from the given NumPy value

Return type

Variable

Examples

import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    x = np.ones([2, 2], np.float32)
    y = fluid.dygraph.to_variable(x)


## Tracer¶

class paddle.fluid.dygraph.Tracer(block)[source]

Python wrapper of dygraph tracer

## TreeConv¶

class paddle.fluid.dygraph.TreeConv(name_scope, output_size, num_filters=1, max_depth=2, act='tanh', param_attr=None, bias_attr=None, name=None)[source]

*Tree-Based Convolution Operator*

Tree-Based Convolution is a kind of convolution based on tree structure. It is a part of the Tree-Based Convolution Neural Network (TBCNN), which is used to classify tree structures such as Abstract Syntax Trees. Tree-Based Convolution proposes a data structure called a continuous binary tree, which treats a multiway tree as a binary tree. The paper on the Tree-Based Convolution Operator is here: https://arxiv.org/abs/1409.5718v1

Parameters
• name_scope (str) – The name of this class.

• output_size (int) – output feature width

• num_filters (int) – number of filters, Default: 1.

• max_depth (int) – max depth of filters, Default: 2.

• act (str) – activation function, Default: tanh.

• param_attr (ParamAttr) – the parameter attribute for the filters, Default: None.

• bias_attr (ParamAttr) – the parameter attribute for the bias of this layer, Default: None.

• name (str) – a name of this layer(optional). If set None, the layer will be named automatically, Default: None.

Returns

(Tensor) The feature vector of subtrees. The shape of the output tensor is [max_tree_node_size, output_size, num_filters]. The output tensor could be a new feature vector for next tree convolution layers

Return type

out(Variable)

Examples

import paddle.fluid as fluid
import numpy

with fluid.dygraph.guard():
    nodes_vector = numpy.random.random((1, 10, 5)).astype('float32')
    edge_set = numpy.random.random((1, 9, 2)).astype('int32')
    treeConv = fluid.dygraph.nn.TreeConv(
        'TreeConv', output_size=6, num_filters=1, max_depth=2)
    ret = treeConv(fluid.dygraph.base.to_variable(nodes_vector), fluid.dygraph.base.to_variable(edge_set))

forward(self: paddle.fluid.core_avx.Layer, arg0: List[paddle.fluid.core_avx.VarBase]) → List[paddle.fluid.core_avx.VarBase]