MultiheadAttention
- class paddle.compat.nn.MultiheadAttention(embed_dim: int, num_heads: int, dropout: float = 0.0, bias: bool = True, add_bias_kv: bool = False, add_zero_attn: bool = False, kdim: int | None = None, vdim: int | None = None, batch_first: bool = False, device: PlaceLike | None = None, dtype: DTypeLike | None = None) [source]
-
Allows the model to jointly attend to information from different representation subspaces.
Multi-Head Attention is defined as:
\[\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1,\dots,\text{head}_h)W^O\]
where \(\text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).
Please refer to Attention Is All You Need for more details.
Note
This layer will use the optimized implementation paddle.nn.functional.scaled_dot_product_attention() if there is no need to return the attention weights.
- Parameters
-
embed_dim (int) – Total dimension of the model.
num_heads (int) – The number of heads in multi-head attention.
dropout (float, optional) – The dropout probability used on attention weights to drop some attention targets. 0 for no dropout. Default 0.0.
bias (bool, optional) – If specified, adds bias to input / output projection layers. Default: True.
add_bias_kv (bool, optional) – If specified, adds bias to the key and value sequences at axis=0. Default: False.
add_zero_attn (bool, optional) – If specified, adds a new batch of zeros to the key and value sequences at axis=1. Default: False.
kdim (int, optional) – Total number of features for keys. If None, assumed equal to embed_dim. Default: None.
vdim (int, optional) – Total number of features for values. If None, assumed equal to embed_dim. Default: None.
batch_first (bool, optional) – If True, then the input and output tensors are provided as [batch, seq, feature]. Default: False.
device (PlaceLike|None, optional) – The device to initialize parameters on. Default: None.
dtype (DTypeLike|None, optional) – The data type of the parameters. Default: None.
Examples
>>> import paddle
>>> from paddle.compat import nn
>>> # Example with batch_first=True
>>> embed_dim, num_heads = 128, 8
>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
>>> # query: [batch_size, target_seq_len, embed_dim]
>>> query = paddle.randn([32, 10, embed_dim])
>>> # key, value: [batch_size, source_seq_len, embed_dim]
>>> key = paddle.randn([32, 20, embed_dim])
>>> value = paddle.randn([32, 20, embed_dim])
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)
>>> print(attn_output.shape)
paddle.Size([32, 10, 128])
-
add_module
(
name: str,
module: paddle.nn.layer.layers.Layer | None
)
None
add_module¶
-
Adds a sub layer instance. Added layer can be accessed by self.name
- Parameters
-
name (str) – name of this sublayer.
module (Layer|None) – an instance of Layer to add as a sublayer.
- Returns
-
None
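Examples
A minimal usage sketch (the extra_proj name below is purely illustrative; the printed repr is indicative):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> # register an extra projection layer; it becomes accessible as attn.extra_proj
>>> attn.add_module("extra_proj", paddle.nn.Linear(8, 8))
>>> print(attn.get_submodule("extra_proj"))
Linear(in_features=8, out_features=8, dtype=float32)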
-
add_parameter
(
name: str,
parameter: Tensor
)
Tensor
add_parameter¶
-
Adds a Parameter instance.
Added parameter can be accessed by self.name
- Parameters
-
name (str) – name of this parameter.
parameter (Parameter) – an instance of Parameter.
- Returns
-
Parameter, the parameter passed in.
Examples
>>> import paddle >>> paddle.seed(100) >>> class MyLayer(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self._linear = paddle.nn.Linear(1, 1) ... w_tmp = self.create_parameter([1,1]) ... self.add_parameter("w_tmp", w_tmp) ... ... def forward(self, input): ... return self._linear(input) ... >>> mylayer = MyLayer() >>> for name, param in mylayer.named_parameters(): ... print(name, param) w_tmp Parameter containing: Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[-1.01448846]]) _linear.weight Parameter containing: Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[0.18551230]]) _linear.bias Parameter containing: Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=False, [0.])
-
add_sublayer
(
name: str,
sublayer: Layer
)
Layer
add_sublayer¶
-
Adds a sub Layer instance.
Added sublayer can be accessed by self.name
- Parameters
-
name (str) – name of this sublayer.
sublayer (Layer) – an instance of Layer.
- Returns
-
Layer, the sublayer passed in.
Examples
>>> import paddle >>> class MySequential(paddle.nn.Layer): ... def __init__(self, *layers): ... super().__init__() ... if len(layers) > 0 and isinstance(layers[0], tuple): ... for name, layer in layers: ... self.add_sublayer(name, layer) ... else: ... for idx, layer in enumerate(layers): ... self.add_sublayer(str(idx), layer) ... ... def forward(self, input): ... for layer in self._sub_layers.values(): ... input = layer(input) ... return input ... >>> fc1 = paddle.nn.Linear(10, 3) >>> fc2 = paddle.nn.Linear(3, 10, bias_attr=False) >>> model = MySequential(fc1, fc2) >>> for prefix, layer in model.named_sublayers(): ... print(prefix, layer) 0 Linear(in_features=10, out_features=3, dtype=float32) 1 Linear(in_features=3, out_features=10, dtype=float32)
-
apply
(
fn: Callable[[Self], None]
)
Self
apply¶
-
Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self. Typical use includes initializing the parameters of a model.
- Parameters
-
fn (function) – a function to be applied to each sublayer
- Returns
-
Layer, self
Examples
>>> import paddle >>> import paddle.nn as nn >>> paddle.seed(2023) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> def init_weights(layer): ... if type(layer) == nn.Linear: ... print('before init weight:', layer.weight.numpy()) ... new_weight = paddle.full(shape=layer.weight.shape, dtype=layer.weight.dtype, fill_value=0.9) ... layer.weight.set_value(new_weight) ... print('after init weight:', layer.weight.numpy()) ... >>> net.apply(init_weights) >>> print(net.state_dict()) before init weight: [[ 0.89611185 0.04935038] [-0.5888344 0.99266374]] after init weight: [[0.9 0.9] [0.9 0.9]] before init weight: [[-0.18615901 -0.22924072] [ 1.1517721 0.59859073]] after init weight: [[0.9 0.9] [0.9 0.9]] OrderedDict([('0.weight', Parameter containing: Tensor(shape=[2, 2], dtype=float32, place=Place(cpu), stop_gradient=False, [[0.89999998, 0.89999998], [0.89999998, 0.89999998]])), ('0.bias', Parameter containing: Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=False, [0., 0.])), ('1.weight', Parameter containing: Tensor(shape=[2, 2], dtype=float32, place=Place(cpu), stop_gradient=False, [[0.89999998, 0.89999998], [0.89999998, 0.89999998]])), ('1.bias', Parameter containing: Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=False, [0., 0.]))])
-
astype
(
dtype: DTypeLike | None = None
)
Self
astype¶
-
Casts all parameters and buffers to dtype and then returns the Layer.
- Parameters
-
dtype (str|paddle.dtype|numpy.dtype) – target data type of layer. If set str, it can be “bool”, “bfloat16”, “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “uint8”, “complex64”, “complex128”. Default: None
- Returns
-
Layer, self
Examples
>>> import paddle >>> import paddle.nn as nn >>> weight_attr = paddle.ParamAttr(name="weight",initializer=paddle.nn.initializer.Constant(value=1.5)) >>> bias_attr = paddle.ParamAttr(name="bias",initializer=paddle.nn.initializer.Constant(value=2.5)) >>> linear = paddle.nn.Linear(2, 2, weight_attr=weight_attr, bias_attr=bias_attr).to(device="cpu",dtype="float32") >>> print(linear) Linear(in_features=2, out_features=2, dtype=float32) >>> print(linear.parameters()) [Parameter containing: Tensor(shape=[2, 2], dtype=float32, place=Place(cpu), stop_gradient=False, [[1.50000000, 1.50000000], [1.50000000, 1.50000000]]), Parameter containing: Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=False, [2.50000000, 2.50000000])] >>> linear=linear.astype("int8") >>> print(linear) Linear(in_features=2, out_features=2, dtype=paddle.int8) >>> print(linear.parameters()) >>> [Parameter containing: Tensor(shape=[2, 2], dtype=int8, place=Place(cpu), stop_gradient=False, [[1, 1], [1, 1]]), Parameter containing: Tensor(shape=[2], dtype=int8, place=Place(cpu), stop_gradient=False, [2, 2])] >>>
-
bfloat16
(
excluded_layers: Layer | Sequence[Layer] | None = None
)
Self
bfloat16¶
-
Casts all floating point parameters and buffers to bfloat16 data type.
Note
nn.BatchNorm does not support bfloat16 weights, so it would not be converted by default.
- Parameters
-
excluded_layers (nn.Layer|list|tuple|None, optional) – Specify the layers that need to keep their original data type. If excluded_layers is None, casts all floating point parameters and buffers except nn.BatchNorm. Default: None.
- Returns
-
self
- Return type
-
Layer
Examples
>>> >>> import paddle >>> class Model(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self.linear = paddle.nn.Linear(1, 1) ... self.dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... out = self.linear(input) ... out = self.dropout(out) ... return out ... >>> model = Model() >>> model.bfloat16() >>> #UserWarning: Paddle compiled by the user does not support bfloat16, so keep original data type. Model( (linear): Linear(in_features=1, out_features=1, dtype=float32) (dropout): Dropout(p=0.5, axis=None, mode=upscale_in_train) )
-
buffers
(
include_sublayers: bool = True
)
list[paddle.Tensor]
buffers¶
-
Returns a list of all buffers from current layer and its sub-layers.
- Parameters
-
include_sublayers (bool, optional) – Whether include the buffers of sublayers. If True, also include the buffers from sublayers. Default: True.
- Returns
-
list of Tensor, a list of buffers.
Examples
>>> import numpy as np
>>> import paddle
>>> linear = paddle.nn.Linear(10, 3)
>>> value = np.array([0]).astype("float32")
>>> buffer = paddle.to_tensor(value)
>>> linear.register_buffer("buf_name", buffer, persistable=True)
>>> print(linear.buffers())
[Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
    [0.])]
-
children
(
)
Iterable[Layer]
children¶
-
Returns an iterator over immediate children layers.
- Yields
-
Layer – a child layer
Examples
>>> import paddle
>>> linear1 = paddle.nn.Linear(10, 3)
>>> linear2 = paddle.nn.Linear(3, 10, bias_attr=False)
>>> model = paddle.nn.Sequential(linear1, linear2)
>>> layer_list = list(model.children())
>>> print(layer_list)
[Linear(in_features=10, out_features=3, dtype=float32), Linear(in_features=3, out_features=10, dtype=float32)]
-
clear_gradients
(
set_to_zero: bool = True
)
None
clear_gradients¶
-
Clear the gradients of all parameters for this layer.
- Parameters
-
set_to_zero (bool, optional) – Whether to set the trainable parameters’ gradients to zero or None. Default is True.
- Returns
-
None
Examples
>>> import paddle
>>> import numpy as np
>>> value = np.arange(26).reshape(2, 13).astype("float32")
>>> a = paddle.to_tensor(value)
>>> linear = paddle.nn.Linear(13, 5)
>>> adam = paddle.optimizer.Adam(learning_rate=0.01,
...                              parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adam.step()
>>> linear.clear_gradients()
-
cpu
(
)
Self
cpu¶
-
Move all model parameters and buffers to the CPU.
- Returns
-
self
- Return type
-
Layer
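Examples
A small illustrative snippet (on a CPU-only build the parameters are already on the CPU; the printed place is indicative):
>>> import paddle
>>> linear = paddle.nn.Linear(2, 2)
>>> linear = linear.cpu()          # move parameters and buffers to the CPU
>>> print(linear.weight.place)
Place(cpu)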
-
create_parameter
(
shape: ShapeLike,
attr: ParamAttrLike | None = None,
dtype: DTypeLike | None = None,
is_bias: bool = False,
default_initializer: Initializer | None = None,
device: PlaceLike | None = None
)
Tensor
create_parameter¶
-
Create parameters for this layer.
- Parameters
-
shape (list) – Shape of the parameter. The data type in the list must be int.
attr (ParamAttr, optional) – Parameter attribute of weight. Please refer to ParamAttr. Default: None.
dtype (str, optional) – Data type of this parameter. If set str, it can be “bool”, “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “uint8” or “uint16”. Default: “float32”.
is_bias (bool, optional) – if this is a bias parameter. Default: False.
default_initializer (Initializer, optional) – the default initializer for this parameter. If set None, default initializer will be set to paddle.nn.initializer.Xavier and paddle.nn.initializer.Constant for non-bias and bias parameter, respectively. Default: None.
device (PlaceLike, optional) – the device place for the parameter. Default: None.
- Returns
-
Tensor, created parameter.
Examples
>>> import paddle >>> paddle.seed(2023) >>> class MyLayer(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self._linear = paddle.nn.Linear(1, 1) ... w_tmp = self.create_parameter([1,1]) ... self.add_parameter("w_tmp", w_tmp) ... ... def forward(self, input): ... return self._linear(input) ... >>> mylayer = MyLayer() >>> for name, param in mylayer.named_parameters(): ... print(name, param) # will print w_tmp,_linear.weight,_linear.bias w_tmp Parameter containing: Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[0.06979191]]) _linear.weight Parameter containing: Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[1.26729357]]) _linear.bias Parameter containing: Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=False, [0.])
-
create_tensor
(
name: str | None = None,
persistable: bool | None = None,
dtype: DTypeLike | None = None
)
Tensor
create_tensor¶
-
Create Tensor for this layer.
- Parameters
-
name (str, optional) – name of the tensor. Please refer to api_guide_Name . Default: None.
persistable (bool, optional) – whether to set this tensor as persistable. Default: False.
dtype (str, optional) – data type of this parameter. If set str, it can be “bool”, “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “uint8” or “uint16”. If set None, it will be “float32”. Default: None.
- Returns
-
Tensor, created Tensor.
Examples
>>> import paddle >>> class MyLinear(paddle.nn.Layer): ... def __init__(self, ... in_features, ... out_features): ... super().__init__() ... self.linear = paddle.nn.Linear(10, 10) ... ... self.back_var = self.create_tensor(name = "linear_tmp_0", dtype=self._dtype) ... ... def forward(self, input): ... out = self.linear(input) ... paddle.assign(out, self.back_var) ... ... return out
-
create_variable
(
name: str | None = None,
persistable: bool | None = None,
dtype: DTypeLike | None = None
)
Tensor
create_variable¶
-
Warning
API “paddle.nn.layer.layers.create_variable” is deprecated since 2.0.0 and will be removed in future versions. Please use “paddle.nn.Layer.create_tensor” instead. Reason: create_tensor is the new API and is easier to use.
Create Tensor for this layer.
- Parameters
-
name (str, optional) – name of the tensor. Please refer to api_guide_Name . Default: None
persistable (bool, optional) – whether to set this tensor as persistable. Default: False.
dtype (str, optional) – data type of this parameter. If set str, it can be “bool”, “float16”, “float32”, “float64”,”int8”, “int16”, “int32”, “int64”, “uint8” or “uint16”. If set None, it will be “float32”. Default: None
- Returns
-
Tensor, created Tensor.
Examples
>>> import paddle >>> class MyLinear(paddle.nn.Layer): ... def __init__(self, ... in_features, ... out_features): ... super().__init__() ... self.linear = paddle.nn.Linear( 10, 10) ... ... self.back_var = self.create_variable(name = "linear_tmp_0", dtype=self._dtype) ... ... def forward(self, input): ... out = self.linear(input) ... paddle.assign( out, self.back_var) ... ... return out
-
cuda
(
device: int | PlaceLike | None = None
)
Self
cuda¶
-
Move all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the layer will live on GPU while being optimized.
- Parameters
-
device (int, optional) – if specified, all parameters will be copied to that device.
- Returns
-
self
- Return type
-
Layer
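Examples
A minimal sketch (assuming a CUDA-enabled build and at least one visible GPU; the guard keeps it harmless on CPU-only builds):
>>> import paddle
>>> linear = paddle.nn.Linear(2, 2)
>>> if paddle.device.is_compiled_with_cuda():
...     linear = linear.cuda(0)        # copy parameters and buffers to GPU 0
...     print(linear.weight.place)     # Place(gpu:0)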
-
double
(
)
Self
double¶
-
Casts all floating point parameters and buffers to double data type.
- Returns
-
self
- Return type
-
Module
-
eval
(
)
Self
eval¶
-
Sets this Layer and all its sublayers to evaluation mode. This only affects certain modules like Dropout and BatchNorm.
- Returns
-
self
- Return type
-
Layer
Examples
>>> import paddle >>> paddle.seed(100) >>> class MyLayer(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self._linear = paddle.nn.Linear(1, 1) ... self._dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... temp = self._linear(input) ... temp = self._dropout(temp) ... return temp ... >>> x = paddle.randn([10, 1], 'float32') >>> mylayer = MyLayer() >>> mylayer.eval() # set mylayer._dropout to eval mode >>> out = mylayer(x) >>> print(out) Tensor(shape=[10, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[-1.72439659], [ 0.31532824], [ 0.01192369], [-0.36912638], [-1.63426113], [-0.93169814], [ 0.32222399], [-1.61092973], [ 0.77209264], [-0.34038994]])
-
extra_repr
(
)
str
extra_repr¶
-
Extra representation of this layer; you can override this method to provide a custom implementation for your own layer.
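Examples
An illustrative override (the Scale layer is hypothetical; the printed form assumes the base Layer repr appends the extra_repr string):
>>> import paddle
>>> class Scale(paddle.nn.Layer):
...     def __init__(self, factor):
...         super().__init__()
...         self.factor = factor
...     def forward(self, x):
...         return x * self.factor
...     def extra_repr(self):
...         # the returned string is embedded in the layer's repr
...         return f"factor={self.factor}"
...
>>> print(Scale(2.0))
Scale(factor=2.0)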
-
float
(
excluded_layers: Layer | Sequence[Layer] | None = None
)
Self
float¶
-
Casts all floating point parameters and buffers to float data type.
- Parameters
-
excluded_layers (nn.Layer|list|tuple|None, optional) – Specify the layers that need to keep their original data type. If excluded_layers is None, casts all floating point parameters and buffers. Default: None.
- Returns
-
self
- Return type
-
Layer
Examples
>>> import paddle >>> class Model(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self.linear = paddle.nn.Linear(1, 1) ... self.dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... out = self.linear(input) ... out = self.dropout(out) ... return out >>> model = Model() >>> model.float() Model( (linear): Linear(in_features=1, out_features=1, dtype=paddle.float32) (dropout): Dropout(p=0.5, axis=None, mode=upscale_in_train, inplace=False) )
-
float16
(
excluded_layers: Layer | Sequence[Layer] | None = None
)
Self
float16¶
-
Casts all floating point parameters and buffers to float16 data type.
Note
nn.BatchNorm does not support float16 weights, so it would not be converted by default.
- Parameters
-
excluded_layers (nn.Layer|list|tuple|None, optional) – Specify the layers that need to keep their original data type. If excluded_layers is None, casts all floating point parameters and buffers except nn.BatchNorm. Default: None.
- Returns
-
self
- Return type
-
Layer
Examples
>>> >>> import paddle >>> class Model(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self.linear = paddle.nn.Linear(1, 1) ... self.dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... out = self.linear(input) ... out = self.dropout(out) ... return out ... >>> model = Model() >>> model.float16() Model( (linear): Linear(in_features=1, out_features=1, dtype=float32) (dropout): Dropout(p=0.5, axis=None, mode=upscale_in_train) )
-
full
(
aoa_config: dict[str, list[str]] | None = None,
**kwargs
)
full¶
-
Returns an iterator over the full, unsharded model parameters. The output parameters can be customized using the aoa_config argument.
- Parameters
-
sharded_state_dict (ShardedStateDict) – The state dict containing parameter shards local to the current process.
aoa_config (dict[str, list[str]]|None, optional) – AoA (Almost AllReduce) configuration. Default: None.
kwargs – Optional keyword arguments:
h_group: The horizontal communication group. If using group communication, both h_group and v_group must be provided.
v_group: The vertical communication group.
process_group: The communication group in single-group setups (when h_group and v_group are not used).
num_splits (int): The number of splits to divide the parameters.
shard_idx (int): The index of the split handled by the current process. Default: 0.
memory_growth_threshold (int): The memory threshold (in bytes) for controlling memory growth during parameter assembly. Default: 8 * (2 ** 30), i.e., 8 GB.
- Returns
-
An iterator over the full, unsharded model parameters, optionally filtered and customized according to aoa_config.
- Return type
-
Iterator
-
full_name
(
)
str
full_name¶
-
Full name for this layer, composed by name_scope + “/” + MyLayer.__class__.__name__
- Returns
-
str, full name of this layer.
Examples
>>> import paddle >>> class LinearNet(paddle.nn.Layer): ... def __init__(self): ... super().__init__(name_scope = "demo_linear_net") ... self._linear = paddle.nn.Linear(1, 1) ... ... def forward(self, x): ... return self._linear(x) ... >>> linear_net = LinearNet() >>> print(linear_net.full_name()) demo_linear_net_0
-
get_buffer
(
target: str
)
Tensor
get_buffer¶
-
Return the buffer given by target if it exists, otherwise throw an error.
See the docstring for get_sublayer for a more detailed explanation of this method's functionality as well as how to correctly specify target.
- Parameters
-
target (str) – The fully-qualified string name of the buffer to look for.
- Returns
-
The buffer referenced by
target. - Return type
-
Tensor
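Examples
A brief sketch using the register_buffer method documented later on this page (the buffer name is illustrative; the printed tensor is indicative):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn.register_buffer("running_scale", paddle.to_tensor([1.0]))
>>> print(attn.get_buffer("running_scale"))
Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
    [1.])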
-
get_parameter
(
target: str
)
Parameter
get_parameter¶
-
Return the parameter given by target if it exists, otherwise throw an error.
- Parameters
-
target (str) – The fully-qualified string name of the Parameter to look for.
- Returns
-
The Parameter referenced by
target. - Return type
-
Parameter
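Examples
A minimal sketch that looks up a parameter by a name obtained from named_parameters() (so the example does not depend on the layer's internal parameter names):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> # pick an existing fully-qualified name from named_parameters()
>>> name, param = next(iter(attn.named_parameters()))
>>> print(tuple(attn.get_parameter(name).shape) == tuple(param.shape))
True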
-
get_sublayer
(
target: str
)
Layer
get_sublayer¶
-
Return the submodule given by target if it exists, otherwise throw an error.
- Parameters
-
target (str) – The fully-qualified string name of the submodule to look for.
- Returns
-
The sublayer referenced by
target. - Return type
-
Layer
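Examples
A small sketch showing a dotted, fully-qualified target path (the head sublayer is added only for illustration; the printed repr is indicative):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn.add_module("head", paddle.nn.Sequential(paddle.nn.Linear(8, 8), paddle.nn.ReLU()))
>>> # dotted path into nested sublayers: "<sublayer name>.<index>"
>>> print(attn.get_sublayer("head.0"))
Linear(in_features=8, out_features=8, dtype=float32)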
-
get_submodule
(
target: str
)
Layer
get_submodule¶
-
Return the submodule given by target if it exists, otherwise throw an error.
- Parameters
-
target (str) – The fully-qualified string name of the submodule to look for.
- Returns
-
The sublayer referenced by
target. - Return type
-
Layer
-
half
(
)
Self
half¶
-
Casts all floating point parameters and buffers to half data type.
- Returns
-
self
- Return type
-
Module
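Examples
A minimal sketch (half casts floating point parameters and buffers to float16):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn = attn.half()   # cast all floating point parameters and buffers to float16
>>> print(all(p.dtype == paddle.float16 for p in attn.parameters()))
True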
-
load_dict
(
state_dict: Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]],
use_structured_name: bool = True
)
tuple[list[str], list[str]]
load_dict¶
-
Set parameters and persistable buffers from state_dict. All the parameters and buffers will be reset by the tensor in the state_dict
- Parameters
-
state_dict (dict) – Dict contains all the parameters and persistable buffers.
use_structured_name (bool, optional) – If true, use structured name as key, otherwise, use parameter or buffer name as key. Default: True.
- Returns
-
missing_keys (list[str]) – A list of str containing the missing keys.
unexpected_keys (list[str]) – A list of str containing the unexpected keys.
- Return type
-
tuple[list[str], list[str]]
Examples
>>> import paddle
>>> emb = paddle.nn.Embedding(10, 10)
>>> state_dict = emb.state_dict()
>>> paddle.save(state_dict, "paddle_dy.pdparams")
>>> para_state_dict = paddle.load("paddle_dy.pdparams")
>>> emb.set_state_dict(para_state_dict)
-
load_state_dict
(
state_dict: Mapping[str, Any],
strict: bool = True,
assign: bool = False
)
load_state_dict¶
-
Copy parameters and buffers from state_dict into this module and its descendants.
If strict is True, then the keys of state_dict must exactly match the keys returned by this module's state_dict() function.
- Parameters
-
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module's state_dict() function. Default: True.
assign (bool, optional) – When set to False, the properties of the tensors in the current module are preserved, whereas setting it to True preserves the properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter, for which the value from the module is preserved. Default: False.
- Returns
-
missing_keys is a list of str containing any keys that are expected by this module but missing from the provided state_dict.
unexpected_keys is a list of str containing the keys that are not expected by this module but present in the provided state_dict.
- Return type
-
NamedTuple with missing_keys and unexpected_keys fields
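Examples
A minimal sketch that copies the state of one layer into another with identical configuration (both key lists should be empty when the configurations match):
>>> import paddle
>>> from paddle.compat import nn
>>> src = nn.MultiheadAttention(8, 2)
>>> dst = nn.MultiheadAttention(8, 2)
>>> result = dst.load_state_dict(src.state_dict())
>>> print(result.missing_keys, result.unexpected_keys)
[] []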
-
modules
(
)
Iterator[Layer]
modules¶
-
Return an iterator over all modules in the network.
- Yields
-
Layer – a layer in the network.
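Examples
A short sketch that walks the module tree (the exact class names depend on the layer's internal structure, so no output is shown):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> # typically yields the layer itself first, followed by every sublayer
>>> for m in attn.modules():
...     print(type(m).__name__)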
-
named_buffers
(
prefix: str = '',
include_sublayers: bool = True,
remove_duplicate: bool = True
)
Iterable[tuple[str, Tensor]]
named_buffers¶
-
Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.
- Parameters
-
prefix (str, optional) – Prefix to prepend to all buffer names. Default: ‘’.
include_sublayers (bool, optional) – Whether include the buffers of sublayers. If True, also include the named buffers from sublayers. Default: True.
remove_duplicate (bool, optional) – Whether to remove duplicated buffers in the result. Default: True.
- Yields
-
(string, Tensor) – Tuple of name and tensor
Examples
>>> import numpy as np >>> import paddle >>> fc1 = paddle.nn.Linear(10, 3) >>> buffer1 = paddle.to_tensor(np.array([0]).astype("float32")) >>> # register a tensor as buffer by specific `persistable` >>> fc1.register_buffer("buf_name_1", buffer1, persistable=True) >>> fc2 = paddle.nn.Linear(3, 10) >>> buffer2 = paddle.to_tensor(np.array([1]).astype("float32")) >>> # register a buffer by assigning an attribute with Tensor. >>> # The `persistable` can only be False by this way. >>> fc2.buf_name_2 = buffer2 >>> model = paddle.nn.Sequential(fc1, fc2) >>> # get all named buffers >>> for name, buffer in model.named_buffers(): ... print(name, buffer) 0.buf_name_1 Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, [0.]) 1.buf_name_2 Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, [1.])
-
named_children
(
)
Iterable[tuple[str, Layer]]
named_children¶
-
Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.
- Yields
-
(string, Layer) – Tuple containing a name and child layer
Examples
>>> import paddle >>> linear1 = paddle.nn.Linear(10, 3) >>> linear2 = paddle.nn.Linear(3, 10, bias_attr=False) >>> model = paddle.nn.Sequential(linear1, linear2) >>> for prefix, layer in model.named_children(): ... print(prefix, layer) 0 Linear(in_features=10, out_features=3, dtype=float32) 1 Linear(in_features=3, out_features=10, dtype=float32)
-
named_modules
(
memo: Optional[set[paddle.nn.layer.layers.Layer]] = None,
prefix: str = '',
remove_duplicate: bool = True
)
named_modules¶
-
Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer. The duplicate sublayer will only be yielded once.
- Parameters
-
memo (set, optional) – The set to record duplicate sublayers. Default: None.
prefix (str, optional) – Prefix to prepend to all parameter names. Default: ‘’.
remove_duplicate (bool, optional) – Whether to remove duplicated sublayers in the result. Default: True.
- Yields
-
(string, Layer) – Tuple of name and Layer
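Examples
A short sketch using the prefix argument (names depend on the layer's internal structure, so no output is shown):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> for name, sub in attn.named_modules(prefix="attn"):
...     print(name, type(sub).__name__)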
-
named_parameters
(
prefix: str = '',
include_sublayers: bool = True,
remove_duplicate: bool = True
)
Iterable[tuple[str, Tensor]]
named_parameters¶
-
Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.
- Parameters
-
prefix (str, optional) – Prefix to prepend to all parameter names. Default: ‘’.
include_sublayers (bool, optional) – Whether include the parameters of sublayers. If True, also include the named parameters from sublayers. Default: True.
remove_duplicate (bool, optional) – Whether to remove duplicated parameters in the result. Default: True.
- Yields
-
(string, Parameter) – Tuple of name and Parameter
Examples
>>> import paddle >>> paddle.seed(100) >>> fc1 = paddle.nn.Linear(10, 3) >>> fc2 = paddle.nn.Linear(3, 10, bias_attr=False) >>> model = paddle.nn.Sequential(fc1, fc2) >>> for name, param in model.named_parameters(): ... print(name, param) 0.weight Parameter containing: Tensor(shape=[10, 3], dtype=float32, place=Place(cpu), stop_gradient=False, [[ 0.07276392, -0.39791510, -0.66356444], [ 0.02143478, -0.18519843, -0.32485050], [-0.42249614, 0.08450919, -0.66838276], [ 0.38208580, -0.24303678, 0.55127048], [ 0.47745085, 0.62117910, -0.08336520], [-0.28653207, 0.47237599, -0.05868882], [-0.14385653, 0.29945642, 0.12832761], [-0.21237159, 0.38539791, -0.62760031], [ 0.02637231, 0.20621127, 0.43255770], [-0.19984481, -0.26259184, -0.29696006]]) 0.bias Parameter containing: Tensor(shape=[3], dtype=float32, place=Place(cpu), stop_gradient=False, [0., 0., 0.]) 1.weight Parameter containing: Tensor(shape=[3, 10], dtype=float32, place=Place(cpu), stop_gradient=False, [[ 0.01985580, -0.40268910, 0.41172385, -0.47249708, -0.09002256, -0.00533628, -0.52048630, 0.62360322, 0.20848787, -0.02033746], [ 0.58281910, 0.12841827, 0.12907702, 0.02325618, -0.07746267, 0.31950659, -0.37924835, -0.59209681, -0.11732036, -0.58378261], [-0.62100595, 0.22293305, 0.28229684, -0.03687060, -0.59323978, 0.08411229, 0.53275704, 0.40431368, 0.03171402, -0.17922515]])
-
named_sublayers
(
prefix: str = '',
include_self: bool = False,
layers_set: set[Layer] | None = None,
remove_duplicate: bool = True
)
Iterable[tuple[str, Layer]]
named_sublayers¶
-
Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer. The duplicate sublayer will only be yielded once.
- Parameters
-
prefix (str, optional) – Prefix to prepend to all parameter names. Default: ‘’.
include_self (bool, optional) – Whether include the Layer itself. Default: False.
layers_set (set, optional) – The set to record duplicate sublayers. Default: None.
remove_duplicate (bool, optional) – Whether to remove duplicated sublayers in the result. Default: True.
- Yields
-
(string, Layer) – Tuple of name and Layer
Examples
>>> import paddle >>> fc1 = paddle.nn.Linear(10, 3) >>> fc2 = paddle.nn.Linear(3, 10, bias_attr=False) >>> model = paddle.nn.Sequential(fc1, fc2) >>> for prefix, layer in model.named_sublayers(): ... print(prefix, layer) 0 Linear(in_features=10, out_features=3, dtype=float32) 1 Linear(in_features=3, out_features=10, dtype=float32) >>> l = paddle.nn.Linear(10, 3) >>> model = paddle.nn.Sequential(l, l) >>> for prefix, layer in model.named_sublayers(include_self=True, remove_duplicate=True): ... print(prefix, layer) Sequential( (0): Linear(in_features=10, out_features=3, dtype=float32) (1): Linear(in_features=10, out_features=3, dtype=float32) ) 0 Linear(in_features=10, out_features=3, dtype=float32) >>> l = paddle.nn.Linear(10, 3) >>> model = paddle.nn.Sequential(l, l) >>> for prefix, layer in model.named_sublayers(include_self=True, remove_duplicate=False): ... print(prefix, layer) Sequential( (0): Linear(in_features=10, out_features=3, dtype=float32) (1): Linear(in_features=10, out_features=3, dtype=float32) ) 0 Linear(in_features=10, out_features=3, dtype=float32) 1 Linear(in_features=10, out_features=3, dtype=float32)
-
parameters
(
include_sublayers: bool = True
)
list[paddle.Tensor]
parameters¶
-
Returns a list of all Parameters from current layer and its sub-layers.
- Parameters
-
include_sublayers (bool, optional) – Whether to return the parameters of the sublayer. If True, the returned list contains the parameters of the sublayer. Default: True.
- Returns
-
list, list of Tensor, a list of Parameters.
Examples
>>> import paddle
>>> paddle.seed(100)
>>> linear = paddle.nn.Linear(1, 1)
>>> print(linear.parameters())
[Parameter containing:
Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=False,
    [[0.18551230]]), Parameter containing:
Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=False,
    [0.])]
-
register_buffer
(
name: str,
tensor: Tensor,
persistable: bool = True
)
None
register_buffer¶
-
Registers a tensor as buffer into the layer.
A buffer is a non-trainable tensor that will not be updated by the optimizer, but is necessary for evaluation and inference, for example the mean and variance in BatchNorm layers. The registered buffer is persistable by default and will be saved into state_dict alongside parameters. If persistable=False is set, it registers a non-persistable buffer, so that it will not be a part of state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
-
name (string) – name of the buffer. The buffer can be accessed from this layer using the given name
tensor (Tensor) – the tensor to be registered as buffer.
persistable (bool) – whether the buffer is part of this layer’s state_dict.
- Returns
-
None
Examples
>>> import numpy as np
>>> import paddle
>>> linear = paddle.nn.Linear(10, 3)
>>> value = np.array([0]).astype("float32")
>>> buffer = paddle.to_tensor(value)
>>> linear.register_buffer("buf_name", buffer, persistable=True)
>>> # get the buffer by attribute.
>>> print(linear.buf_name)
Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
    [0.])
-
register_forward_hook
(
hook: Union[Callable[[Layer, Tensor, Tensor], Tensor], Callable[[Layer, Tensor, dict[str, Any], Tensor], Tensor]],
*,
prepend: bool = False,
with_kwargs: bool = False,
always_call: bool = False
)
HookRemoveHelper
register_forward_hook¶
-
Register a forward post-hook for Layer. The hook will be called after forward function has been computed.
It should have the following form, where the input and output of the hook are the input and output of the Layer, respectively. Users can use a forward post-hook to change the output of the Layer or to perform information statistics tasks on the Layer.
hook(Layer, input, output) -> None or modified output
- Parameters
-
hook (function) – a function registered as a forward post-hook
prepend (bool) – If True, the provided hook will be fired before all existing forward_post hooks on this paddle.nn.Layer. Default: False.
with_kwargs (bool) – If True, the hook will be passed the kwargs given to the forward function. Default: False.
always_call (bool) – If True, the hook will be run regardless of whether an exception is raised while calling the Module. Default: False.
- Returns
-
HookRemoveHelper, a HookRemoveHelper object that can be used to remove the added hook by calling hook_remove_helper.remove() .
Examples
>>> import paddle >>> import numpy as np >>> # the forward_post_hook change the output of the layer: output = output * 2 >>> def forward_post_hook(layer, input, output): ... # user can use layer, input and output for information statistics tasks ... ... # change the output ... return output * 2 ... >>> linear = paddle.nn.Linear(13, 5) >>> # register the hook >>> forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook) >>> value1 = np.arange(26).reshape(2, 13).astype("float32") >>> in1 = paddle.to_tensor(value1) >>> out0 = linear(in1) >>> # remove the hook >>> forward_post_hook_handle.remove() >>> out1 = linear(in1) >>> # hook change the linear's output to output * 2, so out0 is equal to out1 * 2. >>> assert (out0.numpy() == (out1.numpy()) * 2).any()
-
register_forward_post_hook
(
hook: Union[Callable[[Layer, Tensor, Tensor], Tensor], Callable[[Layer, Tensor, dict[str, Any], Tensor], Tensor]],
*,
prepend: bool = False,
with_kwargs: bool = False,
always_call: bool = False
)
HookRemoveHelper
register_forward_post_hook¶
-
Register a forward post-hook for Layer. The hook will be called after forward function has been computed.
It should have the following form, where the input and output of the hook are the input and output of the Layer, respectively. Users can use a forward post-hook to change the output of the Layer or to perform information statistics tasks on the Layer.
hook(Layer, input, output) -> None or modified output
- Parameters
-
hook (function) – a function registered as a forward post-hook
prepend (bool) – If True, the provided hook will be fired before all existing forward_post hooks on this paddle.nn.Layer. Default: False.
with_kwargs (bool) – If True, the hook will be passed the kwargs given to the forward function. Default: False.
always_call (bool) – If True, the hook will be run regardless of whether an exception is raised while calling the Module. Default: False.
- Returns
-
HookRemoveHelper, a HookRemoveHelper object that can be used to remove the added hook by calling hook_remove_helper.remove() .
Examples
>>> import paddle >>> import numpy as np >>> # the forward_post_hook change the output of the layer: output = output * 2 >>> def forward_post_hook(layer, input, output): ... # user can use layer, input and output for information statistics tasks ... ... # change the output ... return output * 2 ... >>> linear = paddle.nn.Linear(13, 5) >>> # register the hook >>> forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook) >>> value1 = np.arange(26).reshape(2, 13).astype("float32") >>> in1 = paddle.to_tensor(value1) >>> out0 = linear(in1) >>> # remove the hook >>> forward_post_hook_handle.remove() >>> out1 = linear(in1) >>> # hook change the linear's output to output * 2, so out0 is equal to out1 * 2. >>> assert (out0.numpy() == (out1.numpy()) * 2).any()
-
register_forward_pre_hook
(
hook: Union[Callable[[Layer, Tensor], Tensor], Callable[[Layer, Tensor, dict[str, Any]], tuple[paddle.Tensor, dict[str, Any]]]],
*,
prepend: bool = False,
with_kwargs: bool = False
)
HookRemoveHelper
register_forward_pre_hook¶
-
Register a forward pre-hook for Layer. The hook will be called before forward function has been computed.
It should have the following form, where the input of the hook is the input of the Layer. The hook can either return a tuple or a single modified value; a single returned value will be wrapped into a tuple (unless that value is already a tuple). Users can use a forward pre-hook to change the input of the Layer or to perform information statistics tasks on the Layer.
hook(Layer, input) -> None or modified input
- Parameters
-
hook (function) – a function registered as a forward pre-hook
prepend (bool) – If True, the provided hook will be fired before all existing forward_pre hooks on this paddle.nn.Layer. Default: False.
with_kwargs (bool) – If True, the hook will be passed the kwargs given to the forward function. Default: False.
- Returns
-
HookRemoveHelper, a HookRemoveHelper object that can be used to remove the added hook by calling hook_remove_helper.remove() .
Examples
>>> import paddle >>> import numpy as np >>> # the forward_pre_hook change the input of the layer: input = input * 2 >>> def forward_pre_hook(layer, input): ... # user can use layer and input for information statistics tasks ... ... # change the input ... input_return = (input[0] * 2) ... return input_return ... >>> linear = paddle.nn.Linear(13, 5) >>> # register the hook >>> forward_pre_hook_handle = linear.register_forward_pre_hook(forward_pre_hook) >>> value0 = np.arange(26).reshape(2, 13).astype("float32") >>> in0 = paddle.to_tensor(value0) >>> out0 = linear(in0) >>> # remove the hook >>> forward_pre_hook_handle.remove() >>> value1 = value0 * 2 >>> in1 = paddle.to_tensor(value1) >>> out1 = linear(in1) >>> # hook change the linear's input to input * 2, so out0 is equal to out1. >>> assert (out0.numpy() == out1.numpy()).any()
-
register_module
(
name: str,
module: paddle.nn.layer.layers.Layer | None
)
None
register_module¶
-
Adds a sub layer instance. Added layer can be accessed by self.name
- Parameters
-
name (str) – name of this sublayer.
module (Layer|None) – an instance of Layer to add as a sublayer.
- Returns
-
None
-
register_parameter
(
name: str,
param: paddle.base.framework.Parameter | None
)
None
register_parameter¶
-
Adds a Parameter instance. Added parameter can be accessed by self.name
- Parameters
-
name (str) – name of this parameter.
param (Parameter|None) – an instance of Parameter.
- Returns
-
None
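Examples
A minimal sketch using create_parameter (documented above) to build the parameter that is then registered (the w_extra name is illustrative):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> w_extra = attn.create_parameter([8, 8])
>>> attn.register_parameter("w_extra", w_extra)   # accessible as attn.w_extra
>>> print(attn.w_extra.shape)
[8, 8]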
-
requires_grad_
(
requires_grad: bool = True
)
Self
requires_grad_¶
-
Change if autograd should record operations on parameters in this layer.
- Parameters
-
requires_grad (bool) – whether autograd should record operations on parameters in this layer. Default: True.
- Returns
-
self
- Return type
-
Layer
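Examples
A small sketch that freezes the layer's parameters (this assumes requires_grad=False corresponds to stop_gradient=True on Paddle parameters):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn.requires_grad_(False)   # autograd will not record operations on these parameters
>>> print(all(p.stop_gradient for p in attn.parameters()))
True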
-
set_dict
(
state_dict: Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]],
use_structured_name: bool = True
)
tuple[list[str], list[str]]
set_dict¶
-
Set parameters and persistable buffers from state_dict. All the parameters and buffers will be reset by the tensor in the state_dict
- Parameters
-
state_dict (dict) – Dict contains all the parameters and persistable buffers.
use_structured_name (bool, optional) – If true, use structured name as key, otherwise, use parameter or buffer name as key. Default: True.
- Returns
-
missing_keys (list[str]) – A list of str containing the missing keys.
unexpected_keys (list[str]) – A list of str containing the unexpected keys.
- Return type
-
tuple[list[str], list[str]]
Examples
>>> import paddle
>>> emb = paddle.nn.Embedding(10, 10)
>>> state_dict = emb.state_dict()
>>> paddle.save(state_dict, "paddle_dy.pdparams")
>>> para_state_dict = paddle.load("paddle_dy.pdparams")
>>> emb.set_state_dict(para_state_dict)
-
set_state_dict
(
state_dict: Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]],
use_structured_name: bool = True
)
tuple[list[str], list[str]]
set_state_dict¶
-
Set parameters and persistable buffers from state_dict. All the parameters and buffers will be reset by the tensor in the state_dict
- Parameters
-
state_dict (dict) – Dict contains all the parameters and persistable buffers.
use_structured_name (bool, optional) – If true, use structured name as key, otherwise, use parameter or buffer name as key. Default: True.
- Returns
-
missing_keys (list[str]) – A list of str containing the missing keys.
unexpected_keys (list[str]) – A list of str containing the unexpected keys.
- Return type
-
tuple[list[str], list[str]]
Examples
>>> import paddle
>>> emb = paddle.nn.Embedding(10, 10)
>>> state_dict = emb.state_dict()
>>> paddle.save(state_dict, "paddle_dy.pdparams")
>>> para_state_dict = paddle.load("paddle_dy.pdparams")
>>> emb.set_state_dict(para_state_dict)
-
set_sublayer
(
target: str,
layer: Layer,
strict: bool = False
)
None
set_sublayer¶
-
Set the sublayer given by target if it exists, otherwise throw an error.
- Parameters
-
target (str) – The fully-qualified string name of the sublayer to look for.
layer (Layer) – The layer to set the sublayer to.
strict (bool) – If False, the method will replace an existing sublayer or create a new sublayer if the parent module exists. If True, the method will only attempt to replace an existing sublayer and throw an error if the sublayer doesn't already exist.
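Examples
A minimal sketch that replaces a sublayer in place (the extra_proj name is illustrative; the printed repr is indicative):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn.add_module("extra_proj", paddle.nn.Linear(8, 8))
>>> # replace the existing sublayer addressed by the target path
>>> attn.set_sublayer("extra_proj", paddle.nn.Linear(8, 4), strict=True)
>>> print(attn.get_sublayer("extra_proj"))
Linear(in_features=8, out_features=4, dtype=float32)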
-
set_submodule
(
target: str,
layer: Layer,
strict: bool = False
)
None
set_submodule¶
-
Set the sublayer given by target if it exists, otherwise throw an error.
- Parameters
-
target (str) – The fully-qualified string name of the sublayer to look for.
layer (Layer) – The layer to set the sublayer to.
strict (bool) – If False, the method will replace an existing sublayer or create a new sublayer if the parent module exists. If True, the method will only attempt to replace an existing sublayer and throw an error if the sublayer doesn't already exist.
-
sharded_state_dict
(
structured_name_prefix: str = ''
)
Union[dict[str, paddle.distributed.flex_checkpoint.dcp.sharded_weight.ShardedWeight], OrderedDict[str, ShardedWeight]]
sharded_state_dict¶
-
Recursively builds a sharded state dictionary for the model and its sub-layers.
- Parameters
-
structured_name_prefix – Prefix to prepend to all tensor names for hierarchical naming.
- Returns
-
Dictionary mapping tensor names to ShardedWeight. The dictionary contains both the current layer’s parameters and all sub-layer parameters.
-
state_dict
(
*args: Any,
**kwargs: Any
)
Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]]
state_dict¶
-
Get all parameters and persistable buffers of the current layer and its sub-layers, and set them into a dict.
- Parameters
-
destination (dict, optional) – If provided, all the parameters and persistable buffers will be set to this dict. Default: None.
include_sublayers (bool, optional) – If true, also include the parameters and persistable buffers from sublayers. Default: True.
use_hook (bool, optional) – If true, the operations contained in _state_dict_hooks will be appended to the destination. Default: True.
keep_vars (bool, optional) – If false, the returned tensors in the state dict are detached from autograd. Default: True.
- Returns
-
a dict contains all the parameters and persistable buffers.
- Return type
-
dict
Examples
>>> import paddle
>>> emb = paddle.nn.Embedding(10, 10)
>>> state_dict = emb.state_dict()
>>> paddle.save(state_dict, "paddle_dy.pdparams")
-
sublayers
(
include_self: bool = False
)
list[paddle.nn.layer.layers.Layer]
sublayers¶
-
Returns a list of sub layers.
- Parameters
-
include_self (bool, optional) – Whether return self as sublayers. Default: False.
- Returns
-
list of Layer, a list of sub layers.
Examples
>>> import paddle >>> class MyLayer(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self._linear = paddle.nn.Linear(1, 1) ... self._dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... temp = self._linear(input) ... temp = self._dropout(temp) ... return temp >>> mylayer = MyLayer() >>> print(mylayer.sublayers()) [Linear(in_features=1, out_features=1, dtype=float32), Dropout(p=0.5, axis=None, mode=upscale_in_train, inplace=False)]
-
to
(
device: PlaceLike | None = None,
dtype: DTypeLike | None = None,
blocking: bool | None = None,
non_blocking: bool | None = None
)
Self
to¶
-
Cast the parameters and buffers of the Layer by the given device, dtype and blocking.
- Parameters
-
device (str|paddle.CPUPlace()|paddle.CUDAPlace()|paddle.CUDAPinnedPlace()|paddle.XPUPlace()|None, optional) – The device where the Layer is to be stored. If None, the device is the same as the original Tensor. If device is a string, it can be cpu, gpu:x or xpu:x, where x is the index of the GPUs or XPUs. Default: None.
dtype (str|numpy.dtype|paddle.dtype|None, optional) – The type of the data. If None, the dtype is the same with the original Tensor. Default: None.
blocking (bool|None, optional) – If False and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. If None, the blocking is set True. Default: None.
non_blocking (bool|None, optional) – If True and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. If None, the non_blocking is set False. Default: None.
- Returns
-
self
Examples
>>> import paddle >>> paddle.seed(2023) >>> linear=paddle.nn.Linear(2, 2) >>> linear.weight >>> print(linear.weight) Parameter containing: Tensor(shape=[2, 2], dtype=float32, place=Place(gpu:0), stop_gradient=False, [[ 0.89611185, 0.04935038], [-0.58883440, 0.99266374]]) >>> linear.to(dtype='float64') >>> linear.weight >>> print(linear.weight) Parameter containing: Tensor(shape=[2, 2], dtype=float64, place=Place(gpu:0), stop_gradient=False, [[ 0.89611185, 0.04935038], [-0.58883440, 0.99266374]]) >>> linear.to(device='cpu') >>> linear.weight >>> print(linear.weight) Parameter containing: Tensor(shape=[2, 2], dtype=float64, place=Place(cpu), stop_gradient=False, [[ 0.89611185, 0.04935038], [-0.58883440, 0.99266374]]) >>> >>> linear.to(device=paddle.CUDAPinnedPlace(), blocking=False) >>> linear.weight >>> print(linear.weight) Parameter containing: Tensor(shape=[2, 2], dtype=float64, place=Place(gpu_pinned), stop_gradient=False, [[ 0.89611185, 0.04935038], [-0.58883440, 0.99266374]])
-
to_static_state_dict
(
destination: Optional[Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]]] = None,
include_sublayers: bool = True,
structured_name_prefix: str = '',
use_hook: bool = True,
keep_vars: bool = True
)
Union[dict[str, paddle.Tensor], OrderedDict[str, Tensor]]
to_static_state_dict¶
-
Get all parameters and buffers of the current layer and its sub-layers, and set them into a dict.
- Parameters
-
destination (dict, optional) – If provided, all the parameters and persistable buffers will be set to this dict. Default: None.
include_sublayers (bool, optional) – If true, also include the parameters and persistable buffers from sublayers. Default: True.
use_hook (bool, optional) – If true, the operations contained in _state_dict_hooks will be appended to the destination. Default: True.
keep_vars (bool, optional) – If false, the returned tensors in the state dict are detached from autograd. Default: True.
- Returns
-
dict, a dict contains all the parameters and persistable buffers.
Examples
>>> import paddle
>>> emb = paddle.nn.Embedding(10, 10)
>>> state_dict = emb.to_static_state_dict()
>>> paddle.save(state_dict, "paddle_dy.pdparams")
-
train
(
mode: bool = True
)
Self
train¶
-
Sets this Layer and all its sublayers to training mode. This only affects certain modules like Dropout and BatchNorm.
- Returns
-
self
- Return type
-
Layer
Examples
>>> import paddle >>> paddle.seed(100) >>> class MyLayer(paddle.nn.Layer): ... def __init__(self): ... super().__init__() ... self._linear = paddle.nn.Linear(1, 1) ... self._dropout = paddle.nn.Dropout(p=0.5) ... ... def forward(self, input): ... temp = self._linear(input) ... temp = self._dropout(temp) ... return temp ... >>> x = paddle.randn([10, 1], 'float32') >>> mylayer = MyLayer() >>> mylayer.eval() # set mylayer._dropout to eval mode >>> out = mylayer(x) >>> mylayer.train() # set mylayer._dropout to train mode >>> out = mylayer(x) >>> print(out) Tensor(shape=[10, 1], dtype=float32, place=Place(cpu), stop_gradient=False, [[-3.44879317], [ 0. ], [ 0. ], [-0.73825276], [ 0. ], [ 0. ], [ 0.64444798], [-3.22185946], [ 0. ], [-0.68077987]])
-
type
(
dst_type: paddle.dtype | str
)
Self
type¶
-
Casts all parameters and buffers to dst_type.
- Parameters
-
dst_type (str|paddle.dtype) – target data type of the layer. If set str, it can be “bool”, “bfloat16”, “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “uint8”, “complex64”, “complex128”.
- Returns
-
self
- Return type
-
Layer
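Examples
A minimal sketch (all parameters and buffers are cast to the given dtype):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(8, 2)
>>> attn = attn.type(paddle.float64)   # cast all parameters and buffers to float64
>>> print(all(p.dtype == paddle.float64 for p in attn.parameters()))
True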
-
xpu
(
device: int | PlaceLike | None = None
)
Self
xpu¶
-
Move all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the layer will live on XPU while being optimized.
- Parameters
-
device (int, optional) – if specified, all parameters will be copied to that device.
- Returns
-
self
- Return type
-
Layer
-
zero_grad
(
set_to_none: bool = True
)
None
zero_grad¶
-
Reset gradients of all model parameters.
- Parameters
-
set_to_none (bool) – instead of setting to zero, set the grads to None. Currently, set_to_none=True is not fully supported.
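Examples
A minimal sketch that runs a backward pass and then resets the gradients (set_to_none=False is used since set_to_none=True is noted above as not fully supported):
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(4, 2, batch_first=True)
>>> x = paddle.randn([2, 5, 4])
>>> out, _ = attn(x, x, x)
>>> out.sum().backward()
>>> attn.zero_grad(set_to_none=False)   # reset gradients of all parameters to zero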
-
forward
(
query: Tensor,
key: Tensor,
value: Tensor,
key_padding_mask: Optional[Tensor] = None,
need_weights: bool = True,
attn_mask: Optional[Tensor] = None,
average_attn_weights: bool = True,
is_causal: bool = False
)
tuple[paddle.Tensor, paddle.Tensor | None]
forward¶
-
Forward pass of the MultiheadAttention layer.
Note
If need_weights is True, this API falls back to the native math implementation; otherwise it calls paddle.compat.nn.functional.scaled_dot_product_attention to compute the attention scores.
To achieve better performance, explicitly set need_weights=False, and set is_causal=True if the attn_mask is the causal mask.
- Parameters
-
query (Tensor) – The query embeddings. Shape depends on batch_first. If batch_first is False, shape is [target_seq_len, batch_size, embed_dim]. If batch_first is True, shape is [batch_size, target_seq_len, embed_dim].
key (Tensor) – The key embeddings. Shape depends on batch_first. If batch_first is False, shape is [source_seq_len, batch_size, kdim]. If batch_first is True, shape is [batch_size, source_seq_len, kdim].
value (Tensor) – The value embeddings. Shape depends on batch_first. If batch_first is False, shape is [source_seq_len, batch_size, vdim]. If batch_first is True, shape is [batch_size, source_seq_len, vdim].
key_padding_mask (Tensor, optional) – If specified, a mask indicating which elements within key to ignore for the purpose of attention (i.e. treat as “padding”). Can be a boolean mask (True indicates padding) or a float mask. Shape is [batch_size, source_seq_len]. Default: None.
need_weights (bool, optional) – Indicate whether to return the attention weights. Default: True.
attn_mask (Tensor, optional) – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all batches while a 3D mask allows different masks for the entries in the batch. Shape is [target_seq_len, source_seq_len] or [batch_size * num_heads, target_seq_len, source_seq_len]. Default: None.
average_attn_weights (bool, optional) – If True, indicates that the returned attn_weights should be averaged across heads. Default: True.
is_causal (bool, optional) – If True, implies that a causal mask is applied to the attention implementation. If attn_mask is None and is_causal is True, a causal mask is automatically created and used in the attention computation. Default: False.
- Returns
-
attn_output (Tensor): The output of the attention mechanism. Shape matches query (based on batch_first).
attn_output_weights (Tensor|None): The attention weights. Returns None if need_weights is False. Shape is [batch_size, target_seq_len, source_seq_len] if average_attn_weights is True. If average_attn_weights is False, shape is [batch_size, num_heads, target_seq_len, source_seq_len].
- Return type
-
tuple[Tensor, Tensor|None]
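Examples
A usage sketch of the forward call with a boolean key_padding_mask; the tensor shapes follow the parameter descriptions above and the printed shapes are indicative:
>>> import paddle
>>> from paddle.compat import nn
>>> attn = nn.MultiheadAttention(16, 4, batch_first=True)
>>> query = paddle.randn([2, 5, 16])      # [batch_size, target_seq_len, embed_dim]
>>> key = paddle.randn([2, 7, 16])        # [batch_size, source_seq_len, embed_dim]
>>> value = paddle.randn([2, 7, 16])
>>> # True marks positions in key that should be ignored (treated as padding)
>>> key_padding_mask = paddle.zeros([2, 7], dtype="bool")
>>> out, weights = attn(query, key, value, key_padding_mask=key_padding_mask)
>>> print(out.shape)
paddle.Size([2, 5, 16])
>>> # averaged attention weights: [batch_size, target_seq_len, source_seq_len]
>>> print(weights.shape)
paddle.Size([2, 5, 7])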