FusedFeedForward
- class paddle.incubate.nn.FusedFeedForward(d_model: int, dim_feedforward: int, dropout_rate: float = 0.1, epsilon: float = 1e-05, activation: str = 'relu', act_dropout_rate: float | None = None, normalize_before: bool = False, linear1_weight_attr: ParamAttrLike | None = None, linear1_bias_attr: ParamAttrLike | None = None, linear2_weight_attr: ParamAttrLike | None = None, linear2_bias_attr: ParamAttrLike | None = None, ln1_scale_attr: ParamAttrLike | None = None, ln1_bias_attr: ParamAttrLike | None = None, ln2_scale_attr: ParamAttrLike | None = None, ln2_bias_attr: ParamAttrLike | None = None, nranks: int = 1, ring_id: int = -1, name: str | None = None)
- 
         - Parameters
- 
           - d_model (int) – The expected feature size of the input and output. 
- dim_feedforward (int) – The hidden layer size. 
- dropout_rate (float, optional) – The dropout probability used in pre-processing and post-processing. Default: 0.1. 
- epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05. 
- activation (str, optional) – The activation function. Default: relu. 
- act_dropout_rate (float, optional) – The dropout probability applied after the activation. If None, the value of dropout_rate is used. Default: None. 
- normalize_before (bool, optional) – Whether to apply layer normalization in preprocessing (True) or in postprocessing (False). Default: False. 
- linear1_weight_attr (ParamAttr, optional) – Specifies the weight parameter property for the first FFN linear layer. Default: None, which means the default weight parameter property is used. For usage details, see ParamAttr.
- linear1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the first FFN linear layer. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. For usage details, see ParamAttr.
- linear2_weight_attr (ParamAttr, optional) – Specifies the weight parameter property for the second FFN linear layer. Default: None, which means the default weight parameter property is used. For usage details, see ParamAttr.
- linear2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the second FFN linear layer. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. For usage details, see ParamAttr.
- ln1_scale_attr (ParamAttr, optional) – Specifies the weight parameter property for the FFN pre_layer_norm. Default: None, which means the default weight parameter property is used. For usage details, see ParamAttr.
- ln1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the FFN pre_layer_norm. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. For usage details, see ParamAttr.
- ln2_scale_attr (ParamAttr, optional) – Specifies the weight parameter property for the FFN post_layer_norm. Default: None, which means the default weight parameter property is used. For usage details, see ParamAttr.
- ln2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the FFN post_layer_norm. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. For usage details, see ParamAttr.
- nranks (int, optional) – The number of ranks for distributed tensor model parallelism. Default: 1, which means tensor parallelism is not used. 
- ring_id (int, optional) – The ring ID for distributed tensor model parallelism. Default: -1, which means tensor parallelism is not used. 
- name (str, optional) – Default: None. Users normally do not need to set this property. For more information, refer to api_guide_Name. 
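The parameter list above does not spell out the order of operations, so the following is only a minimal NumPy sketch of what an unfused, inference-mode feed-forward block could look like under the usual transformer convention. It assumes a residual connection around the two linear layers, a ReLU activation, no dropout, and layer normalization with scale 1 and bias 0. The helper names `layer_norm` and `feed_forward_reference` are hypothetical and are not part of the Paddle API. It is meant to illustrate how normalize_before moves the layer normalization, not to reproduce the fused kernel.

```python
import numpy as np


def layer_norm(x, epsilon=1e-5):
    # Normalize over the last (feature) axis.
    # Assumption: scale = 1 and bias = 0, i.e. no learned affine parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)


def feed_forward_reference(x, w1, b1, w2, b2, normalize_before=False, epsilon=1e-5):
    """Hypothetical unfused reference: dropout omitted, ReLU activation assumed."""
    residual = x
    # Pre-LN: normalize the input before the first linear layer.
    out = layer_norm(x, epsilon) if normalize_before else x
    out = np.maximum(out @ w1 + b1, 0.0)  # first linear layer + ReLU
    out = out @ w2 + b2                   # second linear layer
    out = residual + out                  # assumed residual connection
    # Post-LN: normalize after the residual addition.
    if not normalize_before:
        out = layer_norm(out, epsilon)
    return out


rng = np.random.default_rng(0)
d_model, dim_feedforward = 8, 16
x = rng.standard_normal((1, 4, d_model))
w1 = rng.standard_normal((d_model, dim_feedforward)) * 0.1
b1 = np.zeros(dim_feedforward)
w2 = rng.standard_normal((dim_feedforward, d_model)) * 0.1
b2 = np.zeros(d_model)

post_ln = feed_forward_reference(x, w1, b1, w2, b2, normalize_before=False)
pre_ln = feed_forward_reference(x, w1, b1, w2, b2, normalize_before=True)
print(post_ln.shape, pre_ln.shape)
```

In this sketch the output keeps the input shape in both modes, and only the post-LN variant returns features that are normalized along the last axis.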
 
 Examples
 >>> import paddle
 >>> from paddle.incubate.nn import FusedFeedForward
 >>> paddle.device.set_device('gpu')
 >>> fused_feedforward_layer = FusedFeedForward(8, 8)
 >>> x = paddle.rand((1, 8, 8))
 >>> out = fused_feedforward_layer(x)
 >>> print(out.shape)
 [1, 8, 8]
 - 
            
           forward(src: Tensor, cache: Tensor | None = None) -> Tensor
- 
           Defines the computation performed at every call. Should be overridden by all subclasses.
           - Parameters
- 
             - *inputs (tuple) – unpacked tuple arguments 
- **kwargs (dict) – unpacked dict arguments 
 
 
 - 
            
           extra_repr() -> str
- 
           Extra representation of this layer. You can provide a custom implementation for your own layer. 
 
