FusedFeedForward
- class paddle.incubate.nn.FusedFeedForward(d_model, dim_feedforward, dropout_rate=0.1, epsilon=1e-05, activation='relu', act_dropout_rate=None, normalize_before=False, linear1_weight_attr=None, linear1_bias_attr=None, linear2_weight_attr=None, linear2_bias_attr=None, ln1_scale_attr=None, ln1_bias_attr=None, ln2_scale_attr=None, ln2_bias_attr=None, nranks=1, ring_id=-1, name=None)
- 
         - Parameters
- 
           - d_model (int) – The expected feature size in the input and output. 
- dim_feedforward (int) – The hidden layer size. 
- dropout_rate (float, optional) – The dropout probability used in pre-processing and post-processing. Default: 0.1. 
- epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05. 
- activation (str, optional) – The activation function. Default relu. 
- act_dropout_rate (float, optional) – The dropout probability applied after the activation. If None, the value of dropout_rate is used. Default: None. 
- normalize_before (bool, optional) – Whether layer normalization is applied in preprocessing or in postprocessing. If True, it is applied in preprocessing; otherwise it is applied in postprocessing. Default: False. 
- linear1_weight_attr (ParamAttr, optional) – The weight parameter property for the first linear layer of the FFN. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
- linear1_bias_attr (ParamAttr|bool, optional) – The bias parameter property for the first linear layer of the FFN. If False, the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
- linear2_weight_attr (ParamAttr, optional) – The weight parameter property for the second linear layer of the FFN. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
- linear2_bias_attr (ParamAttr|bool, optional) – The bias parameter property for the second linear layer of the FFN. If False, the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
- ln1_scale_attr (ParamAttr, optional) – The weight parameter property for the FFN pre_layer_norm. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
- ln1_bias_attr (ParamAttr|bool, optional) – The bias parameter property for the FFN pre_layer_norm. If False, the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
- ln2_scale_attr (ParamAttr, optional) – The weight parameter property for the FFN post_layer_norm. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
- ln2_bias_attr (ParamAttr|bool, optional) – The bias parameter property for the FFN post_layer_norm. If False, the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
- nranks (int, optional) – The number of ranks for distributed tensor model parallelism. Default: 1, which means tensor parallelism is not used. 
- ring_id (int, optional) – Used for distributed tensor model parallelism. Default: -1, which means tensor parallelism is not used. 
- name (str, optional) – Default: None. Normally the user does not need to set this property. For more information, please refer to Name. 
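
To make the roles of normalize_before, epsilon and activation concrete, here is a minimal pure-Python sketch of an unfused feed-forward block applied to a single feature vector. It is not Paddle's fused kernel. It assumes the conventional transformer layout (optional pre-normalization, two linear layers with ReLU between them, a residual connection, optional post-normalization), leaves out dropout as in evaluation mode, and omits the learned layer-norm scale and bias. The function names `layer_norm`, `linear`, `relu` and `feed_forward_reference` are hypothetical helpers, not part of the Paddle API.

```python
import math


def layer_norm(x, epsilon=1e-5):
    """Normalize one feature vector; learned scale and bias are omitted (assumption)."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    # epsilon keeps the denominator non-zero when the variance is 0
    return [(v - mean) / math.sqrt(variance + epsilon) for v in x]


def linear(x, weight, bias):
    """Compute x @ weight + bias, with weight laid out as [in_features][out_features]."""
    return [
        sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def feed_forward_reference(x, w1, b1, w2, b2, normalize_before=False, epsilon=1e-5):
    """Unfused reference; assumes a residual connection and no dropout."""
    residual = x
    # Pre-normalization path: normalize the input before the two linear layers
    hidden = layer_norm(x, epsilon) if normalize_before else x
    hidden = linear(relu(linear(hidden, w1, b1)), w2, b2)
    out = [r + h for r, h in zip(residual, hidden)]
    # Post-normalization path: normalize after adding the residual
    if not normalize_before:
        out = layer_norm(out, epsilon)
    return out


# Toy weights: d_model = dim_feedforward = 2, identity matrices and zero biases
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
x = [1.0, 3.0]

post_norm = feed_forward_reference(x, identity, zeros, identity, zeros)
pre_norm = feed_forward_reference(x, identity, zeros, identity, zeros, normalize_before=True)
print(post_norm)
print(pre_norm)
```

With these toy weights the two settings give visibly different outputs for the same input, which is the practical effect of switching normalize_before.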
 
 Examples

     # required: gpu
     import paddle
     from paddle.incubate.nn import FusedFeedForward

     fused_feedforward_layer = FusedFeedForward(8, 8)
     x = paddle.rand((1, 8, 8))
     out = fused_feedforward_layer(x)
     print(out.numpy().shape)
     # (1, 8, 8)

 - 
            
           forward(src, cache=None)
- 
           Defines the computation performed at every call. Should be overridden by all subclasses.
           - Parameters
- 
             - *inputs (tuple) – unpacked tuple arguments 
- **kwargs (dict) – unpacked dict arguments 
 
 
 - 
            
           extra_repr()
- 
           Extra representation of this layer. You can provide a custom implementation in your own layer. 
 
