fused_ec_moe

paddle.incubate.nn.functional.fused_ec_moe(x, gate, bmm0_weight, bmm0_bias, bmm1_weight, bmm1_bias, act_type) [source]

Applies the fused ec_moe (Mixture of Experts) kernel. This method requires the GPU SM architecture (SM_ARCH) to be one of sm75, sm80, or sm86.

Parameters
  • x (Tensor) – the input Tensor. Its shape is [bsz, seq_len, d_model].

  • gate (Tensor) – the gate Tensor used to choose the experts. Its shape is [bsz, seq_len, e], where e is the number of experts.

  • bmm0_weight (Tensor) – the weight of the first batched matmul. Its shape is [e, d_model, d_feed_forward].

  • bmm0_bias (Tensor) – the bias of the first batched matmul. Its shape is [e, 1, d_feed_forward].

  • bmm1_weight (Tensor) – the weight of the second batched matmul, which projects back to the model dimension. Its shape is [e, d_feed_forward, d_model].

  • bmm1_bias (Tensor) – the bias of the second batched matmul. Its shape is [e, 1, d_model].

  • act_type (str) – the activation type. Only 'gelu' and 'relu' are currently supported.

Returns

the output Tensor. Its shape is [bsz, seq_len, d_model], the same as x.

Return type

Tensor

Examples

>>> import paddle
>>> from paddle.incubate.nn.functional import fused_ec_moe

>>> paddle.set_device('gpu')
>>> x = paddle.randn([10, 128, 1024])            # [bsz, seq_len, d_model]
>>> gate = paddle.randn([10, 128, 8])            # [bsz, seq_len, e]
>>> bmm0_weight = paddle.randn([8, 1024, 4096])  # [e, d_model, d_feed_forward]
>>> bmm0_bias = paddle.randn([8, 1, 4096])       # [e, 1, d_feed_forward]
>>> bmm1_weight = paddle.randn([8, 4096, 1024])  # [e, d_feed_forward, d_model]
>>> bmm1_bias = paddle.randn([8, 1, 1024])       # [e, 1, d_model]
>>> out = fused_ec_moe(x, gate, bmm0_weight, bmm0_bias, bmm1_weight, bmm1_bias, act_type="gelu")
>>> print(out.shape)
[10, 128, 1024]
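
For reference, the snippet below is a minimal unfused sketch of the computation the kernel fuses, runnable on CPU. It only illustrates the shape contract of the two batched matmuls; the expert mixture uses a plain softmax over the gate, which is an assumption and does not reproduce the kernel's actual routing. The small shapes and the names w0, b0, w1, b1, tokens, and probs are illustrative only.

>>> import paddle
>>> import paddle.nn.functional as F
>>> bsz, seq_len, d_model, d_ff, e = 2, 4, 8, 16, 3
>>> x = paddle.randn([bsz, seq_len, d_model])
>>> gate = paddle.randn([bsz, seq_len, e])
>>> w0 = paddle.randn([e, d_model, d_ff])        # plays the role of bmm0_weight
>>> b0 = paddle.randn([e, 1, d_ff])              # bmm0_bias
>>> w1 = paddle.randn([e, d_ff, d_model])        # bmm1_weight
>>> b1 = paddle.randn([e, 1, d_model])           # bmm1_bias
>>> # Every expert sees every token here (no routing, unlike the fused kernel).
>>> tokens = x.reshape([1, bsz * seq_len, d_model]).expand([e, bsz * seq_len, d_model])
>>> h = F.gelu(paddle.bmm(tokens, w0) + b0)      # [e, bsz*seq_len, d_ff]
>>> expert_out = paddle.bmm(h, w1) + b1          # [e, bsz*seq_len, d_model]
>>> # Combine expert outputs with softmax weights derived from the gate (an assumption).
>>> probs = F.softmax(gate.reshape([bsz * seq_len, e]), axis=-1)
>>> out = paddle.einsum('te,etd->td', probs, expert_out).reshape([bsz, seq_len, d_model])
>>> print(out.shape)
[2, 4, 8]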