shard_layer
- paddle.distributed.shard_layer(layer: Layer, process_mesh: ProcessMesh, shard_fn: Callable[[str, Layer, ProcessMesh], None] | None = None, input_fn: Callable[[Any, ProcessMesh], list[Tensor]] | None = None, output_fn: Callable[[Any, ProcessMesh], list[Tensor]] | None = None ) → Layer [source]
- 
         Converts all of the layer's parameters to DistTensor parameters according to the specified shard_fn. It can also control the conversion of the layer's input or output by specifying input_fn and output_fn (i.e. convert the input to paddle.Tensor with distributed attributes, and convert the output back to paddle.Tensor without distributed attributes).

         The shard_fn should have the following signature:

         def shard_fn(layer_name, layer, process_mesh) -> None

         The input_fn should have the following signature:

         def input_fn(inputs, process_mesh) -> list(paddle.Tensor)

         In general, the return value of input_fn is paddle.Tensor with distributed attributes.

         The output_fn should have the following signature:

         def output_fn(outputs, process_mesh) -> list(paddle.Tensor)

         In general, the return value of output_fn is paddle.Tensor with distributed attributes.

 - Parameters
- 
            - layer (paddle.nn.Layer) – The Layer object to be sharded. 
 - process_mesh (paddle.distributed.ProcessMesh) – The ProcessMesh on which to place the input layer. 
 - shard_fn (Callable) – The function used to shard the layer's parameters across the process_mesh. If not specified, all of the layer's parameters are replicated across the process_mesh by default. 
 - input_fn (Callable) – Specifies how the input of the layer is sharded. The input_fn is registered on the Layer as a forward pre-hook. By default the input is not sharded. 
 - output_fn (Callable) – Specifies how the output of the layer is sharded, or how it is converted back to paddle.Tensor without distributed attributes. The output_fn is registered on the Layer as a forward post-hook. By default the output is neither sharded nor converted. A sketch of a possible input_fn/output_fn appears after the examples below. 
 
- Returns
- 
           A Layer whose parameters and buffers are all paddle.Tensor with distributed attributes. 
 
- Return type
- 
           Layer 
 Examples

 >>> import paddle
 >>> import paddle.distributed as dist

 >>> mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

 >>> class MLP(paddle.nn.Layer):
 ...     def __init__(self):
 ...         super().__init__()
 ...         self.fc1 = paddle.nn.Linear(8, 8)
 ...         self.fc2 = paddle.nn.Linear(8, 8)
 ...
 ...     def forward(self, input):
 ...         return self.fc2(self.fc1(input))

 >>> def shard_fn(layer_name, layer, process_mesh):
 ...     if layer_name == 'fc1':
 ...         layer.weight = dist.shard_tensor(layer.weight, process_mesh, [dist.Shard(0)])

 >>> layer = MLP()
 >>> layer = dist.shard_layer(layer, mesh, shard_fn)
 >>> print(layer)

 >>> # This case needs to be executed in a multi-card environment
 >>> # export CUDA_VISIBLE_DEVICES=0,1
 >>> # python -m paddle.distributed.launch {test_case}.py 
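 The example above only exercises shard_fn. Below is a minimal sketch of how input_fn and output_fn might be written, reusing the MLP, mesh, and shard_fn defined above. It assumes the forward pre-hook receives the layer's inputs as a tuple of paddle.Tensor and uses dist.Replicate / dist.reshard purely for illustration; treat it as a sketch rather than the reference behavior of this API.

 >>> def input_fn(inputs, process_mesh):
 ...     # Assumption: `inputs` is the tuple of tensors passed to forward().
 ...     # Replicate each dense input across the mesh before the forward pass.
 ...     return [dist.shard_tensor(t, process_mesh, [dist.Replicate()]) for t in inputs]

 >>> def output_fn(outputs, process_mesh):
 ...     # Reshard the distributed output(s) to a replicated layout.
 ...     if isinstance(outputs, (list, tuple)):
 ...         return [dist.reshard(t, process_mesh, [dist.Replicate()]) for t in outputs]
 ...     return [dist.reshard(outputs, process_mesh, [dist.Replicate()])]

 >>> layer = dist.shard_layer(MLP(), mesh, shard_fn, input_fn=input_fn, output_fn=output_fn)
 >>> # As above, this needs to run in a multi-card environment, e.g.:
 >>> # python -m paddle.distributed.launch {test_case}.py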
