to_static

to_static ( layer, loader, loss=None, optimizer=None, strategy=None )

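Converts a dynamic-graph layer that carries distributed sharding information into a static-graph distributed model that can be trained in static-graph mode. It also converts the data loader used in dynamic-graph mode into the data loader used for static-graph distributed training.

paddle.distributed.to_static returns a DistModel instance and a DistributedDataLoader instance. The DistModel instance holds the converted static-graph model and provides interfaces for training, evaluation, and prediction. The DistributedDataLoader instance loads data during static-graph distributed training.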

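Parameters

  • layer (paddle.nn.Layer) - A model that carries distributed information and can be trained in a distributed manner in dynamic-graph mode.

  • loader (paddle.io.DataLoader) - The data loader used for dynamic-graph training.

  • loss (Loss|Callable|None, optional) - The loss function. Required when the model is to be trained or evaluated.

  • optimizer (Optimizer|None, optional) - The optimizer. Required when the model is to be trained.

  • strategy (Strategy|None, optional) - The distributed training configuration, used to set options such as mixed-precision training and distributed optimization strategies.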

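Returns

DistModel: The model used for static-graph distributed training. Training, evaluation, and prediction all run through its __call__ method. Before running any of them, switch the instance to the corresponding mode with its train(), eval(), or predict() method. The default mode depends on the arguments passed to paddle.distributed.to_static: when both loss and optimizer are given, the default mode is train; when loss is given but optimizer is None, the default mode is eval; when both loss and optimizer are None, the default mode is predict.

DistributedDataLoader: The data loader used for static-graph distributed training. It is used the same way as paddle.io.DataLoader.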
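
The default-mode rule described for DistModel can be restated as a small decision function. The sketch below is plain Python and does not call Paddle; default_mode is a hypothetical helper written only for illustration, not part of the paddle.distributed API, and the placeholder objects stand in for a real loss and optimizer.

```python
def default_mode(loss, optimizer):
    """Hypothetical helper that mirrors the documented default-mode rule."""
    if loss is not None and optimizer is not None:
        return "train"
    if loss is not None and optimizer is None:
        return "eval"
    if loss is None and optimizer is None:
        return "predict"
    # loss is None but an optimizer is given: the text does not describe
    # this combination, so this sketch leaves the mode undefined.
    return None


loss_fn = object()  # stand-in for a loss such as nn.MSELoss()
opt = object()      # stand-in for an optimizer such as paddle.optimizer.SGD

mode_with_both = default_mode(loss_fn, opt)
mode_without_optimizer = default_mode(loss_fn, None)
mode_without_either = default_mode(None, None)
```

Whatever the default is, calling train(), eval(), or predict() explicitly before invoking the model makes the intended mode unambiguous.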

Code Example

>>> import numpy as np
>>> import paddle
>>> import paddle.distributed as dist
>>> from paddle import nn
>>> from paddle.distributed import Replicate, Shard

>>> BATCH_SIZE = 4
>>> BATCH_NUM = 4
>>> IMAGE_SIZE = 16
>>> CLASS_NUM = 8
>>> class RandomDataset(paddle.io.Dataset):
...     def __init__(self, images, labels, num_samples):
...         self.images = images
...         self.labels = labels
...         self.num_samples = num_samples
...     def __getitem__(self, idx):
...         return self.images[idx], self.labels[idx]
...     def __len__(self):
...         return self.num_samples

>>> class DemoNet(nn.Layer):
...     def __init__(self, mesh):
...         super().__init__()
...         self._mesh = mesh
...         self.linear_0 = nn.Linear(IMAGE_SIZE, IMAGE_SIZE)
...         self.linear_1 = nn.Linear(IMAGE_SIZE, CLASS_NUM)
...         self.relu = nn.ReLU()
...         # shard the weights of this layer
...         self.linear_0.weight = dist.shard_tensor(
...             self.linear_0.weight,
...             self._mesh,
...             [Shard(1)],
...             stop_gradient=False,
...         )
...         self.linear_1.weight = dist.shard_tensor(
...             self.linear_1.weight,
...             self._mesh,
...             [Shard(0)],
...             stop_gradient=False,
...         )
...     def forward(self, x):
...         out = self.linear_0(x)
...         out = self.relu(out)
...         out = self.linear_1(out)
...         return out

>>> images = np.random.rand(BATCH_SIZE, IMAGE_SIZE).astype('float32')
>>> labels = np.random.rand(BATCH_SIZE, CLASS_NUM).astype('float32')
>>> dataset = RandomDataset(images, labels, BATCH_SIZE)
>>> loader = paddle.io.DataLoader(dataset, batch_size=BATCH_SIZE)

>>> mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
>>> layer = DemoNet(mesh)
>>> opt = paddle.optimizer.SGD(
...     learning_rate=0.1, parameters=layer.parameters()
... )
>>> loss_fn = nn.MSELoss()

>>> dist_model, dist_loader = dist.to_static(
...     layer, loader, loss_fn, opt
... )

>>> # training
>>> dist_model.train()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in train mode, executing the __call__ method will
...     # update the parameters of the model and return the
...     # loss
...     loss = dist_model(image, label)

>>> # evaluation
>>> dist_model.eval()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in eval mode, executing the __call__ method will
...     # return the loss
...     loss = dist_model(image, label)

>>> # prediction
>>> dist_model.predict()
>>> for batch_id, (image, label) in enumerate(dist_loader()):
...     # in predict mode, executing the __call__ method will
...     # return a dict that contains the outputs of the model,
...     # where the value of "out0" is the first output.
...     outs = dist_model(image)

>>> # This case needs to be executed in a multi-card environment
>>> # export CUDA_VISIBLE_DEVICES=0,1
>>> # python -m paddle.distributed.launch {test_case}.py
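
To make the Shard placements in DemoNet more concrete, the following sketch computes the per-process weight shapes without running Paddle. It assumes that nn.Linear stores its weight as [in_features, out_features] and that Shard(d) splits dimension d evenly across the two processes of the one-dimensional mesh; local_shape is a hypothetical helper, not a Paddle function.

```python
def local_shape(global_shape, shard_dim, num_procs):
    """Hypothetical helper: per-process shape when shard_dim is split evenly."""
    if global_shape[shard_dim] % num_procs != 0:
        raise ValueError("this sketch only handles evenly divisible sizes")
    shape = list(global_shape)
    shape[shard_dim] //= num_procs
    return shape


IMAGE_SIZE, CLASS_NUM, NUM_PROCS = 16, 8, 2

# linear_0.weight is placed with Shard(1): split along the output dimension.
linear_0_local = local_shape([IMAGE_SIZE, IMAGE_SIZE], shard_dim=1, num_procs=NUM_PROCS)

# linear_1.weight is placed with Shard(0): split along the input dimension.
linear_1_local = local_shape([IMAGE_SIZE, CLASS_NUM], shard_dim=0, num_procs=NUM_PROCS)

print(linear_0_local, linear_1_local)
```

Under these assumptions each process holds a 16 x 8 slice of linear_0.weight and an 8 x 8 slice of linear_1.weight.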