ParallelEnv

class paddle.fluid.dygraph.ParallelEnv
Notes:

The old class name Env is deprecated. Please use the new class name ParallelEnv.

This class is used to obtain the environment variables required for the parallel execution of dynamic graph models.

The dynamic graph parallel mode needs to be started with paddle.distributed.launch. By default, the related environment variables are automatically configured by that module.

This class is generally used together with fluid.dygraph.DataParallel to configure dynamic graph models to run in parallel.

Examples

# This example needs to be run with paddle.distributed.launch. The usage is:
#   python -m paddle.distributed.launch --selected_gpus=0,1 example.py
# And the content of `example.py` is the code of following example.

import numpy as np
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from paddle.fluid.optimizer import AdamOptimizer
from paddle.fluid.dygraph.nn import Linear
from paddle.fluid.dygraph.base import to_variable

place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
with fluid.dygraph.guard(place=place):

    # prepare the data parallel context
    strategy = dygraph.prepare_context()

    linear = Linear(1, 10, act="softmax")
    # in dygraph mode the optimizer needs the parameters it will update
    adam = AdamOptimizer(learning_rate=0.001,
                         parameter_list=linear.parameters())

    # make the module become the data parallelism module
    linear = dygraph.DataParallel(linear, strategy)

    x_data = np.random.random(size=[10, 1]).astype(np.float32)
    data = to_variable(x_data)

    hidden = linear(data)
    avg_loss = fluid.layers.mean(hidden)

    # scale the loss according to the number of trainers.
    avg_loss = linear.scale_loss(avg_loss)

    avg_loss.backward()

    # collect the gradients of trainers.
    linear.apply_collective_grads()

    adam.minimize(avg_loss)
    linear.clear_gradients()
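
Because every launched process executes the same script, the properties of ParallelEnv are also commonly used to restrict one-off work such as logging or checkpoint saving to a single trainer. The following is a minimal sketch of that pattern, independent of the training code above:

import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
# every launched process runs the same script, so guard one-off work by rank
if env.local_rank == 0:
    print("trainer %d of %d: save checkpoints / write logs here" % (
        env.local_rank, env.nranks))
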
nranks

The number of trainers, which generally equals the number of GPU cards used in training.

Its value is equal to the value of the environment variable PADDLE_TRAINERS_NUM. The default value is 1.

Examples

# execute this command in terminal: export PADDLE_TRAINERS_NUM=4
import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
print("The nranks is %d" % env.nranks)
# The nranks is 4
local_rank

The ID of the current trainer.

Its value is equal to the value of the environment variable PADDLE_TRAINER_ID. The default value is 0.

Examples

# execute this command in terminal: export PADDLE_TRAINER_ID=0
import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
print("The local rank is %d" % env.local_rank)
# The local rank is 0
dev_id

The ID of the GPU card selected for parallel training.

Its value is equal to the value of the environment variable FLAGS_selected_gpus. The default value is 0.

Examples

# execute this command in terminal: export FLAGS_selected_gpus=1
import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
print("The device id are %d" % env.dev_id)
# The device id are 1
current_endpoint

The endpoint of the current trainer, in the form of node IP and port (e.g. 127.0.0.1:6170).

Its value is equal to the value of the environment variable PADDLE_CURRENT_ENDPOINT. The default value is "".

Examples

# execute this command in terminal: export PADDLE_CURRENT_ENDPOINT=127.0.0.1:6170
import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
print("The current endpoint are %s" % env.current_endpoint)
# The current endpoint are 127.0.0.1:6170
trainer_endpoints

The endpoints of all trainer nodes in the task, which are used to broadcast the NCCL ID when NCCL2 is initialized.

Its value is equal to the value of the environment variable PADDLE_TRAINER_ENDPOINTS. The default value is "".

Examples

# execute this command in terminal: export PADDLE_TRAINER_ENDPOINTS=127.0.0.1:6170,127.0.0.1:6171
import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
print("The trainer endpoints are %s" % env.trainer_endpoints)
# The trainer endpoints are ['127.0.0.1:6170', '127.0.0.1:6171']
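
Taken together, these properties describe where each launched process sits in the distributed job. A minimal sketch, assuming the script is started with paddle.distributed.launch so that every trainer prints its own values:

import paddle.fluid as fluid

env = fluid.dygraph.ParallelEnv()
# each launched process prints a different line, for example:
# trainer 0/2 on GPU 0, endpoint 127.0.0.1:6170
print("trainer %d/%d on GPU %d, endpoint %s, peers %s" % (
    env.local_rank, env.nranks, env.dev_id,
    env.current_endpoint, env.trainer_endpoints))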