class paddle.fluid.transpiler.DistributeTranspiler


Convert the fluid program to distributed data-parallel programs.

The main_program is transformed so that parameter optimization is done on remote parameter servers, and the optimization graph is moved into a separate parameter server program.
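To make the division of work concrete, here is a toy, single-process model of synchronous parameter-server training in plain Python. It is not Paddle's implementation: `ToyParameterServer`, `trainer_gradient` and `train` are hypothetical names, the "model" is a single scalar parameter, and the optimizer is plain SGD. Trainers only compute gradients. The server owns the parameter and applies the update once it has one gradient from every trainer, which is what synchronous mode means here.

```python
from typing import List


def trainer_gradient(w: float, shard: List[float]) -> float:
    """Gradient of mean(0.5 * (w - x) ** 2) over one trainer's data shard."""
    return sum(w - x for x in shard) / len(shard)


class ToyParameterServer:
    """Hypothetical stand-in for a pserver: owns the parameter, runs SGD."""

    def __init__(self, w: float, lr: float) -> None:
        self.w = w
        self.lr = lr

    def sync_step(self, grads: List[float]) -> float:
        # Synchronous mode: one gradient per trainer, averaged, one update.
        self.w -= self.lr * sum(grads) / len(grads)
        return self.w


def train(shards: List[List[float]], steps: int, lr: float = 0.5) -> float:
    server = ToyParameterServer(w=0.0, lr=lr)
    for _ in range(steps):
        # Each trainer reads the current parameter and computes a local gradient...
        grads = [trainer_gradient(server.w, shard) for shard in shards]
        # ...then the server, not the trainers, applies the optimizer step.
        server.sync_step(grads)
    return server.w


print(train([[1.0, 3.0], [5.0, 7.0]], steps=50))  # approaches 4.0, the mean of all data
```

With equally sized shards, averaging the per-trainer gradients gives the same update as a single machine training on all the data, which is why synchronous data-parallel training converges to the same point here.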


import os

import paddle.fluid as fluid

# Define your model before this code.
port = os.getenv("PADDLE_PSERVER_PORT", "6174")
pserver_ips = os.getenv("PADDLE_PSERVER_IPS", "")  # comma-separated IPs
eplist = []
for ip in pserver_ips.split(","):
    eplist.append(':'.join([ip, port]))
pserver_endpoints = ",".join(eplist)  # "ip1:port,ip2:port,..."
trainers = int(os.getenv("PADDLE_TRAINERS", "1"))
current_endpoint = os.getenv("PADDLE_CURRENT_IP", "") + ":" + port
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
role = os.getenv("PADDLE_TRAINING_ROLE")

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
    # Parameter server: build the optimization program for this endpoint.
    pserver_program = t.get_pserver_program(current_endpoint)
    pserver_startup_program = t.get_startup_program(current_endpoint,
                                                    pserver_program)
elif role == "TRAINER":
    # Trainer: build the program that computes and sends gradients.
    trainer_program = t.get_trainer_program()

transpile(trainer_id, program=None, pservers='', trainers=1, slice_var_up=True, split_method=paddle.fluid.transpiler.ps_dispatcher.RoundRobin, sync_mode=True)

Run the transpiler.

Parameters:
  • trainer_id (int) – ID of the current trainer worker; with n workers, IDs range from 0 to n-1.
  • program (Program|None) – program to transpile; default is fluid.default_main_program().
  • pservers (str) – comma-separated ip:port string listing the pservers.
  • trainers (int) – number of trainers in the distributed job.
  • slice_var_up (bool) – whether to slice tensors into blocks across the pservers; default is True.
  • split_method (PSDispatcher) – RoundRobin or HashName; choose the method that best balances load across the pservers.
  • sync_mode (bool) – whether to train synchronously; default is True.
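The idea behind slice_var_up can be sketched as splitting a parameter's rows into contiguous blocks, roughly one per pserver, so that no single server holds a disproportionately large tensor. The helper `slice_rows` below is hypothetical and only illustrates even row splitting; Paddle's actual block sizing and alignment rules are not reproduced. Each resulting block would then be assigned to an endpoint by split_method.

```python
from typing import List, Tuple


def slice_rows(num_rows: int, num_pservers: int) -> List[Tuple[int, int]]:
    """Split rows [0, num_rows) into at most num_pservers contiguous blocks.

    Hypothetical helper: block sizes differ by at most one row. Paddle's own
    splitting rules (block sizes, alignment) are not reproduced here.
    """
    num_blocks = max(1, min(num_rows, num_pservers))
    base, extra = divmod(num_rows, num_blocks)
    blocks: List[Tuple[int, int]] = []
    start = 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)  # the first `extra` blocks get one more row
        blocks.append((start, start + size))
        start += size
    return blocks


# A 10-row parameter over 3 pservers: [(0, 4), (4, 7), (7, 10)]
print(slice_rows(10, 3))
```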

get_trainer_program()

Get the transpiled trainer-side program.

Returns: the trainer-side program.
Return type: Program

get_pserver_program(endpoint)

Get the parameter server side program.

Parameters: endpoint (str) – current parameter server endpoint.
Returns: the program for the current parameter server to run.
Return type: Program
get_startup_program(endpoint, pserver_program)

Get the startup program for the current parameter server. If some variables were split into several blocks, the input variables of the affected operators are modified to match.

Parameters:
  • endpoint (str) – current pserver endpoint.
  • pserver_program (Program) – the result of get_pserver_program; call that method first and pass its result here.

Returns: the parameter server side startup program.
Return type: Program



class paddle.fluid.transpiler.InferenceTranspiler

Convert the fluid program to optimized inference program.

Of the possible optimizations, only batch normalization fusion is currently supported.


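import paddle.fluid as fluid

# `program` is assumed to be an already built or loaded inference program.
place = fluid.CPUPlace()
# InferenceTranspiler modifies the program it is given,
# so clone the program before using it.
inference_transpiler_program = program.clone()
t = fluid.InferenceTranspiler()
t.transpile(inference_transpiler_program, place)

transpile(program, place, scope=None)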

Run the transpiler.

Parameters:
  • program (Program) – program to transpile.
  • place (Place) – inference place, e.g. fluid.CPUPlace().
  • scope (Scope|None) – inference scope.
fuse_batch_norm(program, place, scope)

Transpile the program by fusing batch normalization into the preceding layer.

A batch normalization that follows a convolution or fully connected layer can be folded into that layer. This speeds up the forward pass, especially in mobile or embedded environments.

For input \(X\):

  • Conv process: \(X = input * W + bias\)
  • Batch norm process: \(X' = (X - mean) / std\)
  • Scale process: \(Y = a * X' + b\)

After fusing them into one operation:

\[\begin{split}Y &= (input * W + bias - mean) / std * a + b \\ &= input * a * W / std + ((bias - mean) / std * a + b)\end{split}\]
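The fused formula can be checked numerically. The sketch below treats one output channel of the convolution as a dot product over plain Python lists (an assumption for illustration; real convolutions apply this per output channel, with std derived from the running variance). It computes the three-step path and the folded path with \(W' = a W / std\) and \(bias' = (bias - mean) / std \cdot a + b\), and confirms they agree.

```python
import math
from typing import List, Tuple


def unfused(x: List[float], w: List[float], bias: float,
            mean: float, std: float, a: float, b: float) -> float:
    conv = sum(xi * wi for xi, wi in zip(x, w)) + bias  # X = input * W + bias
    normed = (conv - mean) / std                         # X' = (X - mean) / std
    return a * normed + b                                # Y = a * X' + b


def fold(w: List[float], bias: float, mean: float, std: float,
         a: float, b: float) -> Tuple[List[float], float]:
    w_fused = [a * wi / std for wi in w]                 # a * W / std
    bias_fused = (bias - mean) / std * a + b             # (bias - mean) / std * a + b
    return w_fused, bias_fused


def fused(x: List[float], w_fused: List[float], bias_fused: float) -> float:
    # One "conv" plus elementwise_add: no batch norm left.
    return sum(xi * wi for xi, wi in zip(x, w_fused)) + bias_fused


x, w = [1.0, 2.0, -0.5], [0.3, -1.2, 2.0]
params = dict(bias=0.7, mean=0.1, std=1.5, a=2.0, b=-0.25)
w_f, bias_f = fold(w, **params)
assert math.isclose(unfused(x, w, **params), fused(x, w_f, bias_f))
```

Because the folding happens once, offline, the inference graph pays only for the convolution and an addition.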

The operator transformation is:

  • before:
    • conv->batch_norm->any_other_op (bias == 0)
    • conv->elementwise_add->batch_norm->any_other_op (bias != 0)
  • after:
    • conv->elementwise_add->any_other_op

The transpile stages are:

  1. insert elementwise_add op when bias == 0.
  2. fuse the batch_norm’s parameters to conv and elementwise_add operators.
  3. remove batch_norm ops which are not used in any other ops.
  4. adjust the input of any_other_op to be the output of elementwise_add operator.
  5. remove unused variables.

Parameters:
  • program (Program) – program to transpile.
  • place (Place) – inference place.
  • scope (Scope) – inference scope.
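The operator rewrite described by the before/after patterns and the transpile stages can be mimicked on a linear chain of op-type strings. This is a toy model only: `fuse_bn_chain` is a hypothetical helper, "conv" is the document's shorthand, and the real pass also rewrites weights, variables and op inputs in a graph. Stage 4 (re-wiring any_other_op) is implicit here because the chain is just a sequence.

```python
from typing import List


def fuse_bn_chain(ops: List[str]) -> List[str]:
    """Rewrite a linear chain of op types following the fusion stages.

    Toy model on op-type strings only; real fusion also rewrites weights,
    variables and op inputs in a graph.
    """
    out: List[str] = []
    i = 0
    while i < len(ops):
        rest = ops[i:i + 3]
        if rest[:2] == ["conv", "batch_norm"]:
            # bias == 0: insert elementwise_add (stage 1) to carry the fused
            # bias, and drop batch_norm (stages 2-3).
            out += ["conv", "elementwise_add"]
            i += 2
        elif rest == ["conv", "elementwise_add", "batch_norm"]:
            # bias != 0: fold batch_norm into the existing conv/elementwise_add.
            out += ["conv", "elementwise_add"]
            i += 3
        else:
            out.append(ops[i])
            i += 1
    return out


print(fuse_bn_chain(["conv", "batch_norm", "relu"]))  # ['conv', 'elementwise_add', 'relu']
```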


paddle.fluid.transpiler.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0)

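Optimize memory by reusing variable memory.

Note: it does not support sub-blocks nested inside other sub-blocks.

Parameters:
  • input_program (Program) – input program.
  • skip_opt_set (set|None) – variables to skip during memory optimization.
  • print_log (bool) – whether to print debug log.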
  • level – If level=0, reuse if the shape is completely equal, o
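The level=0 rule (reuse only when shapes are exactly equal) can be illustrated with a toy buffer planner over a straight-line list of ops. The op tuple format and `plan_reuse` are hypothetical, the liveness analysis is a simple "last reader" scan rather than Paddle's control-flow analysis, and other levels are not modelled. A variable's buffer is released after its last reader runs; a later output may take it only if its shape matches.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
# Hypothetical op format: (output_name, output_shape, input_names).
Op = Tuple[str, Shape, List[str]]


def plan_reuse(ops: List[Op]) -> Dict[str, str]:
    """Map each op output to a buffer, reusing only on an exact shape match."""
    last_use: Dict[str, int] = {}
    for idx, (_, _, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = idx

    shape_of: Dict[str, Shape] = {}
    free: Dict[Shape, List[str]] = {}  # shape -> buffers released so far
    buffer_of: Dict[str, str] = {}
    for idx, (out, shape, inputs) in enumerate(ops):
        pool = free.get(shape)
        # level=0 behaviour: take a released buffer only if the shape is identical.
        buffer_of[out] = pool.pop() if pool else out
        shape_of[out] = shape
        # Release inputs whose last reader is this op (after allocating the
        # output, so an op never overwrites its own input).
        for name in dict.fromkeys(inputs):
            if last_use[name] == idx and name in buffer_of:
                free.setdefault(shape_of[name], []).append(buffer_of[name])
    return buffer_of


ops: List[Op] = [
    ("a", (2, 3), ["x"]),
    ("b", (2, 3), ["a"]),
    ("c", (2, 3), ["b"]),  # can take over a's buffer
    ("d", (4,), ["c"]),    # different shape: gets its own buffer
]
print(plan_reuse(ops))  # {'a': 'a', 'b': 'b', 'c': 'a', 'd': 'd'}
```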


paddle.fluid.transpiler.release_memory(input_program, skip_opt_set=None)

Modify the input program by inserting delete_op operators that drop unused variables as early as possible. The modification is performed in place.

Note: This is an experimental API and may be removed in the next few releases. Users should not use this API.

Parameters: input_program (Program) – the program into which delete_op operators are inserted.
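The idea of dropping variables early can be sketched as a last-use scan: after the op that touches a variable for the last time, emit a delete op for it. Everything here is a hypothetical toy: the op tuple format, the "delete_op" type string, the `keep` set (fetch targets and persistable variables such as parameters, which must never be deleted), and the fact that it returns a new list instead of modifying the program in place.

```python
from typing import AbstractSet, List, Tuple

# Hypothetical op format: (op_type, input_names, output_names).
Op = Tuple[str, List[str], List[str]]


def insert_deletes(ops: List[Op], keep: AbstractSet[str] = frozenset()) -> List[Op]:
    """Return a new op list with a delete op after each variable's last use."""
    last_use = {}
    for idx, (_, inputs, outputs) in enumerate(ops):
        for name in inputs + outputs:
            last_use[name] = idx

    result: List[Op] = []
    for idx, op in enumerate(ops):
        result.append(op)
        # Variables touched for the last time here, minus those we must keep
        # (fetch targets, and persistable variables such as parameters).
        dead = sorted(n for n, last in last_use.items() if last == idx and n not in keep)
        if dead:
            result.append(("delete_op", dead, []))
    return result


ops = [("mul", ["x", "w"], ["h"]), ("relu", ["h"], ["y"])]
for op in insert_deletes(ops, keep={"w", "y"}):
    print(op)
```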


class paddle.fluid.transpiler.HashName(pserver_endpoints)

Hash variable names to several endpoints using Python's hash() function.

Parameters: pserver_endpoints (list) – list of endpoints (ip:port).
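A name-hash dispatcher can be sketched in a few lines. One caveat worth knowing: on Python 3, hash() of a str is salted per process unless PYTHONHASHSEED is fixed, so separate trainer and pserver processes could disagree on the mapping. For that reason this sketch deliberately swaps in zlib.crc32, which is stable across processes; `hash_dispatch` and the variable names are hypothetical, and this is not Paddle's HashName class.

```python
import zlib
from typing import Dict, List


def hash_dispatch(var_names: List[str], endpoints: List[str]) -> Dict[str, str]:
    """Map each variable name to an endpoint chosen by a stable hash of the name."""
    # zlib.crc32 gives the same value in every process; Python 3's str hash()
    # is salted per process unless PYTHONHASHSEED is fixed.
    return {
        name: endpoints[zlib.crc32(name.encode("utf-8")) % len(endpoints)]
        for name in var_names
    }


eps = ["127.0.0.1:6174", "127.0.0.1:6175"]
names = ["fc_0.w_0", "fc_0.b_0", "fc_1.w_0"]
print(hash_dispatch(names, eps))
```

The assignment depends only on each name, not on dispatch order, but nothing guarantees an even split across endpoints.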


class paddle.fluid.transpiler.RoundRobin(pserver_endpoints)

Distribute variables to several endpoints using the round-robin <https://en.wikipedia.org/wiki/Round-robin_scheduling> method.

Parameters: pserver_endpoints (list) – list of endpoints (ip:port).
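Round-robin dispatch hands out endpoints in turn, wrapping around at the end of the list. The class below is a hypothetical stand-in (its name and the `dispatch`/`reset` methods are assumptions for illustration, not Paddle's API); it keeps a cursor across calls so consecutive batches of variables continue the rotation.

```python
from typing import List


class ToyRoundRobin:
    """Hypothetical round-robin dispatcher; not Paddle's RoundRobin class."""

    def __init__(self, endpoints: List[str]) -> None:
        self.endpoints = list(endpoints)
        self.step = 0  # index of the next endpoint to hand out

    def dispatch(self, var_names: List[str]) -> List[str]:
        chosen = []
        for _ in var_names:
            chosen.append(self.endpoints[self.step])
            # Advance and wrap around to the first endpoint.
            self.step = (self.step + 1) % len(self.endpoints)
        return chosen

    def reset(self) -> None:
        self.step = 0


rr = ToyRoundRobin(["127.0.0.1:6174", "127.0.0.1:6175"])
print(rr.dispatch(["w0", "w1", "w2"]))
```

Round robin evens out the number of variables per endpoint, not their sizes, and the result depends on dispatch order; slicing large variables into blocks first makes the per-endpoint load more even.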