profiler

paddle.fluid.profiler.profiler(state, sorted_key=None, profile_path='/tmp/profile')

The profiler interface. Unlike fluid.profiler.cuda_profiler, this profiler can be used to profile both CPU and GPU programs.

Parameters
  • state (str) – The profiling state, which must be one of ‘CPU’, ‘GPU’ or ‘All’. ‘CPU’ profiles only the CPU; ‘GPU’ profiles both CPU and GPU; ‘All’ profiles both CPU and GPU and also generates a timeline.

  • sorted_key (str, optional) – The order of the profiling results, which must be one of None, ‘calls’, ‘total’, ‘max’, ‘min’ or ‘ave’. The default is None, which prints the results in the order of the first end time of events. ‘calls’ sorts by the number of calls, ‘total’ by the total execution time, ‘max’ by the maximum execution time, ‘min’ by the minimum execution time, and ‘ave’ by the average execution time.

  • profile_path (str, optional) – If state == ‘All’, a timeline is generated and written to profile_path. The default profile_path is /tmp/profile.
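To make the sorted_key options concrete, here is a minimal pure-Python sketch of the ordering each key implies. It does not call Paddle and is not the profiler's implementation: the event records and the sort_events helper are hypothetical, the times are invented, and descending order is assumed because the report header says "in descending order".

```python
from operator import itemgetter

# Hypothetical event records (not real profiler output); times are in ms.
events = [
    {"name": "feed", "calls": 8, "total": 0.08, "min": 0.006, "max": 0.024},
    {"name": "conv2d", "calls": 4, "total": 7.9, "min": 0.29, "max": 5.6},
    {"name": "elementwise_add", "calls": 6, "total": 1.9, "min": 0.5, "max": 0.52},
]
for event in events:
    # Average time per call, derived from the other two fields.
    event["ave"] = event["total"] / event["calls"]


def sort_events(records, sorted_key=None):
    """Order records the way the sorted_key description suggests.

    None keeps the recorded order, standing in for "first end time";
    any other key sorts by that column, largest value first.
    """
    if sorted_key is None:
        return list(records)
    return sorted(records, key=itemgetter(sorted_key), reverse=True)


by_calls = [e["name"] for e in sort_events(events, "calls")]
by_total = [e["name"] for e in sort_events(events, "total")]
by_min = [e["name"] for e in sort_events(events, "min")]
unsorted = [e["name"] for e in sort_events(events)]
```

With these invented numbers the three keys give three different orders, which the page's own example cannot show because its columns happen to rank the events identically.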

Raises

ValueError – If state is not in [‘CPU’, ‘GPU’, ‘All’], or if sorted_key is not None and not in [‘calls’, ‘total’, ‘max’, ‘min’, ‘ave’].
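A caller who wants to fail early, before building a network, could mirror the documented conditions in a small check of their own. The sketch below is an assumption-laden illustration: check_profiler_args and is_valid are hypothetical helpers, not part of paddle.fluid.profiler, and None is treated as a valid sorted_key because the parameter description lists it as the default.

```python
VALID_STATES = ("CPU", "GPU", "All")
VALID_SORTED_KEYS = (None, "calls", "total", "max", "min", "ave")


def check_profiler_args(state, sorted_key=None):
    """Hypothetical pre-check that mirrors the documented ValueError cases."""
    if state not in VALID_STATES:
        raise ValueError(
            "state must be one of %r, got %r" % (VALID_STATES, state)
        )
    if sorted_key not in VALID_SORTED_KEYS:
        raise ValueError(
            "sorted_key must be one of %r, got %r" % (VALID_SORTED_KEYS, sorted_key)
        )


def is_valid(state, sorted_key=None):
    """Return True if the arguments pass the check, False otherwise."""
    try:
        check_profiler_args(state, sorted_key)
    except ValueError:
        return False
    return True
```

Note that the comparison is case-sensitive in this sketch, so 'cpu' and 'avg' are rejected while 'CPU' and 'ave' pass.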

Examples

import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
import numpy as np

epoch = 8
dshape = [4, 3, 28, 28]

# Build a small network: one conv2d layer applied to the input data.
data = fluid.data(name='data', shape=[None, 3, 28, 28], dtype='float32')
conv = fluid.layers.conv2d(data, 20, 3, stride=[1, 1], padding=[1, 1])

# Run the startup program once on the CPU to initialize parameters.
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Profile CPU events only and sort the report by total execution time.
with profiler.profiler('CPU', 'total', '/tmp/profile') as prof:
    for i in range(epoch):
        # Feed a random batch on each iteration.
        batch = np.random.random(dshape).astype('float32')
        exe.run(fluid.default_main_program(), feed={'data': batch})

Examples Results:

#### Examples Results ####
#### 1) sorted_key = 'total', 'calls', 'max', 'min', 'ave' ####
# The only difference among the 5 sorted_key results is the following sentence:
# "Sorted by number of xxx in descending order in the same thread."
# The reason is that, in this example, all 5 columns rank the events in the same order.
------------------------->     Profiling Report     <-------------------------

Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
#Sorted by number of calls in descending order in the same thread
#Sorted by number of max in descending order in the same thread
#Sorted by number of min in descending order in the same thread
#Sorted by number of avg in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d             8           129.406     0.304303    127.076     16.1758     0.983319
thread0::elementwise_add    8           2.11865     0.193486    0.525592    0.264832    0.016099
thread0::feed               8           0.076649    0.006834    0.024616    0.00958112  0.000582432

#### 2) sorted_key = None  ####
# Since the profiling results are printed in the order of first end time of Ops,
# the printed order is feed->conv2d->elementwise_add
------------------------->     Profiling Report     <-------------------------

Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::feed               8           0.077419    0.006608    0.023349    0.00967738  0.00775934
thread0::conv2d             8           7.93456     0.291385    5.63342     0.99182     0.795243
thread0::elementwise_add    8           1.96555     0.191884    0.518004    0.245693    0.196998
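
The Ave. and Ratio. columns in the first report above appear consistent with Ave. = Total / Calls and Ratio. = Total / (sum of Total over all listed events). This is inferred from the printed numbers rather than from a documented definition, so treat the sketch below as a sanity check on that reading. The values are copied from the first report.

```python
# (calls, total_ms, reported_ave, reported_ratio) copied from the first report.
report = {
    "conv2d": (8, 129.406, 16.1758, 0.983319),
    "elementwise_add": (8, 2.11865, 0.264832, 0.016099),
    "feed": (8, 0.076649, 0.00958112, 0.000582432),
}

# Sum of the Total column over all listed events.
grand_total = sum(total for _, total, _, _ in report.values())

# Recompute Ave. and Ratio. under the assumed formulas.
derived = {
    name: {"ave": total / calls, "ratio": total / grand_total}
    for name, (calls, total, _, _) in report.items()
}

for name, (_, _, reported_ave, reported_ratio) in report.items():
    print(
        "%-16s ave %.6f (reported %.6f)  ratio %.6f (reported %.6f)"
        % (name, derived[name]["ave"], reported_ave,
           derived[name]["ratio"], reported_ratio)
    )
```

Under this reading, the ratios in a report sum to 1, which explains why conv2d accounts for about 98% of the first run and about 80% of the second even though it is the same operator.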