Add Kernels for a New Device


PaddlePaddle Fluid has hundreds of operators, and each operator may have one or more kernels. A kernel is an implementation of the operator for a certain device, which could be a hardware device, such as a CUDA GPU, or a library that utilizes a device, such as Intel MKL, which makes full use of Xeon CPUs.

This document explains how to add an operator and its kernels. The kernels of an operator are indexed by the C++ type OpKernelType, and the operator chooses the right kernel at runtime. This selection mechanism is described in a separate document.
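To make the indexing idea concrete, here is a minimal sketch of a kernel registry whose key plays the role of OpKernelType. The enums, `KernelKey`, `KernelRegistry`, and `SelectKernel` are hypothetical simplifications for illustration, not Fluid's actual definitions; the real OpKernelType carries more information.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-ins for the fields an OpKernelType carries.
enum class PlaceKind { kCPU, kCUDA };
enum class Library { kPlain, kMKLDNN, kCUDNN };
enum class DataType { kFP32, kFP64 };

// Hypothetical key: one kernel per (place, library, data type) combination.
struct KernelKey {
  PlaceKind place;
  Library library;
  DataType dtype;
  bool operator<(const KernelKey& o) const {
    return std::tie(place, library, dtype) <
           std::tie(o.place, o.library, o.dtype);
  }
};

// In this sketch a "kernel" just returns a label identifying itself.
using Kernel = std::function<std::string()>;
// op type -> (kernel key -> kernel)
using KernelRegistry = std::map<std::string, std::map<KernelKey, Kernel>>;

// Runtime selection: find the kernel registered for exactly this key.
Kernel SelectKernel(const KernelRegistry& registry, const std::string& op_type,
                    const KernelKey& key) {
  auto op_it = registry.find(op_type);
  if (op_it == registry.end()) {
    throw std::runtime_error("unknown operator: " + op_type);
  }
  auto kernel_it = op_it->second.find(key);
  if (kernel_it == op_it->second.end()) {
    throw std::runtime_error("no kernel registered for " + op_type);
  }
  return kernel_it->second;
}
```

Registering two conv2d kernels (CPU/plain/float32 and CUDA/CUDNN/float32) and asking for the CUDA one returns the CUDNN kernel, while asking for a float64 CPU kernel throws, because nothing was registered under that key.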

Write Kernels for A New Device

Add A New Device

For historical reasons, we misuse the word library for device. For example, we call the device type the library type, as in the header file library_type.h. We will correct this ASAP.

To register a new device, we need to add an enum value to LibraryType:

enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  // add the value for the new device here
};
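As an illustration, the sketch below appends a hypothetical `kNewDevice` entry and shows that any code switching over LibraryType, here a hypothetical `LibraryTypeToString` helper, must also learn the new value. Both names are assumptions for this example only.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// The enum from above, extended with one hypothetical device entry.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kNewDevice = 3,  // hypothetical name for the device being added
};

// Hypothetical helper: every switch over LibraryType needs a case for the
// new value, e.g. code that prints the library type in logs or errors.
inline std::string LibraryTypeToString(LibraryType type) {
  switch (type) {
    case LibraryType::kPlain:
      return "PLAIN";
    case LibraryType::kMKLDNN:
      return "MKLDNN";
    case LibraryType::kCUDNN:
      return "CUDNN";
    case LibraryType::kNewDevice:
      return "NEW_DEVICE";
  }
  throw std::invalid_argument("unknown LibraryType");
}
```

Leaving out a `default:` label is a deliberate choice: with `-Wswitch` enabled, GCC and Clang warn about any switch that forgets to handle a newly added enumerator.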

Add A New Place

If you are adding a new kind of device, you first need to add a new kind of Place, for example CUDAPlace:

#include <boost/variant.hpp>

struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

// Every place type, including a newly added one, must be listed in the variant.
typedef boost::variant<CUDAPlace, CPUPlace> Place;
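The following sketch shows what adding a place might look like for a hypothetical `NewDevicePlace`. To stay self-contained it uses C++17 `std::variant` in place of `boost::variant`, simplified `CPUPlace`/`CUDAPlace` stand-ins, and hypothetical helpers `is_new_device_place` and `GetDeviceId`.

```cpp
#include <cassert>
#include <type_traits>
#include <variant>

// Simplified stand-ins for the existing places.
struct CPUPlace {
  bool operator==(const CPUPlace&) const { return true; }
};

struct CUDAPlace {
  explicit CUDAPlace(int d = 0) : device(d) {}
  bool operator==(const CUDAPlace& o) const { return device == o.device; }
  int device;
};

// Hypothetical place for the new device, following the CUDAPlace pattern.
// operator== is needed so that Place values can be compared.
struct NewDevicePlace {
  explicit NewDevicePlace(int d = 0) : device(d) {}
  bool operator==(const NewDevicePlace& o) const { return device == o.device; }
  int device;
};

// std::variant stands in for boost::variant; the new place is appended.
using Place = std::variant<CUDAPlace, CPUPlace, NewDevicePlace>;

inline bool is_new_device_place(const Place& p) {
  return std::holds_alternative<NewDevicePlace>(p);
}

// Returns the device id held by any place, treating the CPU as device 0.
inline int GetDeviceId(const Place& p) {
  return std::visit(
      [](const auto& place) -> int {
        if constexpr (std::is_same_v<std::decay_t<decltype(place)>, CPUPlace>) {
          return 0;
        } else {
          return place.device;
        }
      },
      p);
}
```

Visiting the variant keeps device-id logic in one place: once the new alternative is added, the generic lambda handles it without further changes.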

Add A Device Context

After a new kind of Device is added, you should add a corresponding DeviceContext for it.

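class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  // Returns the Place (device) this context is bound to.
  virtual Place GetPlace() const = 0;

  // Blocks until all work issued through this context has finished;
  // the default implementation does nothing.
  virtual void Wait() const {}
};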
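A minimal sketch of a DeviceContext for a hypothetical new device follows. `NewDevicePlace`, `NewDeviceContext`, and `Enqueue` are assumptions for illustration, and the task queue only simulates an asynchronous command stream; a real backend would call its own driver or runtime API inside `Wait()`.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <variant>

struct CPUPlace {};
// Hypothetical place for the new device.
struct NewDevicePlace {
  int device = 0;
};
using Place = std::variant<CPUPlace, NewDevicePlace>;

// The abstract interface described above.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;
  virtual void Wait() const {}
};

// Hypothetical context for the new device. The queue simulates a stream:
// work is recorded by Enqueue and only runs when Wait() is called.
class NewDeviceContext : public DeviceContext {
 public:
  explicit NewDeviceContext(NewDevicePlace place) : place_(place) {}

  Place GetPlace() const override { return place_; }

  // Record work without running it, like launching onto a stream.
  void Enqueue(std::function<void()> task) { pending_.push(std::move(task)); }

  // Block until all recorded work is done (here: run it synchronously).
  void Wait() const override {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  NewDevicePlace place_;
  // mutable because Wait() is const in the base interface.
  mutable std::queue<std::function<void()>> pending_;
};
```

Work enqueued on the context has no visible effect until `Wait()` returns, which mirrors why callers must synchronize before reading results produced on an asynchronous device.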

Implement A New OpKernel for Your Device

Detailed documentation can be found in new_op_and_kernel.

class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of the kernel's Compute function.
   * Compute gets input/output variables, state such as momentum, and
   * device resources such as the CUDA stream and cuBLAS handle from the
   * ExecutionContext. The user should construct it before running the operator.
   */
  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  // The element (data) type this kernel handles, e.g. float or double.
  using ELEMENT_TYPE = T;
};
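Below is a sketch of a kernel written against this interface for a hypothetical device. The `ExecutionContext` here is a heavily simplified stand-in (type-erased input/output buffers plus one attribute), and `NewDeviceScaleKernel` is a hypothetical operator computing out = scale * in. It loops on the host so that the example runs anywhere; a real kernel would launch work through the device's runtime obtained from the context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: the real ExecutionContext exposes variables,
// attributes and the DeviceContext. Here the buffers are type-erased,
// like tensor memory, and the kernel casts them to its element type.
struct ExecutionContext {
  const void* input;
  void* output;
  std::size_t numel;
  double scale;  // plays the role of an operator attribute
};

class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext& context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical kernel for the new device: out[i] = scale * in[i].
template <typename T>
class NewDeviceScaleKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext& ctx) const override {
    const T* in = static_cast<const T*>(ctx.input);
    T* out = static_cast<T*>(ctx.output);
    for (std::size_t i = 0; i < ctx.numel; ++i) {
      out[i] = static_cast<T>(ctx.scale) * in[i];
    }
  }
};
```

One template covers every data type; each instantiation such as `NewDeviceScaleKernel<float>` is a separate kernel that the framework can call through an `OpKernelBase` reference.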

Register the OpKernel with the Framework

After writing the components described above, we need to register the kernel with the framework.

We use REGISTER_OP_KERNEL to do the registration.

REGISTER_OP_KERNEL(
    op_type,
    library_type,
    place_type,
    kernel0, kernel1, ...)

kernel0, kernel1, ... are kernels that share the same op_type, library_type, and place_type but have different data_types.
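One way such a macro can work is sketched below: a static registrar object whose constructor stores one kernel per data type under the key (op_type, library_type, place_type, data_type). This is an assumed, simplified design for illustration, not Fluid's implementation; `REGISTER_OP_KERNEL_SKETCH`, `KernelRegistrar`, `KernelRegistry`, and `ExampleConvKernel` are hypothetical names, and the data type is read from each kernel's `ELEMENT_TYPE`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>

struct OpKernelBase {
  virtual ~OpKernelBase() = default;
};

// Key: (op_type, library_type, place_type, data_type). Place and data types
// are identified by std::type_index to keep the sketch free of Fluid's enums.
using KernelKey =
    std::tuple<std::string, std::string, std::type_index, std::type_index>;

// Function-local static avoids static-initialization-order problems.
inline std::map<KernelKey, std::unique_ptr<OpKernelBase>>& KernelRegistry() {
  static std::map<KernelKey, std::unique_ptr<OpKernelBase>> registry;
  return registry;
}

// Registers every kernel in Kernels under the data type it declares as
// ELEMENT_TYPE, so one registrar covers kernel0, kernel1, ...
template <typename Place, typename... Kernels>
struct KernelRegistrar {
  KernelRegistrar(const char* op_type, const char* library) {
    (Register<Kernels>(op_type, library), ...);  // C++17 fold expression
  }

  template <typename Kernel>
  static void Register(const char* op_type, const char* library) {
    KernelKey key{op_type, library, std::type_index(typeid(Place)),
                  std::type_index(typeid(typename Kernel::ELEMENT_TYPE))};
    KernelRegistry()[key] = std::make_unique<Kernel>();
  }
};

// Hypothetical macro with REGISTER_OP_KERNEL's argument order. The static
// registrar's constructor runs during static initialization, before main().
#define REGISTER_OP_KERNEL_SKETCH(op_type, library_type, place_type, ...) \
  static KernelRegistrar<place_type, __VA_ARGS__>                          \
      registrar_##op_type##_##library_type(#op_type, #library_type)

// Example place and kernels, standing in for CPUPlace and GemmConvKernel.
struct CPUPlace {};

template <typename T>
struct ExampleConvKernel : OpKernelBase {
  using ELEMENT_TYPE = T;
};

REGISTER_OP_KERNEL_SKETCH(conv2d, CPU, CPUPlace, ExampleConvKernel<float>,
                          ExampleConvKernel<double>);
```

The single registration line produces two registry entries, one for float and one for double, which matches the rule that the listed kernels differ only in data type.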

Take conv2d as an example:

REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);

REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
        paddle::operators::CUDNNConvOpKernel<float>,
        paddle::operators::CUDNNConvOpKernel<double>);

In the code above:

  • conv2d is the type/name of the operator (op_type)
  • CUDNN/CPU is the library (library_type)
  • paddle::platform::CUDAPlace/CPUPlace is the place (place_type)
  • the template parameter float/double of CUDNNConvOpKernel<T> (and of GemmConvKernel) is the data_type.