Install and Compile C++ Inference Library on Linux

Direct Download and Installation

Table 1: C++ inference library list

| Version description | Inference library (1.8.5) | Inference library (2.0.2) | Inference library (develop) |
|---|---|---|---|
| ubuntu14.04_cpu_avx_mkl_gcc82 | fluid_inference.tgz | paddle_inference.tgz | paddle_inference.tgz |
| ubuntu14.04_cpu_avx_openblas_gcc82 | fluid_inference.tgz | paddle_inference.tgz | paddle_inference.tgz |
| ubuntu14.04_cpu_noavx_openblas_gcc82 | fluid_inference.tgz | paddle_inference.tgz | paddle_inference.tgz |
| ubuntu14.04_cuda9.0_cudnn7_avx_mkl_gcc482 | fluid_inference.tgz | paddle_inference.tgz | paddle_inference.tgz |
| ubuntu14.04_cuda10.0_cudnn7_avx_mkl_gcc482 | fluid_inference.tgz | paddle_inference.tgz | paddle_inference.tgz |
| ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6_gcc82 | | paddle_inference.tgz | |
| ubuntu14.04_cuda10.2_cudnn8_avx_mkl_trt7_gcc82 | | paddle_inference.tgz | |
| ubuntu14.04_cuda11_cudnn8_avx_mkl_trt7_gcc82 | | paddle_inference.tgz | |
| nv_jetson_cuda10_cudnn7.6_trt6_all(jetpack4.3) | | paddle_inference.tar.gz | |
| nv_jetson_cuda10.2_cudnn8_trt7_all(jetpack4.4/4.5) | | paddle_inference.tar.gz | |
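
After downloading the archive that matches your environment, unpack it with tar. A minimal sketch (the name of the unpacked top-level directory may vary between versions):

# Unpack the downloaded C++ inference library
tar -xzf paddle_inference.tgz

The unpacked library follows the same layout (paddle/, third_party/, version.txt) described at the end of this page.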

Build from Source Code

Users can also build the C++ inference library from the PaddlePaddle source code by specifying the following options at compile time:

| Option | Value | Description |
|---|---|---|
| CMAKE_BUILD_TYPE | Release | CMake build type; set to Release if debug messages are not needed |
| FLUID_INFERENCE_INSTALL_DIR | path | install path of the inference libs |
| WITH_PYTHON | OFF (recommended) | build the Python libs and whl package |
| ON_INFER | ON (recommended) | build with inference settings |
| WITH_GPU | ON/OFF | build inference libs with GPU support |
| WITH_MKL | ON/OFF | build inference libs with MKL support |
| WITH_MKLDNN | ON/OFF | build inference libs with MKLDNN support |
| WITH_XBYAK | ON | build with XBYAK; must be OFF when building on NV Jetson platforms |
| WITH_NV_JETSON | OFF | build inference libs on NV Jetson platforms |
| WITH_TENSORRT | OFF | build inference libs with NVIDIA TensorRT; TENSORRT_ROOT must also be set to the root directory of TensorRT |

It is recommended to configure the options with the recommended values above to avoid linking unnecessary libraries. Other options can be set if necessary.

First, pull the latest code from GitHub.

git clone https://github.com/paddlepaddle/Paddle
cd Paddle
# Use git checkout to switch to stable versions such as release/2.0
git checkout release/2.0

Note: if your environment is a multi-GPU machine, it is recommended to install NCCL; otherwise, you can skip this step by specifying WITH_NCCL=OFF during compilation. Note that if WITH_NCCL=ON and NCCL is not installed, the build will report an error.

git clone https://github.com/NVIDIA/nccl.git
cd nccl
make -j4
make install

Build inference libs on a server

The following commands set the configuration and run the build (PADDLE_ROOT should be set to the actual install path of the inference libs, and WITH_NCCL should be adjusted to match the actual environment).

PADDLE_ROOT=/path/of/capi
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
mkdir build
cd build
cmake -DFLUID_INFERENCE_INSTALL_DIR=$PADDLE_ROOT \
      -DCMAKE_BUILD_TYPE=Release \
      -DWITH_PYTHON=OFF \
      -DWITH_MKL=OFF \
      -DWITH_GPU=OFF  \
      -DON_INFER=ON \
      -DWITH_NCCL=OFF \
      ..
make
make inference_lib_dist
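
If GPU and TensorRT support are needed, the configure step changes roughly as follows. This is a sketch rather than a tested configuration; the TensorRT path shown is an assumption and TENSORRT_ROOT must point at the directory where TensorRT is actually installed (see the WITH_TENSORRT option above).

# /usr/local/TensorRT is an assumed path; set TENSORRT_ROOT to your actual TensorRT directory
PADDLE_ROOT=/path/of/capi
cmake -DFLUID_INFERENCE_INSTALL_DIR=$PADDLE_ROOT \
      -DCMAKE_BUILD_TYPE=Release \
      -DWITH_PYTHON=OFF \
      -DWITH_MKL=ON \
      -DWITH_GPU=ON \
      -DON_INFER=ON \
      -DWITH_NCCL=OFF \
      -DWITH_TENSORRT=ON \
      -DTENSORRT_ROOT=/usr/local/TensorRT \
      ..
make
make inference_lib_dist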

Build inference libs on NVIDIA Jetson platforms

NVIDIA Jetson is an embedded AI computing platform from NVIDIA. Paddle Inference supports building the inference libs on NVIDIA Jetson platforms. The steps are as follows.

  1. Prepare the environment

Turn on hardware performance mode

sudo nvpmodel -m 0 && sudo jetson_clocks

If building on Nano hardware, increase the swap memory:

# Increase the available memory via swap. The default 16 GB is enough for Xavier; the following steps are for Nano hardware.
sudo fallocate -l 5G /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'

  2. Build Paddle inference libs

cd Paddle
mkdir build
cd build
cmake .. \
  -DWITH_CONTRIB=OFF \
  -DWITH_MKL=OFF  \
  -DWITH_MKLDNN=OFF \
  -DWITH_TESTING=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DON_INFER=ON \
  -DWITH_PYTHON=OFF \
  -DWITH_XBYAK=OFF  \
  -DWITH_NV_JETSON=ON
make -j4
# Generate inference libs
make inference_lib_dist -j4

  3. Test with samples


FAQ

  1. Error:

ERROR: ../aarch64-linux-gnu/crtn.o: Too many open files.

Fix this by increasing the number of files the system can open at the same time to 2048.

ulimit -n 2048

  2. The building process hangs.


The build is probably downloading third-party libraries. Wait, or kill the build process and start it again.

  3. Missing virtual destructors for IPluginFactory or IGpuAllocator when using TensorRT.


After downloading and installing TensorRT, add virtual destructors for IPluginFactory and IGpuAllocator in NvInfer.h:

virtual ~IPluginFactory() {};
virtual ~IGpuAllocator() {};

After a successful build, everything required by the C++ inference library is stored in the PADDLE_ROOT directory, including: (1) the compiled PaddlePaddle inference library and header files; (2) third-party link libraries and header files; (3) version information and compile option information.

The directory structure is:

PaddleRoot/
├── CMakeCache.txt
├── paddle
│   ├── include
│   │   ├── paddle_anakin_config.h
│   │   ├── paddle_analysis_config.h
│   │   ├── paddle_api.h
│   │   ├── paddle_inference_api.h
│   │   ├── paddle_mkldnn_quantizer_config.h
│   │   └── paddle_pass_builder.h
│   └── lib
│       ├── libpaddle_fluid.a
│       └── libpaddle_fluid.so
├── third_party
│   ├── boost
│   │   └── boost
│   ├── eigen3
│   │   ├── Eigen
│   │   └── unsupported
│   └── install
│       ├── gflags
│       ├── glog
│       ├── mkldnn
│       ├── mklml
│       ├── protobuf
│       ├── snappy
│       ├── snappystream
│       ├── xxhash
│       └── zlib
└── version.txt
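
To give a rough idea of how this layout is consumed, the following is a minimal linking sketch for a hypothetical source file my_inference.cc; the selected third-party libraries and their lib/ subdirectories are assumptions, and real projects typically drive this from CMake instead.

# my_inference.cc is a placeholder for your own source that includes paddle_inference_api.h.
# The third_party lib paths below are assumptions about the install/ layout.
PADDLE_ROOT=/path/of/capi
g++ my_inference.cc -std=c++11 \
    -I${PADDLE_ROOT}/paddle/include \
    -L${PADDLE_ROOT}/paddle/lib -lpaddle_fluid \
    -L${PADDLE_ROOT}/third_party/install/glog/lib -lglog \
    -L${PADDLE_ROOT}/third_party/install/gflags/lib -lgflags \
    -L${PADDLE_ROOT}/third_party/install/protobuf/lib -lprotobuf \
    -Wl,-rpath=${PADDLE_ROOT}/paddle/lib \
    -o my_inference

At runtime the shared library directory must be visible as well, for example via the rpath shown above or LD_LIBRARY_PATH.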

The version information of the inference library is recorded in version.txt, including the Git commit ID, whether OpenBLAS or MKL was used as the math library, and the CUDA/cuDNN versions. For example:

GIT COMMIT ID: cc9028b90ef50a825a722c55e5fda4b7cd26b0d6
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
CUDA version: 8.0
CUDNN version: v7