Appendix

NVIDIA GPU architectures and the corresponding PaddlePaddle installation packages

GPU architecture | Compute capability | GPU models | CUDA version of the PaddlePaddle package to download
Pascal | sm_60 | Quadro GP100, Tesla P100, DGX-1 | CUDA10, CUDA11
Pascal | sm_61 | GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4 | CUDA10, CUDA11
Volta | sm_70 | DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100 | CUDA10, CUDA11
Turing | sm_75 | GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4 | CUDA10, CUDA11
Ampere | sm_80 | NVIDIA A100, GA100, NVIDIA DGX-A100 | CUDA11
Ampere | sm_86 | Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX A16, RTX A40, A2 Tensor Core GPU | CUDA11, CUDA11.2 (recommended)
Hopper | sm_90 | NVIDIA H100, H800 | CUDA12
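
To check which compute capability (and thus which row) applies to your GPU, recent NVIDIA drivers can report it directly; a minimal sketch (the compute_cap query field requires a reasonably new nvidia-smi):

nvidia-smi --query-gpu=name,compute_cap --format=csv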



Compile Dependency Table

Dependency | Version | Description | Installation command
CMake | 3.18, 3.19 (recommended) | |
GCC | 8.2 / 12.2 | using devtools2 is recommended on CentOS |
Clang (macOS only) | 9.0 and above | usually the clang shipped with macOS 10.11 and above |
Python (64 bit) | 3.8+ | depends on libpython3.8+.so | please go to the Python official website
SWIG | at least 2.0 | | apt install swig or yum install swig
wget | any | | apt install wget or yum install wget
openblas | any | optional |
pip | at least 20.2.2 | | apt install python-pip or yum install python-pip
numpy | >=1.13.0 | | pip install numpy
protobuf | >=3.20.2 | | pip install protobuf
wheel | any | | pip install wheel
patchELF | any | | apt install patchelf, or see the patchELF official documentation on GitHub
go | >=1.8 | optional |
setuptools | >=50.3.2 | |
unrar | | | brew install unrar (macOS), apt-get install unrar (Ubuntu)
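
On Ubuntu, the apt-based dependencies above can be installed in one pass; a minimal sketch (the exact versions the table pins, e.g. CMake 3.18/3.19 or GCC 8.2/12.2, may not match your distribution's defaults and may need manual installation):

# System packages from the table above
apt update && apt install -y gcc g++ cmake swig wget patchelf unrar python3-dev python3-pip
# Python packages from the table above
python3 -m pip install numpy protobuf wheel setuptools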



Compile Option Table

BLAS

PaddlePaddle supports two BLAS libraries, MKL and OpenBLAS. MKL is used by default. If you use MKL and the machine supports the AVX2 instruction set, the MKL-DNN math library will also be downloaded; for details please refer to here.

If you disable MKL, OpenBLAS will be used as the BLAS library.
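
MKL is controlled by the WITH_MKL compile option listed in the Compile Option Table below; a minimal sketch:

# Disable MKL so that OpenBLAS is used as the BLAS library
cmake .. -DWITH_MKL=OFF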

CUDA/cuDNN

PaddlePaddle automatically finds the CUDA and cuDNN libraries installed in the system at compile time and runtime. Pass the parameter -DCUDA_ARCH_NAME=Auto to enable automatic detection of the SM architecture and speed up compilation.

PaddlePaddle can be compiled and run with any version of cuDNN after v5.1, but be sure to use the same cuDNN version at compile time and runtime. We recommend using the latest version of cuDNN.
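
For example, combining the options mentioned above; a minimal sketch:

# Build with GPU support, detecting only the local SM architecture
cmake .. -DWITH_GPU=ON -DCUDA_ARCH_NAME=Auto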

Configure Compile Options

PaddlePaddle locates the various BLAS/CUDA/cuDNN libraries through paths specified at compile time. When cmake runs, it first searches the system paths ( /usr/lib and /usr/local/lib ) for these libraries, and also reads the relevant path variables. These paths can be set with -D options, for example:

cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5

Note: these compile options only take effect on the first cmake run. If you want to change them later, it is recommended to clean up the entire build directory ( rm -rf ) and then run cmake again with the new options.
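
A minimal sketch of such a clean reconfiguration, assuming an out-of-source build directory named build:

# Discard the cached compile options, then configure from scratch
rm -rf build
mkdir build && cd build
cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5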



Option | Description | Default
WITH_GPU | Whether to support GPU | ON
WITH_AVX | Whether to compile PaddlePaddle binaries with the AVX instruction set | ON
WITH_PYTHON | Whether to embed the Python interpreter | ON
WITH_TESTING | Whether to turn on unit tests | OFF
WITH_MKL | Whether to use the MKL math library; if not, OpenBLAS is used | ON
WITH_SYSTEM_BLAS | Whether to use the system's BLAS library | OFF
WITH_DISTRIBUTE | Whether to compile the distributed version | OFF
WITH_BRPC_RDMA | Whether to use BRPC RDMA as the RPC protocol | OFF
ON_INFER | Whether to turn on prediction optimization | OFF
CUDA_ARCH_NAME | Whether to compile only for the current CUDA architecture. Auto automatically detects the architecture of the current environment | All (compile for all supported CUDA architectures)
TENSORRT_ROOT | Specify the TensorRT path | '/' under Windows, '/usr/' under Linux



Installation Package List

Version number | Release description
paddlepaddle==[version code], such as paddlepaddle==2.6.1 | Only supports the corresponding version of CPU PaddlePaddle; please refer to PyPI for the specific versions available.
paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==2.6.1 | By default, installs the PaddlePaddle package of the corresponding version built against CUDA 11.2 and cuDNN 8

You can find the various distributions of paddlepaddle-gpu in the Release History.

'postxx' corresponds to the CUDA and cuDNN versions, and the number before 'postxx' represents the version of Paddle.

Please note that: in these commands, paddlepaddle-gpu==2.6.1 will by default install the PaddlePaddle package that supports CUDA 11.2 and cuDNN 8 under a Windows environment.
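
For example, to select a specific CUDA build via its post tag; a sketch (the -f index URL follows the pattern used in the official installation guides and may change):

python -m pip install paddlepaddle-gpu==2.6.1.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html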



Multi-version whl package list - Release

Release Instruction | cp38-cp38 | cp39-cp39 | cp310-cp310 | cp311-cp311 | cp312-cp312
cpu-mkl-avx | paddlepaddle-2.6.1-cp38-cp38-linux_x86_64.whl | paddlepaddle-2.6.1-cp39-cp39-linux_x86_64.whl | paddlepaddle-2.6.1-cp310-cp310-linux_x86_64.whl | paddlepaddle-2.6.1-cp311-cp311-linux_x86_64.whl | paddlepaddle-2.6.1-cp312-cp312-linux_x86_64.whl
cpu-openblas-avx | paddlepaddle-2.6.1-cp38-cp38-linux_x86_64.whl | - | - | - | -
cuda11.2-cudnn8.1-mkl-gcc8.2-avx | paddlepaddle_gpu-2.6.1.post112-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp312-cp312-linux_x86_64.whl
cuda11.6-cudnn8.4-mkl-gcc8.2-avx | paddlepaddle_gpu-2.6.1.post116-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post116-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post116-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post116-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post116-cp312-cp312-linux_x86_64.whl
cuda11.7-cudnn8.4-mkl-gcc8.2-avx | paddlepaddle_gpu-2.6.1.post117-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post117-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post117-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post117-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post117-cp312-cp312-linux_x86_64.whl
cuda11.8-cudnn8.6-mkl-gcc8.2-avx | paddlepaddle_gpu-2.6.1-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp312-cp312-linux_x86_64.whl
cuda12.0-cudnn8.9-mkl-gcc12.2-avx | paddlepaddle_gpu-2.6.1.post120-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post120-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post120-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post120-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post120-cp312-cp312-linux_x86_64.whl
macos-cpu-openblas | paddlepaddle-2.6.1-cp38-cp38-macosx_10_14_x86_64.whl | paddlepaddle-2.6.1-cp39-cp39-macosx_10_14_x86_64.whl | paddlepaddle-2.6.1-cp310-cp310-macosx_10_14_universal2.whl | paddlepaddle-2.6.1-cp311-cp311-macosx_10_14_universal2.whl | paddlepaddle-2.6.1-cp312-cp312-macosx_10_14_universal2.whl
macos-cpu-openblas-m1 | paddlepaddle-2.6.1-cp38-cp38-macosx_11_0_arm64.whl | paddlepaddle-2.6.1-cp39-cp39-macosx_11_0_arm64.whl | paddlepaddle-2.6.1-cp310-cp310-macosx_11_0_arm64.whl | paddlepaddle-2.6.1-cp311-cp311-macosx_11_0_arm64.whl | paddlepaddle-2.6.1-cp312-cp312-macosx_11_0_arm64.whl
win-cpu-mkl-avx | paddlepaddle-2.6.1-cp38-cp38-win_amd64.whl | paddlepaddle-2.6.1-cp39-cp39-win_amd64.whl | paddlepaddle-2.6.1-cp310-cp310-win_amd64.whl | paddlepaddle-2.6.1-cp311-cp311-win_amd64.whl | paddlepaddle-2.6.1-cp312-cp312-win_amd64.whl
win-cpu-openblas-avx | paddlepaddle-2.6.1-cp38-cp38-win_amd64.whl | - | - | - | -
win-cuda11.2-cudnn8.2-mkl-vs2019-avx | paddlepaddle_gpu-2.6.1.post112-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-2.6.1.post112-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-2.6.1.post112-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-2.6.1.post112-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-2.6.1.post112-cp312-cp312-win_amd64.whl
win-cuda11.6-cudnn8.4-mkl-vs2019-avx | paddlepaddle_gpu-2.6.1.post116-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-2.6.1.post116-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-2.6.1.post116-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-2.6.1.post116-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-2.6.1.post116-cp312-cp312-win_amd64.whl
win-cuda11.7-cudnn8.4-mkl-vs2019-avx | paddlepaddle_gpu-2.6.1.post117-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-2.6.1.post117-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-2.6.1.post117-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-2.6.1.post117-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-2.6.1.post117-cp312-cp312-win_amd64.whl
win-cuda11.8-cudnn8.6-mkl-vs2019-avx | paddlepaddle_gpu-2.6.1-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-2.6.1-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-2.6.1-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-2.6.1-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-2.6.1-cp312-cp312-win_amd64.whl
win-cuda12.0-cudnn8.9-mkl-vs2019-avx | paddlepaddle_gpu-2.6.1.post120-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-2.6.1.post120-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-2.6.1.post120-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-2.6.1.post120-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-2.6.1.post120-cp312-cp312-win_amd64.whl
linux-cinn-cuda11.2-cudnn8-mkl-gcc8.2-avx | paddlepaddle_gpu-2.6.1.post112-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1.post112-cp312-cp312-linux_x86_64.whl
linux-cuda11.2-cudnn8-mkl-gcc8.2-avx-pascal | paddlepaddle_gpu-2.6.1-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-2.6.1-cp312-cp312-linux_x86_64.whl

Table notes

  • Vertical axis

cpu-mkl: supports CPU training and prediction, using the Intel MKL math library

cuda10_cudnn7-mkl: supports GPU training and prediction, using the Intel MKL math library

  • Horizontal axis

Entries are generally of the form "cp310-cp310", in which:

310: the Python tag, referring to Python 3.10. Similarly, there are "38", "39", "311", "312", etc.

mu: refers to a wide-Unicode build of Python, while m refers to a non-wide-Unicode build

  • Installation package naming rules

Each installation package has a unique name, following the official Python wheel naming rules, in the form:

{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl

The build tag may be omitted; all other parts are required.

distribution: the package name, e.g. paddlepaddle

version: the version, for example 0.14.0 (must be in numeric format)

python tag: e.g. 'py38', 'py39', 'py310', 'py311', 'py312', indicating the corresponding Python version

abi tag: e.g. 'cp33m', 'abi3', 'none'

platform tag: e.g. 'linux_x86_64', 'any'
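
To check which tags your local interpreter accepts before choosing a wheel, pip can list them; a sketch (pip marks this subcommand as experimental):

# Prints the compatible tags, e.g. cp310-cp310-linux_x86_64
python -m pip debug --verbose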



Multi-version whl package list - dev

Release Instruction | cp38-cp38 | cp39-cp39 | cp310-cp310 | cp311-cp311 | cp312-cp312
linux-cpu-mkl-avx | paddlepaddle-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle-latest-cp312-cp312-linux_x86_64.whl
linux-cpu-openblas-avx | paddlepaddle-latest-cp38-cp38-linux_x86_64.whl | - | - | - | -
cuda11.2-cudnn8.1-mkl | paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-latest-cp312-cp312-linux_x86_64.whl
cuda11.6-cudnn8.4-mkl | paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-latest-cp312-cp312-linux_x86_64.whl
cuda11.7-cudnn8.4-mkl | paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-latest-cp312-cp312-linux_x86_64.whl
cuda11.8-cudnn8.6-mkl | paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-latest-cp312-cp312-linux_x86_64.whl
cuda12.0-cudnn8.9-mkl | paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl | paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-latest-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-latest-cp312-cp312-linux_x86_64.whl
mac-cpu | paddlepaddle-cp38-cp38-macosx_10_9_x86_64.whl | paddlepaddle-cp39-cp39-macosx_10_9_x86_64.whl | paddlepaddle-cp310-cp310-macosx_10_9_x86_64.whl | paddlepaddle-cp311-cp311-macosx_10_9_x86_64.whl | paddlepaddle-cp312-cp312-macosx_10_9_x86_64.whl
macos-cpu-openblas-m1 | paddlepaddle-cp38-cp38-macosx_11_0_arm64.whl | paddlepaddle-cp39-cp39-macosx_11_0_arm64.whl | paddlepaddle-cp310-cp310-macosx_11_0_arm64.whl | paddlepaddle-cp311-cp311-macosx_11_0_arm64.whl | paddlepaddle-cp312-cp312-macosx_11_0_arm64.whl
win-cpu-mkl-avx | paddlepaddle-latest-cp38-cp38-win_amd64.whl | paddlepaddle-latest-cp39-cp39-win_amd64.whl | paddlepaddle-latest-cp310-cp310-win_amd64.whl | paddlepaddle-latest-cp311-cp311-win_amd64.whl | paddlepaddle-latest-cp312-cp312-win_amd64.whl
win-cpu-openblas-avx | paddlepaddle-latest-cp38-cp38-win_amd64.whl | - | - | - | -
win-cuda11.2-cudnn8.2-mkl-vs2019-avx | paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-latest-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-latest-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-latest-cp312-cp312-win_amd64.whl
win-cuda11.6-cudnn8.4.0-mkl-avx-vs2019 | paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-latest-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-latest-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-latest-cp312-cp312-win_amd64.whl
win-cuda11.7-cudnn8.4.1-mkl-avx-vs2019 | paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-latest-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-latest-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-latest-cp312-cp312-win_amd64.whl
win-cuda11.8-cudnn8.6.0-mkl-avx-vs2019 | paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-latest-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-latest-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-latest-cp312-cp312-win_amd64.whl
win-cuda12.0-cudnn8.9.1-mkl-avx-vs2019 | paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl | paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-latest-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-latest-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-latest-cp312-cp312-win_amd64.whl



Execute the PaddlePaddle training program in Docker

Suppose you have written a PaddlePaddle program named train.py in the current directory (such as /home/work); refer to the PaddlePaddle Book for how to write one. You can then start training with the following commands:

cd /home/work
docker run -it -v $PWD:/work registry.baidubce.com/paddlepaddle/paddle /work/train.py

In the above commands, the -it parameter runs the container interactively; -v $PWD:/work mounts the current directory (on Linux, $PWD expands to the absolute path of the current directory) to the /work directory inside the container; registry.baidubce.com/paddlepaddle/paddle specifies the image to use; finally, /work/train.py is the command executed inside the container, i.e. the training program.

Of course, you can also enter the Docker container and execute or debug your code interactively:

docker run -it -v $PWD:/work registry.baidubce.com/paddlepaddle/paddle /bin/bash
cd /work
python train.py

Note: to reduce the image size, vim is not installed in the PaddlePaddle Docker image by default. To edit code inside the container, first run apt-get install -y vim in the container.



Start PaddlePaddle Book tutorial with Docker

Use Docker to quickly launch a local Jupyter Notebook containing the official PaddlePaddle Book tutorial, which can be viewed in a web browser. PaddlePaddle Book is an interactive Jupyter Notebook for users and developers. If you want to learn more about deep learning, PaddlePaddle Book is definitely your best choice. You can read tutorials, or create and share interactive documents with code, formulas, charts, and text.

We provide a Docker image that runs the PaddlePaddle Book directly:

docker run -p 8888:8888 registry.baidubce.com/paddlepaddle/book

The registry.baidubce.com image source is hosted in mainland China, which speeds up access for domestic users.

Then enter the following URL in your browser:

http://localhost:8888/



Perform GPU training using Docker

In order to ensure that the GPU driver works properly in the image, we recommend using nvidia-docker to run the image. Don’t forget to install the latest GPU drivers on your physical machine in advance.

nvidia-docker run -it -v $PWD:/work registry.baidubce.com/paddlepaddle/paddle:latest-gpu /bin/bash

Note: If you don’t have nvidia-docker installed, you can try the following to mount the CUDA library and Linux devices into the Docker container:

# Build -v options that map the host's CUDA driver libraries into the container
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') \
$(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
# Build --device options for the NVIDIA device nodes
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
# Run the GPU image with the driver libraries and devices mapped in
docker run ${CUDA_SO} \
${DEVICES} -it registry.baidubce.com/paddlepaddle/paddle:latest-gpu
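
Note: on newer Docker setups with the NVIDIA Container Toolkit installed, the --gpus flag is the usual replacement for the standalone nvidia-docker wrapper; a sketch:

docker run --gpus all -it -v $PWD:/work registry.baidubce.com/paddlepaddle/paddle:latest-gpu /bin/bash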