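NVIDIA TensorRT Deployment Example
Paddle Lite supports inference deployment on NVIDIA TensorRT. It works by analyzing the Paddle model online. Paddle operators are first converted into the unified NNAdapter standard operators. The network is then built through the TensorRT network-construction API, and the resulting model is generated and executed online.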
Support Status
Supported GPUs
All Jetson models
GPUs based on the Pascal, Volta and Turing architectures (support for Ampere GPUs is coming soon)
Supported NVIDIA Software Stack
Jetson: JetPack 4.3 or later
Tesla: CUDA 10.2 / CUDA 11.0 / CUDA 11.1
cuDNN: 8.0.x
TensorRT: 7.1.3.x
Supported Models
Performance
Test environment
Device environment
NVIDIA Jetson AGX Xavier [16GB]
Jetpack 4.4.1 [L4T 32.4.4]
NV Power Mode: MAXN - Type: 0
Board info:
Type: AGX Xavier [16GB]
CUDA GPU architecture (ARCH_BIN): 7.2
Libraries:
CUDA: 10.2.89
cuDNN: 8.0.0.180
TensorRT: 7.1.3.0
Visionworks: 1.6.0.501
OpenCV: 4.1.1 compiled CUDA: NO
VPI: 0.4.4
Vulkan: 1.2.70
Build environment
OS: Ubuntu 18.04.4 LTS aarch64
gcc: 7.5.0
cmake: 3.23.0-rc4
Test results
Model | Input shape | Batch | Dataset | GPU FP16 latency (ms) | GPU INT8 latency (ms) | DLA FP16 latency (ms) | DLA INT8 latency (ms) |
---|---|---|---|---|---|---|---|
resnet50_fp32_224 | 1,3,224,224 | 1 | ImageNet 2012 | 3.15104 | 2.26772 | 6.65585 | 3.95726 |
yolov3_darknet53_270e_coco_fp32_608 | 1,3,608,608 | 1 | COCO | 37.327 | 24.584800 | 54.19 | 31.959 |
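The table is easier to read as relative speedups. The sketch below only does arithmetic on the latencies in the table; `LATENCY_MS` and `speedups` are illustrative names, not part of the demo. For resnet50, for example, GPU INT8 comes out about 1.39x faster than GPU FP16, and the DLA FP16 latency is about 2.11x the GPU FP16 latency.

```python
# Latencies (ms) copied from the table above, in the order:
# (GPU FP16, GPU INT8, DLA FP16, DLA INT8)
LATENCY_MS = {
    "resnet50_fp32_224": (3.15104, 2.26772, 6.65585, 3.95726),
    "yolov3_darknet53_270e_coco_fp32_608": (37.327, 24.5848, 54.19, 31.959),
}


def speedups(gpu_fp16, gpu_int8, dla_fp16, dla_int8):
    """Return INT8-over-FP16 speedups on GPU and DLA, and DLA/GPU latency ratio at FP16."""
    return {
        "gpu_int8_vs_fp16": gpu_fp16 / gpu_int8,
        "dla_int8_vs_fp16": dla_fp16 / dla_int8,
        "dla_fp16_vs_gpu_fp16": dla_fp16 / gpu_fp16,
    }


if __name__ == "__main__":
    for model, latencies in LATENCY_MS.items():
        ratios = speedups(*latencies)
        print(model + ": " + ", ".join(f"{k}={v:.2f}x" for k, v in ratios.items()))
```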
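Supported (or Partially Supported) Paddle Operators
See the NNAdapter operator support list for the latest information on which operators are supported on each new hardware backend.
Preparing the Device Environment
To install TensorRT, refer to the NVIDIA TensorRT Documentation.
Running the Example Programs
Download the example package PaddleLite-generic-demo.tar.gz and extract it. Its contents are as follows: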
- PaddleLite-generic-demo
  - image_classification_demo
    - assets
      - images
        - tabby_cat.jpg # test image
        - tabby_cat.raw # RGB raw image produced by convert_to_raw_image.py
      - labels
        - synset_words.txt # label file for the 1000 classes
      - models
        - resnet50_fp32_224 # resnet50 float32 model in Paddle non-combined format
          - __model__ # Paddle fluid model graph file; open it in https://lutzroeder.github.io/netron/ to visualize the network structure
          - bn2a_branch1_mean # Paddle fluid model parameter file
          - bn2a_branch1_scale
          ...
    - shell
      - CMakeLists.txt # CMake script for the example program
      - build.linux.amd64 # prebuilt for amd64
        - image_classification_demo # prebuilt example program for amd64
      - build.linux.arm64 # prebuilt for arm64
        - image_classification_demo # prebuilt example program for arm64
      ... ...
      - image_classification_demo.cc # example program source code
      - build.sh # example program build script
      - run.sh # script to run the example locally
      - run_with_ssh.sh # script to run the example over ssh
      - run_with_adb.sh # script to run the example over adb
  - libs
    - PaddleLite
      - android
        - arm64-v8a
        - armeabi-v7a
      - linux
        - amd64
          - include # Paddle Lite header files
          - lib # Paddle Lite library files
            - nvidia_tensorrt # NNAdapter runtime library and device HAL library
              - libnnadapter.so # NNAdapter runtime library
              - libnvidia_tensorrt.so # NNAdapter device HAL library
            - libiomp5.so # Intel OpenMP library
            - libmklml_intel.so # Intel MKL library
            - libmklml_gnu.so # GNU MKL library
            - libpaddle_full_api_shared.so # prebuilt Paddle Lite full API library
            - libpaddle_light_api_shared.so # prebuilt Paddle Lite light API library
        - arm64
          - include # Paddle Lite header files
          - lib # Paddle Lite library files
            - nvidia_tensorrt # NNAdapter runtime library and device HAL library
              - libnnadapter.so # NNAdapter runtime library
              - libnvidia_tensorrt.so # NNAdapter device HAL library
            - libpaddle_full_api_shared.so # prebuilt Paddle Lite full API library
            - libpaddle_light_api_shared.so # prebuilt Paddle Lite light API library
        - armhf
          ...
    - OpenCV # prebuilt OpenCV library
  - ssd_detection_demo # SSD-based object detection example
  - yolo_detection_demo # YOLO-based object detection example
Paddle Lite + TensorRT runtime context options
Device selection
NVIDIA_TENSORRT_DEVICE_TYPE # device type; defaults to GPU
NVIDIA_TENSORRT_DEVICE_TYPE=GPU # run inference on a GPU device
NVIDIA_TENSORRT_DEVICE_TYPE=DLA # run inference on a DLA device
Device ID selection
NVIDIA_TENSORRT_DEVICE_ID # device ID; defaults to 0
NVIDIA_TENSORRT_DEVICE_ID=0 # use device 0 of the selected DEVICE_TYPE
Precision selection
NVIDIA_TENSORRT_PRECISION # inference precision; defaults to float32
NVIDIA_TENSORRT_PRECISION=int8 # run inference in int8 precision
NVIDIA_TENSORRT_PRECISION=float16 # run inference in float16 precision
NVIDIA_TENSORRT_PRECISION=float32 # run inference in float32 precision
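The run commands below pass these options to run.sh as one semicolon-separated string of KEY=VALUE pairs. The following sketch assembles that string and rejects values not listed above. `build_context_properties` is a hypothetical helper, not a Paddle Lite API. It only reproduces the string format and key order used in the example commands of this document.

```python
# Hypothetical helper (not part of Paddle Lite): builds the context-properties
# string that the example commands pass to run.sh as their last argument.
ALLOWED_VALUES = {
    "NVIDIA_TENSORRT_DEVICE_TYPE": {"GPU", "DLA"},
    "NVIDIA_TENSORRT_PRECISION": {"float32", "float16", "int8"},
}


def build_context_properties(device_type="GPU", device_id=0, precision="float32",
                             calibration_dataset_path=None,
                             calibration_table_path=None):
    """Return e.g. 'NVIDIA_TENSORRT_DEVICE_TYPE=GPU;...;' in the key order used by the examples."""
    if not isinstance(device_id, int) or device_id < 0:
        raise ValueError(f"device_id must be a non-negative integer, got {device_id!r}")
    options = {
        "NVIDIA_TENSORRT_DEVICE_TYPE": device_type,
        "NVIDIA_TENSORRT_DEVICE_ID": str(device_id),
        "NVIDIA_TENSORRT_PRECISION": precision,
    }
    # Reject values other than the ones documented above.
    for key, allowed in ALLOWED_VALUES.items():
        if options[key] not in allowed:
            raise ValueError(f"{key}={options[key]!r} is not one of {sorted(allowed)}")
    # The calibration paths are only relevant for int8, so only append them then.
    if precision == "int8":
        if calibration_dataset_path is not None:
            options["NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH"] = calibration_dataset_path
        if calibration_table_path is not None:
            options["NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH"] = calibration_table_path
    # Every KEY=VALUE pair ends with ';', as in the run.sh examples.
    return "".join(f"{key}={value};" for key, value in options.items())


if __name__ == "__main__":
    print(build_context_properties(precision="float16"))
```

Quote the printed string when passing it to run.sh, as the example commands do, so the shell does not treat `;` as a command separator.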
INT8 calibration
NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH # path where the calibration table produced by calibration is stored
NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH # path to the dataset used for calibration
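Files in the dataset directory are organized as follows:
Image1.raw # preprocessed raw data of image 1
Image2.raw # preprocessed raw data of image 2
Image3.raw # preprocessed raw data of image 3
... ...
ImageN.raw # preprocessed raw data of image N
lists.txt # list of the raw data files used for calibration
The format of lists.txt is as follows:
Image1.raw # name of the raw data file for calibration batch 1
Image2.raw # name of the raw data file for calibration batch 2
Image3.raw # name of the raw data file for calibration batch 3
... ...
ImageN.raw # name of the raw data file for calibration batch N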
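If you prepare your own calibration dataset, lists.txt can be generated from the .raw files in the directory. The sketch below assumes, as the layout above suggests, that lists.txt holds one file name per line with no comments. It sorts names numerically so that Image2.raw comes before Image10.raw. `write_calibration_list` is a hypothetical helper, not part of the demo.

```python
import re
from pathlib import Path


def _natural_key(name):
    """Split 'Image10.raw' into ['Image', 10, '.raw'] so Image2 sorts before Image10."""
    # re.split with a capturing group alternates text / digit runs, so odd indices are digits.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(re.split(r"(\d+)", name))]


def write_calibration_list(dataset_dir):
    """Write lists.txt (one .raw file name per line) into dataset_dir and return the names."""
    dataset = Path(dataset_dir)
    raw_files = sorted((p.name for p in dataset.glob("*.raw") if p.is_file()),
                       key=_natural_key)
    if not raw_files:
        raise FileNotFoundError(f"no .raw files found in {dataset}")
    (dataset / "lists.txt").write_text("\n".join(raw_files) + "\n")
    return raw_files
```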
Notes on the library versions bundled with the demo:
libs/PaddleLite/linux/arm64/lib/nvidia_tensorrt/libnvidia_tensorrt.so was built on a Jetson AGX Xavier with JetPack 4.4 + CUDA 10.2 + cuDNN 8.0 + TensorRT 7.1.3.0.
libs/PaddleLite/linux/amd64/lib/nvidia_tensorrt/libnvidia_tensorrt.so was built on a Quadro RTX 4000 with CUDA 10.2 + cuDNN 8.0 + TensorRT 7.1.3.4.
If your environment differs, rebuild and replace the libraries as described in the section "Updating the Paddle Lite Library with NVIDIA TensorRT Support".
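Running the Image Classification Example
Go to PaddleLite-generic-demo/image_classification_demo/shell/ and run the following commands to compare the performance and results of the ResNet50 model.
float32 inference (default)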
# For Jetson AGX Xavier arm64
# (Arm CPU only)
$ ./run.sh resnet50_fp32_224 linux arm64 cpu
warmup: 1 repeat: 5, average: 289.136005 ms, max: 295.342010 ms, min: 285.328003 ms
results: 3
Top0 tabby, tabby cat - 0.739791
Top1 tiger cat - 0.130986
Top2 Egyptian cat - 0.101033
Preprocess time: 0.706000 ms
Prediction time: 289.136005 ms
Postprocess time: 0.315000 ms

# (Arm CPU + TensorRT)
# CUDA 10.2 | cuDNN 8.0 | TensorRT 7.1.3.0
# Note: if your package versions differ from those used to build the demo, rebuild the Paddle Lite library as described in "Updating the Paddle Lite Library with NVIDIA TensorRT Support"
# By default, nvidia_tensorrt runs inference in float32 precision on GPU device 0
$ ./run.sh resnet50_fp32_224 linux arm64 nvidia_tensorrt
warmup: 1 repeat: 5, average: 9.197000 ms, max: 9.224000 ms, min: 9.147000 ms
results: 3
Top0 tabby, tabby cat - 0.739792
Top1 tiger cat - 0.130985
Top2 Egyptian cat - 0.101032
Preprocess time: 0.698000 ms
Prediction time: 9.197000 ms
Postprocess time: 0.313000 ms

# For RTX4000 amd64
# (Intel CPU only)
$ ./run.sh resnet50_fp32_224 linux amd64 cpu
warmup: 1 repeat: 5, average: 192.425604 ms, max: 215.518005 ms, min: 176.852005 ms
results: 3
Top0 tabby, tabby cat - 0.739791
Top1 tiger cat - 0.130985
Top2 Egyptian cat - 0.101033
Preprocess time: 0.947000 ms
Prediction time: 192.425604 ms
Postprocess time: 0.245000 ms

# (Intel CPU + TensorRT)
# CUDA 10.2 | cuDNN 8.0 | TensorRT 7.1.3.4
# Note: if your package versions differ from those used to build the demo, rebuild the Paddle Lite library as described in "Updating the Paddle Lite Library with NVIDIA TensorRT Support"
# By default, nvidia_tensorrt runs inference in float32 precision on GPU device 0
$ ./run.sh resnet50_fp32_224 linux amd64 nvidia_tensorrt
warmup: 1 repeat: 5, average: 4.760800 ms, max: 4.800000 ms, min: 4.717000 ms
results: 3
Top0 tabby, tabby cat - 0.739792
Top1 tiger cat - 0.130985
Top2 Egyptian cat - 0.101033
Preprocess time: 1.022000 ms
Prediction time: 4.760800 ms
Postprocess time: 0.261000 ms
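When comparing many runs, it helps to extract the latency summary line from the demo output instead of reading it by eye. The sketch below assumes the summary line always has the shape shown above (`warmup: ... repeat: ..., average: ... ms, max: ... ms, min: ... ms`). `parse_latency` is a hypothetical helper. Applied to the Jetson float32 runs above, it gives a TensorRT speedup of roughly 31x over the Arm CPU.

```python
import re

# Matches the demo's summary line, e.g.
# "warmup: 1 repeat: 5, average: 9.197000 ms, max: 9.224000 ms, min: 9.147000 ms"
_LATENCY_RE = re.compile(
    r"warmup:\s*(\d+)\s+repeat:\s*(\d+),\s*"
    r"average:\s*([\d.]+)\s*ms,\s*max:\s*([\d.]+)\s*ms,\s*min:\s*([\d.]+)\s*ms"
)


def parse_latency(log):
    """Extract warmup/repeat counts and average/max/min latency (ms) from demo output."""
    match = _LATENCY_RE.search(log)
    if match is None:
        raise ValueError("no latency summary line found")
    warmup, repeat, average, maximum, minimum = match.groups()
    return {
        "warmup": int(warmup),
        "repeat": int(repeat),
        "average_ms": float(average),
        "max_ms": float(maximum),
        "min_ms": float(minimum),
    }


if __name__ == "__main__":
    cpu = parse_latency("warmup: 1 repeat: 5, average: 289.136005 ms, "
                        "max: 295.342010 ms, min: 285.328003 ms")
    trt = parse_latency("warmup: 1 repeat: 5, average: 9.197000 ms, "
                        "max: 9.224000 ms, min: 9.147000 ms")
    print(f"TensorRT speedup over Arm CPU: {cpu['average_ms'] / trt['average_ms']:.1f}x")
```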
float16 inference:
# Run inference with nvidia_tensorrt in float16 precision on GPU device 0
# Results:
# For Jetson AGX Xavier arm64
$ ./run.sh resnet50_fp32_224 linux arm64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=float16;"
warmup: 1 repeat: 5, average: 3.530800 ms, max: 3.578000 ms, min: 3.390000 ms
results: 3
Top0 tabby, tabby cat - 0.740723
Top1 tiger cat - 0.129761
Top2 Egyptian cat - 0.101074
Preprocess time: 0.704000 ms
Prediction time: 3.530800 ms
Postprocess time: 0.302000 ms

# For RTX4000 amd64
$ ./run.sh resnet50_fp32_224 linux amd64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=float16;"
warmup: 1 repeat: 5, average: 1.952600 ms, max: 2.087000 ms, min: 1.858000 ms
results: 3
Top0 tabby, tabby cat - 0.741211
Top1 tiger cat - 0.129883
Top2 Egyptian cat - 0.100342
Preprocess time: 0.979000 ms
Prediction time: 1.952600 ms
Postprocess time: 0.251000 ms
int8 inference:
# Run inference with nvidia_tensorrt in int8 precision on GPU device 0
# Download the dataset required for calibration: preprocessed raw data of the first 1000 images of the ImageNet validation set
$ curl https://paddlelite-demo.bj.bcebos.com/devices/nvidia_tensorrt/datasets/imagenet_raw_1000.tar.gz -o - | tar -xz -C ../assets/
# Results:
# For Jetson AGX Xavier arm64
$ ./run.sh resnet50_fp32_224 linux arm64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=int8;NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH=../assets/imagenet_raw_1000;NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH=../assets/models/resnet50_fp32_224/calibration_table;"
warmup: 1 repeat: 5, average: 2.674600 ms, max: 2.846000 ms, min: 2.587000 ms
results: 3
Top0 tabby, tabby cat - 0.742700
Top1 tiger cat - 0.150497
Top2 Egyptian cat - 0.078266
Preprocess time: 0.655000 ms
Prediction time: 2.674600 ms
Postprocess time: 0.304000 ms

# For RTX4000 amd64
$ ./run.sh resnet50_fp32_224 linux amd64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=int8;NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH=../assets/imagenet_raw_1000;NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH=../assets/models/resnet50_fp32_224/calibration_table;"
warmup: 1 repeat: 5, average: 1.646200 ms, max: 1.724000 ms, min: 1.582000 ms
results: 3
Top0 tabby, tabby cat - 0.735309
Top1 tiger cat - 0.153960
Top2 Egyptian cat - 0.080652
Preprocess time: 0.943000 ms
Prediction time: 1.646200 ms
Postprocess time: 0.246000 ms
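Changing the test image
To use a different test image, copy it into PaddleLite-generic-demo/image_classification_demo/assets/images, modify and run convert_to_raw_image.py to generate the corresponding RGB raw image, and finally set IMAGE_NAME in run.sh to the new image.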
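The general idea behind producing an RGB raw file can be sketched as converting interleaved RGB pixels (HWC) into normalized planar float32 data (CHW) and dumping the bytes with no header. This is only an illustration. The ImageNet mean/std constants, the float32 dtype and the CHW layout are assumptions, and the real convert_to_raw_image.py may differ, so check that script before relying on this. The input is assumed to be already decoded and resized to the model input size (224x224 for resnet50_fp32_224). The function names are hypothetical.

```python
from array import array

# ASSUMPTION: ImageNet mean/std normalization, float32, CHW layout.
# The real convert_to_raw_image.py may use other constants or another dtype.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def hwc_rgb_to_chw_float32(pixels, height, width):
    """Convert interleaved RGB uint8 bytes (HWC) into a normalized planar float32 array (CHW)."""
    if len(pixels) != height * width * 3:
        raise ValueError("pixel buffer size does not match height * width * 3")
    out = array("f")  # 'f' is a C float, i.e. 32-bit on common platforms
    for channel in range(3):  # emit the whole R plane, then G, then B
        for i in range(height * width):
            value = pixels[i * 3 + channel] / 255.0
            out.append((value - MEAN[channel]) / STD[channel])
    return out


def write_raw(path, data):
    """Dump the array as raw bytes in native byte order, with no header."""
    with open(path, "wb") as f:
        data.tofile(f)
```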
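Running the Object Detection Example
Go to PaddleLite-generic-demo/yolo_detection_demo/shell/ and run the following commands to compare the performance and results of the yolov3_darknet53_270e_coco_fp32_608 model.
float32 inference (default)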
# For Jetson AGX Xavier arm64
# (Arm CPU only)
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux arm64 cpu
warmup: 1 repeat: 5, average: 3340.893359 ms, max: 3370.016113 ms, min: 3315.264893 ms
results: 3
[0] bicycle - 0.994217 96.105804,134.768875,452.231476,429.883850
[1] truck - 0.822486 371.336975,82.459366,542.959106,179.119370
[2] dog - 0.987700 101.067924,238.172989,249.511230,566.703308
Preprocess time: 3.220000 ms
Prediction time: 3340.893359 ms
Postprocess time: 0.012000 ms

# (Arm CPU + TensorRT)
# CUDA 10.2 | cuDNN 8.0 | TensorRT 7.1.3.0
# Note: if your package versions differ from those used to build the demo, rebuild the Paddle Lite library as described in "Updating the Paddle Lite Library with NVIDIA TensorRT Support"
# By default, nvidia_tensorrt runs inference in float32 precision on GPU device 0
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux arm64 nvidia_tensorrt
warmup: 1 repeat: 5, average: 108.054800 ms, max: 109.084999 ms, min: 106.007004 ms
results: 3
[0] bicycle - 0.994217 96.105850,134.768967,452.231445,429.883728
[1] truck - 0.822484 371.336914,82.459389,542.959045,179.119354
[2] dog - 0.987700 101.067947,238.173141,249.511246,566.703125
Preprocess time: 4.250000 ms
Prediction time: 108.054800 ms
Postprocess time: 0.009000 ms

# For RTX4000 amd64
# (Intel CPU only)
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux amd64 cpu
warmup: 1 repeat: 5, average: 2063.369971 ms, max: 2313.992920 ms, min: 1978.823975 ms
results: 3
[0] bicycle - 0.994217 96.105835,134.768921,452.231445,429.883789
[1] truck - 0.822484 371.336914,82.459381,542.959045,179.119354
[2] dog - 0.987700 101.067955,238.173111,249.511230,566.703186
Preprocess time: 8.212000 ms
Prediction time: 2063.369971 ms
Postprocess time: 0.014000 ms

# (Intel CPU + TensorRT)
# CUDA 10.2 | cuDNN 8.0 | TensorRT 7.1.3.4
# Note: if your package versions differ from those used to build the demo, rebuild the Paddle Lite library as described in "Updating the Paddle Lite Library with NVIDIA TensorRT Support"
# By default, nvidia_tensorrt runs inference in float32 precision on GPU device 0
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux amd64 nvidia_tensorrt
warmup: 1 repeat: 5, average: 20.190400 ms, max: 20.415001 ms, min: 20.035000 ms
results: 3
[0] bicycle - 0.994138 96.253372,134.615204,451.998718,429.995758
[1] truck - 0.824422 371.321472,82.480606,542.967773,179.083664
[2] dog - 0.987714 101.088051,238.092957,249.499786,566.778992
Preprocess time: 7.380000 ms
Prediction time: 20.190400 ms
Postprocess time: 0.009000 ms
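Each detection is printed as `[index] label - score x,y,x,y`. The sketch below turns those lines into structured records so that runs on different devices or precisions can be compared programmatically. It assumes the four numbers are (x_min, y_min, x_max, y_max), which matches the values shown but is not stated by the demo. `Detection` and `parse_detections` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    index: int
    label: str
    score: float
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


# "[0] bicycle - 0.994217 96.105850,134.768967,452.231445,429.883728"
# The label is matched lazily, so multi-word labels such as "traffic light" also work.
_DET_RE = re.compile(
    r"\[(\d+)\]\s+(.+?)\s+-\s+([0-9.]+)\s+"
    r"(-?[0-9.]+),(-?[0-9.]+),(-?[0-9.]+),(-?[0-9.]+)"
)


def parse_detections(log: str) -> List[Detection]:
    """Return every detection found in the demo output, in printed order."""
    results = []
    for match in _DET_RE.finditer(log):
        index, label, score, *coords = match.groups()
        results.append(Detection(int(index), label, float(score),
                                 tuple(float(c) for c in coords)))
    return results
```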
float16 inference:
# Run inference with nvidia_tensorrt in float16 precision on GPU device 0
# Results:
# For Jetson AGX Xavier arm64
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux arm64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=float16;"
warmup: 1 repeat: 5, average: 38.339001 ms, max: 38.834999 ms, min: 36.581001 ms
results: 3
[0] bicycle - 0.994222 95.958069,134.931198,452.224915,429.843384
[1] truck - 0.819042 371.327850,82.497841,542.974121,179.136292
[2] dog - 0.987907 101.132500,238.003235,249.381256,566.689270
Preprocess time: 3.162000 ms
Prediction time: 38.339001 ms
Postprocess time: 0.011000 ms

# For RTX4000 amd64
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux amd64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=float16;"
warmup: 1 repeat: 5, average: 21.748800 ms, max: 22.000000 ms, min: 21.627001 ms
results: 3
[0] bicycle - 0.994138 96.253372,134.615204,451.998718,429.995758
[1] truck - 0.824422 371.321472,82.480606,542.967773,179.083664
[2] dog - 0.987714 101.088051,238.092957,249.499786,566.778992
Preprocess time: 6.977000 ms
Prediction time: 21.748800 ms
Postprocess time: 0.010000 ms
int8 inference:
# Run inference with nvidia_tensorrt in int8 precision on GPU device 0
# Download the dataset required for calibration: preprocessed raw data of the first 1000 images of the COCO dataset
$ curl https://paddlelite-demo.bj.bcebos.com/devices/nvidia_tensorrt/datasets/coco_raw_1000.tar.gz -o - | tar -xz -C ../assets/
# Results:
# For Jetson AGX Xavier arm64
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux arm64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=int8;NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH=../assets/coco_raw_1000;NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH=../assets/models/yolov3_darknet53_270e_coco_fp32_608/calibration_table;"
warmup: 1 repeat: 5, average: 24.788400 ms, max: 24.900999 ms, min: 24.667999 ms
results: 3
[0] bicycle - 0.980123 96.815704,132.817017,452.862457,429.673828
[1] truck - 0.563585 372.201050,83.083687,546.585938,179.713409
[2] dog - 0.961471 99.111099,239.197708,252.214218,566.569702
Preprocess time: 1.998000 ms
Prediction time: 24.788400 ms
Postprocess time: 0.010000 ms

# For RTX4000 amd64
$ ./run.sh yolov3_darknet53_270e_coco_fp32_608 linux amd64 nvidia_tensorrt "NVIDIA_TENSORRT_DEVICE_TYPE=GPU;NVIDIA_TENSORRT_DEVICE_ID=0;NVIDIA_TENSORRT_PRECISION=int8;NVIDIA_TENSORRT_CALIBRATION_DATASET_PATH=../assets/coco_raw_1000;NVIDIA_TENSORRT_CALIBRATION_TABLE_PATH=../assets/models/yolov3_darknet53_270e_coco_fp32_608/calibration_table;"
warmup: 1 repeat: 5, average: 17.169000 ms, max: 17.409000 ms, min: 16.837000 ms
results: 3
[0] bicycle - 0.980123 96.815704,132.817017,452.862457,429.673828
[1] truck - 0.563585 372.201050,83.083687,546.585938,179.713409
[2] dog - 0.961471 99.111099,239.197708,252.214218,566.569702
Preprocess time: 7.698000 ms
Prediction time: 17.169000 ms
Postprocess time: 0.007000 ms
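INT8 mainly changes the scores on Jetson AGX Xavier (the truck drops from 0.822484 to 0.563585), while the boxes barely move. One way to quantify the box drift is the intersection over union (IoU) between the float32 and int8 boxes. The sketch below uses the Jetson boxes from the float32 TensorRT and int8 runs in this section. It assumes the (x_min, y_min, x_max, y_max) box order, and `iou` is an illustrative helper. All three pairs come out above 0.95.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Jetson AGX Xavier boxes from the float32 (TensorRT) and int8 runs in this section.
FP32 = {"bicycle": (96.105850, 134.768967, 452.231445, 429.883728),
        "truck": (371.336914, 82.459389, 542.959045, 179.119354),
        "dog": (101.067947, 238.173141, 249.511246, 566.703125)}
INT8 = {"bicycle": (96.815704, 132.817017, 452.862457, 429.673828),
        "truck": (372.201050, 83.083687, 546.585938, 179.713409),
        "dog": (99.111099, 239.197708, 252.214218, 566.569702)}

if __name__ == "__main__":
    for label in FP32:
        print(f"{label}: IoU(fp32, int8) = {iou(FP32[label], INT8[label]):.3f}")
```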
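Changing the test image
To use a different test image, copy it into PaddleLite-generic-demo/yolo_detection_demo/assets/images, modify and run convert_to_raw_image.py to generate the corresponding RGB raw image, and finally set IMAGE_NAME in run.sh to the new image.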
Updating the Paddle Lite Library with NVIDIA TensorRT Support
Download the Paddle Lite source code
$ git clone https://github.com/PaddlePaddle/Paddle-Lite.git
$ cd Paddle-Lite
$ git checkout <release-version-tag>
Build the Paddle Lite + NNAdapter + TensorRT deployment libraries for amd64 and arm64
For amd64
full_publish build
$ export NNADAPTER_NVIDIA_CUDA_ROOT="/usr/local/cuda" # replace with the CUDA path in your environment
$ export NNADAPTER_NVIDIA_TENSORRT_ROOT="/usr/local/tensorrt" # replace with the TensorRT path in your environment
$ ./lite/tools/build_linux.sh \
    --arch=x86 \
    --with_extra=ON \
    --with_log=ON \
    --with_exception=ON \
    --with_nnadapter=ON \
    --nnadapter_with_nvidia_tensorrt=ON \
    --nnadapter_nvidia_cuda_root=$NNADAPTER_NVIDIA_CUDA_ROOT \
    --nnadapter_nvidia_tensorrt_root=$NNADAPTER_NVIDIA_TENSORRT_ROOT \
    full_publish
Replace the header files and libraries
# Remove the existing include directory
$ rm -rf PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/include/
# Copy in the new include directory
$ cp -rf build.lite.linux.x86.gcc/inference_lite_lib/cxx/include/ PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/include/
# Replace the NNAdapter runtime library
$ cp build.lite.linux.x86.gcc/inference_lite_lib/cxx/lib/libnnadapter.so PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/lib/nvidia_tensorrt/
# Replace the NNAdapter device HAL library
$ cp build.lite.linux.x86.gcc/inference_lite_lib/cxx/lib/libnvidia_tensorrt.so PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/lib/nvidia_tensorrt/
# Replace libpaddle_full_api_shared.so
$ cp build.lite.linux.x86.gcc/inference_lite_lib/cxx/lib/libpaddle_full_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/lib/
# Replace libpaddle_light_api_shared.so
$ cp build.lite.linux.x86.gcc/inference_lite_lib/cxx/lib/libpaddle_light_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/amd64/lib/
For Jetson
full_publish build
$ export NNADAPTER_NVIDIA_CUDA_ROOT="/usr/local/cuda" # replace with the CUDA path in your environment
$ export NNADAPTER_NVIDIA_TENSORRT_ROOT="/usr/local/tensorrt" # replace with the TensorRT path in your environment
$ ./lite/tools/build_linux.sh \
    --arch=armv8 \
    --with_extra=ON \
    --with_log=ON \
    --with_exception=ON \
    --with_nnadapter=ON \
    --nnadapter_with_nvidia_tensorrt=ON \
    --nnadapter_nvidia_cuda_root=$NNADAPTER_NVIDIA_CUDA_ROOT \
    --nnadapter_nvidia_tensorrt_root=$NNADAPTER_NVIDIA_TENSORRT_ROOT \
    full_publish
Replace the header files and libraries
# Remove the existing include directory
$ rm -rf PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/include/
# Copy in the new include directory
$ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/include/ PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/include/
# Replace the NNAdapter runtime library
$ cp build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libnnadapter.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/nvidia_tensorrt/
# Replace the NNAdapter device HAL library
$ cp build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libnvidia_tensorrt.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/nvidia_tensorrt/
# Replace libpaddle_full_api_shared.so
$ cp build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libpaddle_full_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/
# Replace libpaddle_light_api_shared.so
$ cp build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libpaddle_light_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/
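After replacing the header files, rebuild the example programs (for example, with build.sh in each demo's shell directory).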
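Before rebuilding, a quick check can confirm that every file the replacement commands copy is actually in place. The sketch below only checks that files and directories exist. It does not verify that the libraries match your CUDA, cuDNN or TensorRT versions. `missing_artifacts` is a hypothetical helper, and `demo_root` is assumed to be the PaddleLite-generic-demo directory.

```python
from pathlib import Path

# Files that the replacement commands above copy into each architecture's lib directory.
EXPECTED_LIBS = (
    "nvidia_tensorrt/libnnadapter.so",
    "nvidia_tensorrt/libnvidia_tensorrt.so",
    "libpaddle_full_api_shared.so",
    "libpaddle_light_api_shared.so",
)


def missing_artifacts(demo_root, arch):
    """Return the paths (relative to libs/PaddleLite/linux) missing for arch ('amd64' or 'arm64')."""
    base = Path(demo_root) / "libs" / "PaddleLite" / "linux" / arch
    missing = [] if (base / "include").is_dir() else [f"{arch}/include/"]
    missing += [f"{arch}/lib/{rel}" for rel in EXPECTED_LIBS
                if not (base / "lib" / rel).is_file()]
    return missing


if __name__ == "__main__":
    for arch in ("amd64", "arm64"):
        print(arch, missing_artifacts("PaddleLite-generic-demo", arch) or "OK")
```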