\u200E
# 性能数据
可以参考[benchmark_tools](benchmark_tools),推荐**一键benchmark**。
## ARM测试环境
* 测试模型
* fp32模型
* mobilenet_v1
* mobilenet_v2
* squeezenet_v1.1
* mnasnet
* shufflenet_v2
* int8模型
* mobilenet_v1
* mobilenet_v2
* 测试机器(android ndk ndk-r17c)
* 骁龙855
* xiaomi mi9, snapdragon 855 (enable sdot instruction)
* 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz
* 骁龙845
* xiaomi mi8, 845
* 2.8GHz(大四核),1.7GHz(小四核)
* 骁龙835
* xiaomi mix2, snapdragon 835
* 2.45GHz(大四核),1.9GHz(小四核)
* 麒麟970
* HUAWEI Mate10
* 测试说明
* branch: release/v2.6.0
* warmup=10, repeats=30,统计平均时间,单位是ms
* 当线程数为1时,```DeviceInfo::Global().SetRunMode```设置LITE_POWER_HIGH,否者设置LITE_POWER_NO_BIND
* 模型的输入图像的维度是{1, 3, 224, 224},输入图像的每一位数值是1
## ARM测试数据
### fp32模型测试数据
#### paddlepaddle model
骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4
mobilenet_v1 |35.11 |20.67 |11.83 |30.56 |18.59 |10.44 |
mobilenet_v2 |26.36 |15.83 |9.29 |21.64 |13.25 |7.95 |
shufflenet_v2 |4.56 |3.14 |2.35 |4.07 |2.89 |2.28 |
squeezenet_v1.1 |21.27 |13.55 |8.49 |18.05 |11.51 |7.83 |
mnasnet |21.40 |13.18 |7.63 |18.84 |11.40 |6.80 |
骁龙845|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4
mobilenet_v1 |65.56 |37.17 |19.65 |63.23 |32.98 |17.68 |
mobilenet_v2 |45.89 |25.20 |14.39 |41.03 |22.94 |12.98 |
shufflenet_v2 |7.31 |4.66 |3.27 |7.08 |4.71 |3.41 |
squeezenet_v1.1 |36.98 |22.53 |13.45 |34.27 |20.96 |12.60 |
mnasnet |39.85 |23.64 |12.25 |37.81 |20.70 |11.81 |
骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4
mobilenet_v1 |92.77 |51.56 |30.14 |87.46 |48.02 |26.42 |
mobilenet_v2 |65.78 |36.52 |22.34 |58.31 |33.04 |19.87 |
shufflenet_v2 |10.39 |6.26 |4.46 |9.72 |6.19 |4.41 |
squeezenet_v1.1 |53.59 |33.16 |20.13 |51.56 |31.81 |19.10 |
mnasnet |57.44 |32.62 |19.47 |54.99 |30.69 |17.98 |
#### caffe model
骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |32.38 |18.65 |10.69 |30.75 |18.11 |9.88 |
mobilenet_v2 |29.45 |17.86 |10.81 |26.61 |16.26 |9.67 |
shufflenet_v2 |5.04 |3.14 |2.20 |4.09 |2.85 |2.25 |
骁龙845|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |65.26 |35.19 |19.11 |61.42 |33.15 |17.48 |
mobilenet_v2 |55.59 |31.31 |17.68 |51.54 |29.69 |16.00 |
shufflenet_v2 |7.42 |4.73 |3.33 |7.18 |4.75 |3.39 |
骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |95.38 |52.16 |30.37 |92.10 |46.71 |26.31 |
mobilenet_v2 |82.89 |45.49 |28.14 |74.91 |41.88 |25.25 |
shufflenet_v2 |10.25 |6.36 |4.42 |9.68 |6.20 |4.42 |
#### int8量化模型测试数据
骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |37.18 |21.71 |11.16 | 14.41 |8.34 |4.37 |
mobilenet_v2 |27.95 |16.57 |8.97 | 13.68 |8.16 |4.67 |
骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |61.63 |32.60 |16.49 |57.36 |29.74 |15.50 |
mobilenet_v2 |47.13 |25.62 |13.56 |41.87 |22.42 |11.72 |
麒麟970|armv7 | armv7 | armv7 |armv8 | armv8 |armv8
----| ---- | ---- | ---- | ---- |---- |----
threads num|1 |2 |4 |1 |2 |4 |
mobilenet_v1 |63.13 |32.63 |16.85 |58.92 |29.96 |15.42 |
mobilenet_v2 |48.60 |25.43 |13.76 |43.06 |22.10 |12.09 |
## 华为麒麟NPU测试环境
* 测试模型
* fp32模型
* mobilenet_v1
* mobilenet_v2
* squeezenet_v1.1
* mnasnet
* 测试机器(android ndk ndk-r17c)
* 麒麟810
* HUAWEI Nova5, Kirin 810
* 2xCortex A76 2.27GHz + 6xCortex A55 1.88GHz
* 麒麟990
* HUAWEI Mate 30, Kirin 990
* 2 x Cortex-A76 Based 2.86 GHz + 2 x Cortex-A76 Based 2.09 GHz + 4 x Cortex-A55 1.86 GHz
* 麒麟990 5G
* HUAWEI P40, Kirin 990 5G
* 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz
* HIAI ddk 版本: 310 or 320
* 测试说明
* branch: release/v2.6.1
* warmup=10, repeats=30,统计平均时间,单位是ms
* 线程数为1,```DeviceInfo::Global().SetRunMode```设置LITE_POWER_HIGH
* 模型的输入图像的维度是{1, 3, 224, 224},输入图像的每一位数值是1
## 华为麒麟NPU测试数据
#### paddlepaddle model
- ddk 310
|Kirin |810||990||990 5G||
|---|---|---|---|---|---|---|
| |cpu(ms) | npu(ms) |cpu(ms) | npu(ms) |cpu(ms) | npu(ms) |
|mobilenet_v1| 41.20| 12.76| 31.91| 4.07| 33.97| 3.20|
|mobilenet_v2| 29.57| 12.12| 22.47| 5.61| 23.17| 3.51|
|squeezenet| 23.96| 9.04| 17.79| 3.82| 18.65| 3.01|
|mnasnet| 26.47| 13.62| 19.54| 5.17| 20.34| 3.32|
- ddk 320
|模型 |990||990-5G||
|---|---|---|---|---|
||cpu(ms) | npu(ms) |cpu(ms) | npu(ms) |
|ssd_mobilenetv1| 65.67| 18.21| 71.8| 16.6|
*说明:ssd_mobilenetv1的npu性能为npu、cpu混合调度运行的总时间*