Runtime Option#

fastdeploy.RuntimeOption#

class fastdeploy.RuntimeOption[source]#

Options for FastDeploy Runtime.

Initialize a FastDeploy RuntimeOption object.
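
The sketch below shows the typical flow of building a RuntimeOption and handing it to fastdeploy.Runtime; the model paths and the input name "x" are hypothetical placeholders, not values defined by this API.

    import numpy as np
    import fastdeploy as fd

    # Configure model, device and backend on a RuntimeOption
    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_cpu()
    option.use_ort_backend()

    # Create a Runtime from the option and run inference on random data
    runtime = fd.Runtime(option)
    outputs = runtime.infer({"x": np.random.rand(1, 3, 224, 224).astype("float32")})  # "x" is a hypothetical input name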

delete_paddle_backend_pass(pass_name)[source]#

Delete a pass by name in the Paddle Inference backend.

disable_lite_fp16()[source]#

Disable half precision inference while using the Paddle Lite backend on ARM CPU; FP16 is disabled by default.

disable_paddle_log_info()[source]#

Disable printing the debug log information while using the Paddle Inference backend; the log information is disabled by default.

disable_paddle_trt_collect_shape()[source]#

Disable collecting subgraph shape information while using Paddle Inference with TensorRT.

disable_paddle_trt_ops(ops)[source]#

Disable the specified ops while using Paddle Inference with TensorRT.

disable_pinned_memory()[source]#

Disable pinned memory.

disable_profiling()[source]#

Set the profile mode as ‘false’.

disable_trt_fp16()[source]#

Disable half precision inference while using the TensorRT backend.

enable_lite_fp16()[source]#

Enable half precision inference while using the Paddle Lite backend on ARM CPU; FP16 is disabled by default.

enable_paddle_log_info()[source]#

Enable printing the debug log information while using the Paddle Inference backend; the log information is disabled by default.

enable_paddle_to_trt()[source]#

While the TensorRT backend is selected, enable_paddle_to_trt() switches to the Paddle Inference backend and uses its integrated TensorRT instead.
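
A minimal sketch of this switch, assuming a hypothetical Paddle model: the TensorRT backend is selected first and then routed through Paddle Inference's integrated TensorRT.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_gpu(0)
    option.use_trt_backend()       # select the TensorRT backend first
    option.enable_paddle_to_trt()  # switch to Paddle Inference with its integrated TensorRT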

enable_paddle_trt_collect_shape()[source]#

Enable collecting subgraph shape information while using Paddle Inference with TensorRT.

enable_pinned_memory()[source]#

Enable pinned memory. Pinned memory can be utilized to speed up data transfer between CPU and GPU. Currently it is only supported in the TensorRT backend and the Paddle Inference backend.

enable_profiling(inclue_h2d_d2h=False, repeat=100, warmup=50)[source]#

Set the profile mode as ‘true’.

Parameters
  • inclue_h2d_d2h – (bool)Whether to include the H2D/D2H data transfer time in the measured runtime time

  • repeat – (int)Repeat times for runtime inference

  • warmup – (int)Warmup times for runtime inference
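
A short sketch of enabling profiling before creating the runtime; the model paths and the input name "x" are hypothetical, and the timing statistics are collected over the configured warmup and repeat runs.

    import numpy as np
    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_cpu()
    option.enable_profiling(inclue_h2d_d2h=False, repeat=200, warmup=20)

    runtime = fd.Runtime(option)
    runtime.infer({"x": np.random.rand(1, 3, 224, 224).astype("float32")})  # "x" is a hypothetical input name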

enable_trt_fp16()[source]#

Enable half precision inference while using the TensorRT backend. Note that not all Nvidia GPUs support FP16; in those cases, inference will fall back to FP32.

property openvino_option#

Get OpenVINOOption object to configure OpenVINO backend

:return OpenVINOOption

property ort_option#

Get OrtBackendOption object to configure ONNX Runtime backend

:return OrtBackendOption

property paddle_infer_option#

Get PaddleBackendOption object to configure Paddle Inference backend

:return PaddleBackendOption

property paddle_lite_option#

Get LiteBackendOption object to configure Paddle Lite backend

:return LiteBackendOption

property poros_option#

Get PorosBackendOption object to configure Poros backend

:return PorosBackendOption

set_cpu_thread_num(thread_num=-1)[source]#

Set the number of threads for inference with CPU.

Parameters

thread_num – (int)Number of threads; if not positive, the number of threads is decided by the backend, default -1
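
A brief sketch combining CPU inference with an explicit thread count; the backend chosen here is arbitrary.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.use_cpu()
    option.set_cpu_thread_num(8)  # use 8 threads; a non-positive value lets the backend decide
    option.use_ort_backend()      # any CPU-capable backend works here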

set_encryption_key(encryption_key)[source]#

When loading an encrypted model, encryption_key is required to decrypt the model.

Parameters

encryption_key – (str)The key for decrypting the model

set_external_raw_stream(cuda_stream)[source]#

Set the external raw stream used by fastdeploy runtime.

set_lite_context_properties(context_properties)[source]#

Set nnadapter context properties for Paddle Lite backend.

set_lite_dynamic_shape_info(dynamic_shape_info)[source]#

Set nnadapter dynamic shape info for Paddle Lite backend.

set_lite_mixed_precision_quantization_config_path(mixed_precision_quantization_config_path)[source]#

Set nnadapter mixed precision quantization config path for Paddle Lite backend.

set_lite_model_cache_dir(model_cache_dir)[source]#

Set nnadapter model cache dir for Paddle Lite backend.

set_lite_power_mode(mode)[source]#

Set POWER mode while using Paddle Lite backend on ARM CPU.

set_lite_subgraph_partition_config_buffer(subgraph_partition_buffer)[source]#

Set nnadapter subgraph partition buffer for Paddle Lite backend.

set_lite_subgraph_partition_path(subgraph_partition_path)[source]#

Set nnadapter subgraph partition path for Paddle Lite backend.

set_model_buffer(model_buffer, params_buffer='', model_format=<ModelFormat.PADDLE: 1>)[source]#

Specify the memory buffers of the model and parameters, used when the model and parameters are loaded directly from memory.

Parameters
  • model_buffer – (bytes)The memory buffer of the model

  • params_buffer – (bytes)The memory buffer of the parameters

  • model_format – (ModelFormat)Format of model, support ModelFormat.PADDLE/ModelFormat.ONNX/ModelFormat.TORCHSCRIPT
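
A sketch of loading a model from memory rather than from disk; the file paths used to fill the buffers are hypothetical.

    import fastdeploy as fd

    # Read the model and parameters into memory (hypothetical file paths)
    with open("model.pdmodel", "rb") as f:
        model_buffer = f.read()
    with open("model.pdiparams", "rb") as f:
        params_buffer = f.read()

    option = fd.RuntimeOption()
    option.set_model_buffer(model_buffer, params_buffer, fd.ModelFormat.PADDLE)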

set_model_path(model_path, params_path='', model_format=<ModelFormat.PADDLE: 1>)[source]#

Set path of model file and parameters file

Parameters
  • model_path – (str)Path of model file

  • params_path – (str)Path of parameters file

  • model_format – (ModelFormat)Format of model, support ModelFormat.PADDLE/ModelFormat.ONNX/ModelFormat.TORCHSCRIPT
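
A brief sketch of pointing the option at an ONNX model; the file name is hypothetical. ONNX models carry their weights in a single file, so params_path is left empty; the Paddle case with separate model/params files appears in the earlier examples.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    # ONNX model: no separate parameters file, only the model path is given (hypothetical path)
    option.set_model_path("model.onnx", model_format=fd.ModelFormat.ONNX)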

set_openvino_cpu_operators(operators)[source]#

While using the OpenVINO backend on an Intel GPU, this interface specifies unsupported operators to run on the CPU.

This interface is deprecated, please use RuntimeOption.openvino_option.set_cpu_operators instead.

Parameters

operators – (list of string)List of operator names, e.g. [“MulticlassNms”]

set_openvino_device(name='CPU')[source]#

Set device name for OpenVINO, default ‘CPU’; it can also be ‘AUTO’, ‘GPU’, ‘GPU.1’, etc. This interface is deprecated, please use RuntimeOption.openvino_option.set_device instead.
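
A brief sketch of selecting the OpenVINO backend and targeting an Intel GPU through the non-deprecated openvino_option path; the model path is hypothetical.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.onnx", model_format=fd.ModelFormat.ONNX)  # hypothetical path
    option.use_openvino_backend()
    option.openvino_option.set_device("GPU")  # preferred over the deprecated set_openvino_device("GPU")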

set_openvino_shape_info(shape_info)[source]#

Set shape information of the model’s inputs, used for GPU to fix the shape.

This interface is deprecated, please use RuntimeOption.openvino_option.set_shape_info instead.

Parameters

shape_info – (dict{str, list of int})Shape information of model’s inputs, e.g {“image”: [1, 3, 640, 640], “scale_factor”: [1, 2]}

set_ort_graph_opt_level(level=-1)[source]#

Set graph optimization level for ONNX Runtime backend

Parameters

level – (int)Optimization level, -1 means the default setting

set_paddle_mkldnn(use_mkldnn=True)[source]#

Enable/Disable MKLDNN while using the Paddle Inference backend; MKLDNN is enabled by default.

set_paddle_mkldnn_cache_size(cache_size)[source]#

Set the size of the shape cache while using the Paddle Inference backend with MKLDNN enabled; by default all dynamic shapes are cached.
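
A brief sketch of CPU inference with Paddle Inference and MKLDNN; the model paths are hypothetical and the cache size of 10 is an arbitrary choice.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_cpu()
    option.use_paddle_infer_backend()
    option.set_paddle_mkldnn(True)           # enabled by default; shown here for clarity
    option.set_paddle_mkldnn_cache_size(10)  # cache shape information for at most 10 input shapes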

set_trt_cache_file(cache_file_path)[source]#

Set a cache file path while using the TensorRT backend. While loading a Paddle/ONNX model with set_trt_cache_file(“./tensorrt_cache/model.trt”), if the file ./tensorrt_cache/model.trt exists, building the TensorRT engine is skipped and the cache file is loaded directly; if it does not exist, the TensorRT engine is built and saved to the cache file as a binary string.

Parameters

cache_file_path – (str)Path of tensorrt cache file
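
A brief sketch of caching the built engine; the model and cache paths are hypothetical. The first run builds and saves the engine, and later runs load it directly.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_gpu(0)
    option.use_trt_backend()
    option.set_trt_cache_file("./tensorrt_cache/model.trt")  # built once, reused on later runs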

set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)[source]#

Set shape range information while using the TensorRT backend to load a model that contains dynamic input shapes. When inference is run with an input shape outside the configured range, the TensorRT engine will be rebuilt to expand the shape range information.

Parameters
  • tensor_name – (str)Name of input which has dynamic shape

  • min_shape – (list of int)Minimum shape of the input, e.g [1, 3, 224, 224]

  • opt_shape – (list of int)Optimal shape of the input, often set to the most common input shape; if set to None, it will be kept the same as min_shape

  • max_shape – (list of int)Maximum shape of the input, e.g [8, 3, 224, 224]; if set to None, it will be kept the same as min_shape
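
A brief sketch of declaring a dynamic batch dimension; the input name "image" and the shapes are hypothetical.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.onnx", model_format=fd.ModelFormat.ONNX)  # hypothetical path
    option.use_gpu(0)
    option.use_trt_backend()
    # "image" is a hypothetical dynamic input whose batch size varies from 1 to 8
    option.set_trt_input_shape("image",
                               min_shape=[1, 3, 224, 224],
                               opt_shape=[4, 3, 224, 224],
                               max_shape=[8, 3, 224, 224])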

set_trt_max_batch_size(trt_max_batch_size)[source]#

Set max batch size while using TensorRT backend.

set_trt_max_workspace_size(trt_max_workspace_size)[source]#

Set max workspace size while using TensorRT backend.

property trt_option#

Get TrtBackendOption object to configure TensorRT backend

:return TrtBackendOption

use_ascend()[source]#

Inference with Huawei Ascend NPU

use_cpu()[source]#

Inference with CPU

use_gpu(device_id=0)[source]#

Inference with Nvidia GPU

Parameters

device_id – (int)The index of the GPU that will be used for inference, default 0
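
A brief sketch of selecting GPU 0; any GPU-capable backend can follow the device selection.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.use_gpu(0)                  # run on GPU 0
    option.use_paddle_infer_backend()  # or use_trt_backend() / use_ort_backend()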

use_kunlunxin(device_id=0, l3_workspace_size=16777216, locked=False, autotune=True, autotune_file='', precision='int16', adaptive_seqlen=False, enable_multi_stream=False)[source]#

Inference with KunlunXin XPU

Parameters
  • device_id – (int)The index of the KunlunXin XPU that will be used for inference, default 0

  • l3_workspace_size – (int)The size of the video memory allocated by the L3 cache, the maximum is 16M, default 16M

  • locked – (bool)Whether the allocated L3 cache can be locked. If false, the L3 cache is not locked and can be shared by multiple models, and multiple models sharing the L3 cache will be executed sequentially on the card.

  • autotune – (bool)Whether to autotune the conv operator in the model. If true, when the conv operator of a certain dimension is executed for the first time, it will automatically search for a better algorithm to improve the performance of subsequent conv operators of the same dimension.

  • autotune_file – (str)Specify the path of the autotune file. If autotune_file is specified, the algorithm specified in the file will be used and autotune will not be performed again.

  • precision – (str)Calculation accuracy of multi_encoder

  • adaptive_seqlen – (bool)Whether the input of multi_encoder is variable length

  • enable_multi_stream – (bool)Whether to enable the multi stream of KunlunXin XPU.
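
A brief sketch under the assumption that KunlunXin XPU inference goes through the Paddle Lite backend; the model paths and the parameter values are hypothetical.

    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.set_model_path("model.pdmodel", "model.pdiparams")  # hypothetical paths
    option.use_kunlunxin(device_id=0, l3_workspace_size=16 * 1024 * 1024, autotune=True)
    option.use_paddle_lite_backend()  # assumed pairing with Paddle Lite for KunlunXin XPU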

use_lite_backend()[source]#

Use the Paddle Lite backend, which supports inference of Paddle models on ARM CPU.

use_openvino_backend()[source]#

Use the OpenVINO backend, which supports inference of Paddle/ONNX models on CPU.

use_ort_backend()[source]#

Use the ONNX Runtime backend, which supports inference of Paddle/ONNX models on CPU/Nvidia GPU.

use_paddle_backend()[source]#

Use the Paddle Inference backend, which supports inference of Paddle models on CPU/Nvidia GPU.

use_paddle_infer_backend()[source]#

Wrapper function of use_paddle_backend(); use the Paddle Inference backend, which supports inference of Paddle models on CPU/Nvidia GPU.

use_paddle_lite_backend()[source]#

Wrapper function of use_lite_backend(); use the Paddle Lite backend, which supports inference of Paddle models on ARM CPU.

use_poros_backend()[source]#

Use the Poros backend, which supports inference of TorchScript models on CPU/Nvidia GPU.

use_sophgo()[source]#

Inference with SOPHGO TPU

use_trt_backend()[source]#

Use the TensorRT backend, which supports inference of Paddle/ONNX models on Nvidia GPU.