Compiling and Optimizing a Model with TVMC
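Original authors: Leandro Nunes, Matthew Barrett, Chris Hoge
In this section, we will work with TVMC, the TVM command-line driver. TVMC is a tool that exposes TVM features such as auto-tuning, compiling, profiling, and model execution through a command-line interface.
After completing this section, you will have used TVMC to accomplish the following tasks:
Compile a pre-trained ResNet-50 v2 model for the TVM runtime.
Run a real image through the compiled model, and interpret the output and the model's performance.
Tune the model on a CPU using TVM.
Recompile an optimized model using the tuning data collected by TVM.
Run the image through the optimized model, and compare the output and the model's performance.
The goal of this section is to give you an overview of what TVM and TVMC can do, and to lay the groundwork for understanding how TVM works.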
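Using TVMC
TVMC is a Python application and part of the TVM Python package. When you install TVM using a Python package, you get TVMC as a command-line application called tvmc. The location of this command depends on your platform and installation method.
Alternatively, if TVM is available as a Python module on your $PYTHONPATH, you can access the command-line driver functionality through the executable Python module, python -m tvm.driver.tvmc.
For simplicity, this tutorial refers to the TVMC command line as tvmc <options>, but the same results can be obtained with python -m tvm.driver.tvmc <options>.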
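The two invocation styles described above can be wrapped in a small helper, which is convenient when driving TVMC from a script. The following is a minimal sketch that uses only the Python standard library; the function names tvmc_command_prefix and tvmc_argv are hypothetical and are not part of TVM.

```python
import shutil
import sys


def tvmc_command_prefix():
    """Return the argv prefix used to invoke TVMC on this machine."""
    # Prefer the standalone `tvmc` executable when it is on PATH.
    executable = shutil.which("tvmc")
    if executable is not None:
        return [executable]
    # Otherwise fall back to running the driver as a Python module.
    # This only works if TVM is importable by this interpreter.
    return [sys.executable, "-m", "tvm.driver.tvmc"]


def tvmc_argv(*options):
    """Build a full argv list, for example tvmc_argv("--help")."""
    return tvmc_command_prefix() + list(options)
```

The resulting list can be passed to subprocess.run, for example subprocess.run(tvmc_argv("--help"), check=True).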
You can view the help page with:
!python -m tvm.driver.tvmc --help
usage: tvmc [--config CONFIG] [-v] [--version] [-h]
{micro,run,tune,compile} ...
TVM compiler driver
options:
--config CONFIG configuration json file
-v, --verbose increase verbosity
--version print the version and exit
-h, --help show this help message and exit.
commands:
{micro,run,tune,compile}
micro select micro context.
run run a compiled module
tune auto-tune a model
compile compile a model.
TVMC - TVM driver command-line interface
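The main TVM features available through tvmc come from the subcommands compile, run, and tune. To read about the specific options under a given subcommand, use tvmc <subcommand> --help. This tutorial covers each of these commands in turn, but first we need to download a pre-trained model to work with.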
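As a small illustration of the tvmc <subcommand> --help pattern, the sketch below builds the help command line for each subcommand. The subcommand names are copied from the help output above; the helper help_argv and its prefix parameter are hypothetical.

```python
# Subcommands as listed in the `tvmc --help` output above.
SUBCOMMANDS = ("micro", "run", "tune", "compile")


def help_argv(subcommand, prefix=("tvmc",)):
    """Return the argv list that prints help for one TVMC subcommand."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown tvmc subcommand: {subcommand!r}")
    return [*prefix, subcommand, "--help"]


# Print the help command for the three subcommands used in this tutorial.
for name in ("compile", "run", "tune"):
    print(" ".join(help_argv(name)))
```

Passing prefix=("python", "-m", "tvm.driver.tvmc") yields the module form of the same commands.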
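Obtaining the Model
For this tutorial, we will work with ResNet-50 v2. ResNet-50 is a convolutional neural network that is 50 layers deep and designed to classify images. The model we will use has been pre-trained on more than a million images with 1000 different classifications. The network takes input images of size 224x224. If you are interested in exploring the structure of the ResNet-50 model further, we recommend downloading Netron, a freely available ML model viewer.
This tutorial uses the model in ONNX format.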
!wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx
--2022-04-26 13:07:52-- https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/onnx/models/main/vision/classification/resnet/model/resnet50-v2-7.onnx [following]
--2022-04-26 13:07:53-- https://media.githubusercontent.com/media/onnx/models/main/vision/classification/resnet/model/resnet50-v2-7.onnx
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102442450 (98M) [application/octet-stream]
Saving to: ‘resnet50-v2-7.onnx’
resnet50-v2-7.onnx 100%[===================>] 97.70M 4.51MB/s in 25s
2022-04-26 13:08:27 (3.89 MB/s) - ‘resnet50-v2-7.onnx’ saved [102442450/102442450]
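If wget is not available, the same download can be sketched in Python. The example below is a minimal sketch using only the standard library; the URL and the expected size of 102442450 bytes are copied from the command and log above, the URL may have moved since that log was recorded, and the helper names has_expected_size and fetch_model are hypothetical.

```python
import os
import urllib.request

# URL copied from the wget command above; it may no longer resolve.
MODEL_URL = (
    "https://github.com/onnx/models/raw/main/vision/classification/"
    "resnet/model/resnet50-v2-7.onnx"
)
# "Length" reported in the wget log above.
EXPECTED_BYTES = 102442450


def has_expected_size(path, expected_bytes=EXPECTED_BYTES):
    """Return True if path is a file of exactly expected_bytes bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes


def fetch_model(url=MODEL_URL, dest="resnet50-v2-7.onnx",
                expected_bytes=EXPECTED_BYTES):
    """Download the model unless a complete copy already exists at dest."""
    if has_expected_size(dest, expected_bytes):
        return dest
    # urlretrieve follows the HTTP redirect seen in the log above.
    urllib.request.urlretrieve(url, dest)
    if not has_expected_size(dest, expected_bytes):
        raise OSError(f"{dest} does not have the expected size")
    return dest
```

Calling fetch_model() with no arguments mirrors the wget command. The size check only guards against truncated downloads and is not a substitute for a checksum.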
To make the model available to other tutorials, move it to a shared directory:
!mv resnet50-v2-7.onnx ../../_models/resnet50-v2-7.onnx
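The mv command above fails if the target directory does not exist. A portable alternative can be sketched with the Python standard library as follows; the helper name move_to_shared_dir is hypothetical.

```python
import shutil
from pathlib import Path


def move_to_shared_dir(model_path, shared_dir):
    """Move model_path into shared_dir, creating the directory if needed."""
    source = Path(model_path)
    target_dir = Path(shared_dir)
    # Create the shared directory (and any parents) on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / source.name
    shutil.move(str(source), str(destination))
    return destination
```

For the layout used in this tutorial, the call would be move_to_shared_dir("resnet50-v2-7.onnx", "../../_models").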
Supported model formats
TVMC supports models created with Keras, ONNX, TensorFlow, TFLite and Torch. If you need to state explicitly which model format you are using, use the --model-format option.
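To show where --model-format fits on the command line, the sketch below assembles a tvmc compile invocation. The accepted format values, the --target and -o options, and the default output name module.tar are taken from the tvmc compile --help output in this section. The suffix-to-format table and the helper compile_argv are assumptions made for illustration and do not describe how TVMC itself detects formats.

```python
from pathlib import Path

# Values accepted by --model-format according to `tvmc compile --help`.
MODEL_FORMATS = ("keras", "onnx", "pb", "tflite", "pytorch", "paddle")

# Assumed suffix-to-format mapping, for illustration only.
SUFFIX_TO_FORMAT = {
    ".h5": "keras",
    ".onnx": "onnx",
    ".pb": "pb",
    ".tflite": "tflite",
    ".pth": "pytorch",
    ".pdmodel": "paddle",
}


def compile_argv(model_path, model_format=None, target="llvm",
                 output="module.tar"):
    """Build the argv list for a `tvmc compile` call."""
    if model_format is None:
        # Guess from the file suffix; omit the option if the suffix is unknown.
        model_format = SUFFIX_TO_FORMAT.get(Path(model_path).suffix.lower())
    elif model_format not in MODEL_FORMATS:
        raise ValueError(f"unsupported model format: {model_format!r}")
    argv = ["tvmc", "compile", "--target", target, "-o", output]
    if model_format is not None:
        argv += ["--model-format", model_format]
    argv.append(str(model_path))
    return argv


print(" ".join(compile_argv("resnet50-v2-7.onnx")))
```

Stating the format explicitly is mainly useful when a file has a non-standard suffix.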
For more information, see:
!python -m tvm.driver.tvmc compile --help
usage: tvmc compile [-h] [--cross-compiler CROSS_COMPILER]
[--cross-compiler-options CROSS_COMPILER_OPTIONS]
[--desired-layout {NCHW,NHWC}] [--dump-code FORMAT]
[--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
[-o OUTPUT] [-f {so,mlf}] [--pass-config name=value]
[--target TARGET]
[--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
[--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
[--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
[--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
[--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
[--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
[--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
[--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
[--target-ext_dev-model TARGET_EXT_DEV_MODEL]
[--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
[--target-ext_dev-tag TARGET_EXT_DEV_TAG]
[--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
[--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
[--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
[--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
[--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
[--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
[--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
[--target-llvm-mattr TARGET_LLVM_MATTR]
[--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
[--target-llvm-libs TARGET_LLVM_LIBS]
[--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
[--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
[--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
[--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
[--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
[--target-llvm-tag TARGET_LLVM_TAG]
[--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
[--target-llvm-model TARGET_LLVM_MODEL]
[--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
[--target-llvm-mcpu TARGET_LLVM_MCPU]
[--target-llvm-device TARGET_LLVM_DEVICE]
[--target-llvm-runtime TARGET_LLVM_RUNTIME]
[--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
[--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
[--target-llvm-mabi TARGET_LLVM_MABI]
[--target-llvm-keys TARGET_LLVM_KEYS]
[--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
[--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
[--target-hybrid-libs TARGET_HYBRID_LIBS]
[--target-hybrid-model TARGET_HYBRID_MODEL]
[--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
[--target-hybrid-tag TARGET_HYBRID_TAG]
[--target-hybrid-device TARGET_HYBRID_DEVICE]
[--target-hybrid-keys TARGET_HYBRID_KEYS]
[--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
[--target-aocl-libs TARGET_AOCL_LIBS]
[--target-aocl-model TARGET_AOCL_MODEL]
[--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
[--target-aocl-tag TARGET_AOCL_TAG]
[--target-aocl-device TARGET_AOCL_DEVICE]
[--target-aocl-keys TARGET_AOCL_KEYS]
[--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
[--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
[--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
[--target-nvptx-libs TARGET_NVPTX_LIBS]
[--target-nvptx-model TARGET_NVPTX_MODEL]
[--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
[--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
[--target-nvptx-tag TARGET_NVPTX_TAG]
[--target-nvptx-mcpu TARGET_NVPTX_MCPU]
[--target-nvptx-device TARGET_NVPTX_DEVICE]
[--target-nvptx-keys TARGET_NVPTX_KEYS]
[--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
[--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
[--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
[--target-opencl-libs TARGET_OPENCL_LIBS]
[--target-opencl-model TARGET_OPENCL_MODEL]
[--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
[--target-opencl-tag TARGET_OPENCL_TAG]
[--target-opencl-device TARGET_OPENCL_DEVICE]
[--target-opencl-keys TARGET_OPENCL_KEYS]
[--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
[--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
[--target-metal-from_device TARGET_METAL_FROM_DEVICE]
[--target-metal-libs TARGET_METAL_LIBS]
[--target-metal-keys TARGET_METAL_KEYS]
[--target-metal-model TARGET_METAL_MODEL]
[--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
[--target-metal-tag TARGET_METAL_TAG]
[--target-metal-device TARGET_METAL_DEVICE]
[--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
[--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
[--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
[--target-webgpu-libs TARGET_WEBGPU_LIBS]
[--target-webgpu-model TARGET_WEBGPU_MODEL]
[--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
[--target-webgpu-tag TARGET_WEBGPU_TAG]
[--target-webgpu-device TARGET_WEBGPU_DEVICE]
[--target-webgpu-keys TARGET_WEBGPU_KEYS]
[--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
[--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
[--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
[--target-rocm-libs TARGET_ROCM_LIBS]
[--target-rocm-mattr TARGET_ROCM_MATTR]
[--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-rocm-model TARGET_ROCM_MODEL]
[--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
[--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
[--target-rocm-tag TARGET_ROCM_TAG]
[--target-rocm-device TARGET_ROCM_DEVICE]
[--target-rocm-mcpu TARGET_ROCM_MCPU]
[--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
[--target-rocm-keys TARGET_ROCM_KEYS]
[--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
[--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
[--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
[--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
[--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
[--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
[--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
[--target-vulkan-libs TARGET_VULKAN_LIBS]
[--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
[--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
[--target-vulkan-mattr TARGET_VULKAN_MATTR]
[--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
[--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
[--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
[--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
[--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
[--target-vulkan-model TARGET_VULKAN_MODEL]
[--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
[--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
[--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
[--target-vulkan-tag TARGET_VULKAN_TAG]
[--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
[--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
[--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
[--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
[--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
[--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
[--target-vulkan-device TARGET_VULKAN_DEVICE]
[--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
[--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
[--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
[--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
[--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
[--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
[--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
[--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
[--target-vulkan-keys TARGET_VULKAN_KEYS]
[--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
[--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
[--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
[--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
[--target-cuda-arch TARGET_CUDA_ARCH]
[--target-cuda-libs TARGET_CUDA_LIBS]
[--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-cuda-model TARGET_CUDA_MODEL]
[--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
[--target-cuda-tag TARGET_CUDA_TAG]
[--target-cuda-device TARGET_CUDA_DEVICE]
[--target-cuda-mcpu TARGET_CUDA_MCPU]
[--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
[--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
[--target-cuda-keys TARGET_CUDA_KEYS]
[--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
[--target-sdaccel-libs TARGET_SDACCEL_LIBS]
[--target-sdaccel-model TARGET_SDACCEL_MODEL]
[--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
[--target-sdaccel-tag TARGET_SDACCEL_TAG]
[--target-sdaccel-device TARGET_SDACCEL_DEVICE]
[--target-sdaccel-keys TARGET_SDACCEL_KEYS]
[--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
[--target-composite-libs TARGET_COMPOSITE_LIBS]
[--target-composite-devices TARGET_COMPOSITE_DEVICES]
[--target-composite-model TARGET_COMPOSITE_MODEL]
[--target-composite-tag TARGET_COMPOSITE_TAG]
[--target-composite-device TARGET_COMPOSITE_DEVICE]
[--target-composite-keys TARGET_COMPOSITE_KEYS]
[--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
[--target-stackvm-libs TARGET_STACKVM_LIBS]
[--target-stackvm-model TARGET_STACKVM_MODEL]
[--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
[--target-stackvm-tag TARGET_STACKVM_TAG]
[--target-stackvm-device TARGET_STACKVM_DEVICE]
[--target-stackvm-keys TARGET_STACKVM_KEYS]
[--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
[--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
[--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
[--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
[--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
[--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
[--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
[--target-c-unpacked-api TARGET_C_UNPACKED_API]
[--target-c-from_device TARGET_C_FROM_DEVICE]
[--target-c-libs TARGET_C_LIBS]
[--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
[--target-c-executor TARGET_C_EXECUTOR]
[--target-c-link-params TARGET_C_LINK_PARAMS]
[--target-c-model TARGET_C_MODEL]
[--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
[--target-c-system-lib TARGET_C_SYSTEM_LIB]
[--target-c-tag TARGET_C_TAG]
[--target-c-interface-api TARGET_C_INTERFACE_API]
[--target-c-mcpu TARGET_C_MCPU]
[--target-c-device TARGET_C_DEVICE]
[--target-c-runtime TARGET_C_RUNTIME]
[--target-c-keys TARGET_C_KEYS]
[--target-c-march TARGET_C_MARCH]
[--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
[--target-hexagon-libs TARGET_HEXAGON_LIBS]
[--target-hexagon-mattr TARGET_HEXAGON_MATTR]
[--target-hexagon-model TARGET_HEXAGON_MODEL]
[--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
[--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
[--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
[--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
[--target-hexagon-device TARGET_HEXAGON_DEVICE]
[--target-hexagon-tag TARGET_HEXAGON_TAG]
[--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
[--target-hexagon-keys TARGET_HEXAGON_KEYS]
[--tuning-records PATH] [--executor EXECUTOR]
[--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS]
[--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT]
[--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API]
[--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API]
[--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS]
[--runtime RUNTIME]
[--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB]
[--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB] [-v]
[-O [0-3]] [--input-shapes INPUT_SHAPES]
[--disabled-pass DISABLED_PASS]
[--module-name MODULE_NAME]
FILE
positional arguments:
FILE path to the input model file.
options:
-h, --help show this help message and exit
--cross-compiler CROSS_COMPILER
the cross compiler to generate target libraries, e.g.
'aarch64-linux-gnu-gcc'.
--cross-compiler-options CROSS_COMPILER_OPTIONS
the cross compiler options to generate target
libraries, e.g. '-mfpu=neon-vfpv4'.
--desired-layout {NCHW,NHWC}
change the data layout of the whole graph.
--dump-code FORMAT comma separated list of formats to export the input
model, e.g. 'asm,ll,relay'.
--model-format {keras,onnx,pb,tflite,pytorch,paddle}
specify input model format.
-o OUTPUT, --output OUTPUT
output the compiled module to a specified archive.
Defaults to 'module.tar'.
-f {so,mlf}, --output-format {so,mlf}
output format. Use 'so' for shared object or 'mlf' for
Model Library Format (only for microTVM targets).
Defaults to 'so'.
--pass-config name=value
configurations to be used at compile time. This option
can be provided multiple times, each one to set one
configuration value, e.g. '--pass-config
relay.backend.use_auto_scheduler=0', e.g. '--pass-
config
tir.add_lower_pass=opt_level1,pass1,opt_level2,pass2'.
--target TARGET compilation target as plain string, inline JSON or
path to a JSON file
--tuning-records PATH
path to an auto-tuning log file by AutoTVM. If not
presented, the fallback/tophub configs will be used.
--executor EXECUTOR Executor to compile the model with
--runtime RUNTIME Runtime to compile the model with
-v, --verbose increase verbosity.
-O [0-3], --opt-level [0-3]
specify which optimization level to use. Defaults to
'3'.
--input-shapes INPUT_SHAPES
specify non-generic shapes for model to run, format is
"input_name:[dim1,dim2,...,dimn]
input_name2:[dim1,dim2]".
--disabled-pass DISABLED_PASS
disable specific passes, comma-separated list of pass
names.
--module-name MODULE_NAME
The output module name. Defaults to 'default'.
target example_target_hook:
--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
target example_target_hook from_device
--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
target example_target_hook libs options
--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
target example_target_hook model string
--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
target example_target_hook tag string
--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
target example_target_hook device string
--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
target example_target_hook keys options
target ext_dev:
--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
target ext_dev from_device
--target-ext_dev-libs TARGET_EXT_DEV_LIBS
target ext_dev libs options
--target-ext_dev-model TARGET_EXT_DEV_MODEL
target ext_dev model string
--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
target ext_dev system-lib
--target-ext_dev-tag TARGET_EXT_DEV_TAG
target ext_dev tag string
--target-ext_dev-device TARGET_EXT_DEV_DEVICE
target ext_dev device string
--target-ext_dev-keys TARGET_EXT_DEV_KEYS
target ext_dev keys options
target llvm:
--target-llvm-fast-math TARGET_LLVM_FAST_MATH
target llvm fast-math
--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
target llvm opt-level
--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
target llvm unpacked-api
--target-llvm-from_device TARGET_LLVM_FROM_DEVICE
target llvm from_device
--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
target llvm fast-math-ninf
--target-llvm-mattr TARGET_LLVM_MATTR
target llvm mattr options
--target-llvm-num-cores TARGET_LLVM_NUM_CORES
target llvm num-cores
--target-llvm-libs TARGET_LLVM_LIBS
target llvm libs options
--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
target llvm fast-math-nsz
--target-llvm-link-params TARGET_LLVM_LINK_PARAMS
target llvm link-params
--target-llvm-interface-api TARGET_LLVM_INTERFACE_API
target llvm interface-api string
--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
target llvm fast-math-contract
--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
target llvm system-lib
--target-llvm-tag TARGET_LLVM_TAG
target llvm tag string
--target-llvm-mtriple TARGET_LLVM_MTRIPLE
target llvm mtriple string
--target-llvm-model TARGET_LLVM_MODEL
target llvm model string
--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
target llvm mfloat-abi string
--target-llvm-mcpu TARGET_LLVM_MCPU
target llvm mcpu string
--target-llvm-device TARGET_LLVM_DEVICE
target llvm device string
--target-llvm-runtime TARGET_LLVM_RUNTIME
target llvm runtime string
--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
target llvm fast-math-arcp
--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
target llvm fast-math-reassoc
--target-llvm-mabi TARGET_LLVM_MABI
target llvm mabi string
--target-llvm-keys TARGET_LLVM_KEYS
target llvm keys options
--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
target llvm fast-math-nnan
target hybrid:
--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
target hybrid from_device
--target-hybrid-libs TARGET_HYBRID_LIBS
target hybrid libs options
--target-hybrid-model TARGET_HYBRID_MODEL
target hybrid model string
--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
target hybrid system-lib
--target-hybrid-tag TARGET_HYBRID_TAG
target hybrid tag string
--target-hybrid-device TARGET_HYBRID_DEVICE
target hybrid device string
--target-hybrid-keys TARGET_HYBRID_KEYS
target hybrid keys options
target aocl:
--target-aocl-from_device TARGET_AOCL_FROM_DEVICE
target aocl from_device
--target-aocl-libs TARGET_AOCL_LIBS
target aocl libs options
--target-aocl-model TARGET_AOCL_MODEL
target aocl model string
--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
target aocl system-lib
--target-aocl-tag TARGET_AOCL_TAG
target aocl tag string
--target-aocl-device TARGET_AOCL_DEVICE
target aocl device string
--target-aocl-keys TARGET_AOCL_KEYS
target aocl keys options
target nvptx:
--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
target nvptx max_num_threads
--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
target nvptx thread_warp_size
--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
target nvptx from_device
--target-nvptx-libs TARGET_NVPTX_LIBS
target nvptx libs options
--target-nvptx-model TARGET_NVPTX_MODEL
target nvptx model string
--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
target nvptx system-lib
--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
target nvptx mtriple string
--target-nvptx-tag TARGET_NVPTX_TAG
target nvptx tag string
--target-nvptx-mcpu TARGET_NVPTX_MCPU
target nvptx mcpu string
--target-nvptx-device TARGET_NVPTX_DEVICE
target nvptx device string
--target-nvptx-keys TARGET_NVPTX_KEYS
target nvptx keys options
target opencl:
--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
target opencl max_num_threads
--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
target opencl thread_warp_size
--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
target opencl from_device
--target-opencl-libs TARGET_OPENCL_LIBS
target opencl libs options
--target-opencl-model TARGET_OPENCL_MODEL
target opencl model string
--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
target opencl system-lib
--target-opencl-tag TARGET_OPENCL_TAG
target opencl tag string
--target-opencl-device TARGET_OPENCL_DEVICE
target opencl device string
--target-opencl-keys TARGET_OPENCL_KEYS
target opencl keys options
target metal:
--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
target metal max_num_threads
--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
target metal thread_warp_size
--target-metal-from_device TARGET_METAL_FROM_DEVICE
target metal from_device
--target-metal-libs TARGET_METAL_LIBS
target metal libs options
--target-metal-keys TARGET_METAL_KEYS
target metal keys options
--target-metal-model TARGET_METAL_MODEL
target metal model string
--target-metal-system-lib TARGET_METAL_SYSTEM_LIB
target metal system-lib
--target-metal-tag TARGET_METAL_TAG
target metal tag string
--target-metal-device TARGET_METAL_DEVICE
target metal device string
--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
target metal max_function_args
target webgpu:
--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
target webgpu max_num_threads
--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
target webgpu from_device
--target-webgpu-libs TARGET_WEBGPU_LIBS
target webgpu libs options
--target-webgpu-model TARGET_WEBGPU_MODEL
target webgpu model string
--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
target webgpu system-lib
--target-webgpu-tag TARGET_WEBGPU_TAG
target webgpu tag string
--target-webgpu-device TARGET_WEBGPU_DEVICE
target webgpu device string
--target-webgpu-keys TARGET_WEBGPU_KEYS
target webgpu keys options
target rocm:
--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
target rocm max_num_threads
--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
target rocm thread_warp_size
--target-rocm-from_device TARGET_ROCM_FROM_DEVICE
target rocm from_device
--target-rocm-libs TARGET_ROCM_LIBS
target rocm libs options
--target-rocm-mattr TARGET_ROCM_MATTR
target rocm mattr options
--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
target rocm max_shared_memory_per_block
--target-rocm-model TARGET_ROCM_MODEL
target rocm model string
--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
target rocm system-lib
--target-rocm-mtriple TARGET_ROCM_MTRIPLE
target rocm mtriple string
--target-rocm-tag TARGET_ROCM_TAG
target rocm tag string
--target-rocm-device TARGET_ROCM_DEVICE
target rocm device string
--target-rocm-mcpu TARGET_ROCM_MCPU
target rocm mcpu string
--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
target rocm max_threads_per_block
--target-rocm-keys TARGET_ROCM_KEYS
target rocm keys options
target vulkan:
--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
target vulkan max_num_threads
--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
target vulkan thread_warp_size
--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
target vulkan from_device
--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
target vulkan max_per_stage_descriptor_storage_buffer
--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
target vulkan driver_version
--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
target vulkan supports_16bit_buffer
--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
target vulkan max_block_size_z
--target-vulkan-libs TARGET_VULKAN_LIBS
target vulkan libs options
--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
target vulkan supports_dedicated_allocation
--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
target vulkan supported_subgroup_operations
--target-vulkan-mattr TARGET_VULKAN_MATTR
target vulkan mattr options
--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
target vulkan max_storage_buffer_range
--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
target vulkan max_push_constants_size
--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
target vulkan supports_push_descriptor
--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
target vulkan supports_int64
--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
target vulkan supports_float32
--target-vulkan-model TARGET_VULKAN_MODEL
target vulkan model string
--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
target vulkan max_block_size_x
--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
target vulkan system-lib
--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
target vulkan max_block_size_y
--target-vulkan-tag TARGET_VULKAN_TAG
target vulkan tag string
--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
target vulkan supports_int8
--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
target vulkan max_spirv_version
--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
target vulkan vulkan_api_version
--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
target vulkan supports_8bit_buffer
--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
target vulkan device_type string
--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
target vulkan supports_int32
--target-vulkan-device TARGET_VULKAN_DEVICE
target vulkan device string
--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
target vulkan max_threads_per_block
--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
target vulkan max_uniform_buffer_range
--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
target vulkan driver_name string
--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
target vulkan supports_integer_dot_product
--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
target vulkan supports_storage_buffer_storage_class
--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
target vulkan supports_float16
--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
target vulkan device_name string
--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
target vulkan supports_float64
--target-vulkan-keys TARGET_VULKAN_KEYS
target vulkan keys options
--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
target vulkan max_shared_memory_per_block
--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
target vulkan supports_int16
target cuda:
--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
target cuda max_num_threads
--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
target cuda thread_warp_size
--target-cuda-from_device TARGET_CUDA_FROM_DEVICE
target cuda from_device
--target-cuda-arch TARGET_CUDA_ARCH
target cuda arch string
--target-cuda-libs TARGET_CUDA_LIBS
target cuda libs options
--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
target cuda max_shared_memory_per_block
--target-cuda-model TARGET_CUDA_MODEL
target cuda model string
--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
target cuda system-lib
--target-cuda-tag TARGET_CUDA_TAG
target cuda tag string
--target-cuda-device TARGET_CUDA_DEVICE
target cuda device string
--target-cuda-mcpu TARGET_CUDA_MCPU
target cuda mcpu string
--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
target cuda max_threads_per_block
--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
target cuda registers_per_block
--target-cuda-keys TARGET_CUDA_KEYS
target cuda keys options
target sdaccel:
--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
target sdaccel from_device
--target-sdaccel-libs TARGET_SDACCEL_LIBS
target sdaccel libs options
--target-sdaccel-model TARGET_SDACCEL_MODEL
target sdaccel model string
--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
target sdaccel system-lib
--target-sdaccel-tag TARGET_SDACCEL_TAG
target sdaccel tag string
--target-sdaccel-device TARGET_SDACCEL_DEVICE
target sdaccel device string
--target-sdaccel-keys TARGET_SDACCEL_KEYS
target sdaccel keys options
target composite:
--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
target composite from_device
--target-composite-libs TARGET_COMPOSITE_LIBS
target composite libs options
--target-composite-devices TARGET_COMPOSITE_DEVICES
target composite devices options
--target-composite-model TARGET_COMPOSITE_MODEL
target composite model string
--target-composite-tag TARGET_COMPOSITE_TAG
target composite tag string
--target-composite-device TARGET_COMPOSITE_DEVICE
target composite device string
--target-composite-keys TARGET_COMPOSITE_KEYS
target composite keys options
target stackvm:
--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
target stackvm from_device
--target-stackvm-libs TARGET_STACKVM_LIBS
target stackvm libs options
--target-stackvm-model TARGET_STACKVM_MODEL
target stackvm model string
--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
target stackvm system-lib
--target-stackvm-tag TARGET_STACKVM_TAG
target stackvm tag string
--target-stackvm-device TARGET_STACKVM_DEVICE
target stackvm device string
--target-stackvm-keys TARGET_STACKVM_KEYS
target stackvm keys options
target aocl_sw_emu:
--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
target aocl_sw_emu from_device
--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
target aocl_sw_emu libs options
--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
target aocl_sw_emu model string
--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
target aocl_sw_emu system-lib
--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
target aocl_sw_emu tag string
--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
target aocl_sw_emu device string
--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
target aocl_sw_emu keys options
target c:
--target-c-unpacked-api TARGET_C_UNPACKED_API
target c unpacked-api
--target-c-from_device TARGET_C_FROM_DEVICE
target c from_device
--target-c-libs TARGET_C_LIBS
target c libs options
--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
target c constants-byte-alignment
--target-c-executor TARGET_C_EXECUTOR
target c executor string
--target-c-link-params TARGET_C_LINK_PARAMS
target c link-params
--target-c-model TARGET_C_MODEL
target c model string
--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
target c workspace-byte-alignment
--target-c-system-lib TARGET_C_SYSTEM_LIB
target c system-lib
--target-c-tag TARGET_C_TAG
target c tag string
--target-c-interface-api TARGET_C_INTERFACE_API
target c interface-api string
--target-c-mcpu TARGET_C_MCPU
target c mcpu string
--target-c-device TARGET_C_DEVICE
target c device string
--target-c-runtime TARGET_C_RUNTIME
target c runtime string
--target-c-keys TARGET_C_KEYS
target c keys options
--target-c-march TARGET_C_MARCH
target c march string
target hexagon:
--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
target hexagon from_device
--target-hexagon-libs TARGET_HEXAGON_LIBS
target hexagon libs options
--target-hexagon-mattr TARGET_HEXAGON_MATTR
target hexagon mattr options
--target-hexagon-model TARGET_HEXAGON_MODEL
target hexagon model string
--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
target hexagon llvm-options options
--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
target hexagon mtriple string
--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
target hexagon system-lib
--target-hexagon-mcpu TARGET_HEXAGON_MCPU
target hexagon mcpu string
--target-hexagon-device TARGET_HEXAGON_DEVICE
target hexagon device string
--target-hexagon-tag TARGET_HEXAGON_TAG
target hexagon tag string
--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
target hexagon link-params
--target-hexagon-keys TARGET_HEXAGON_KEYS
target hexagon keys options
executor graph:
--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS
Executor graph link-params
executor aot:
--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT
Executor aot workspace-byte-alignment
--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API
Executor aot unpacked-api
--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API
Executor aot interface-api string
--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS
Executor aot link-params
runtime cpp:
--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB
Runtime cpp system-lib
runtime crt:
--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB
Runtime crt system-lib
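Adding ONNX support to TVM
TVM relies on the ONNX Python library being available on your system. You can install it with pip3 install --user onnx onnxoptimizer. If you have root access and want to install ONNX globally, drop the --user option. The onnxoptimizer dependency is optional and is only used for onnx>=1.9.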
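Compiling an ONNX model to the TVM runtime#
Once the ResNet-50 model is downloaded, the next step is to compile it with tvmc compile. The output of the compilation is a TAR package of the model, compiled into a dynamic library for the target platform. You can run that model on the target device using the TVM runtime.
# This may take several minutes, depending on your machine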
!python -m tvm.driver.tvmc compile --target "llvm" \
--output resnet50-v2-7-tvm.tar \
../../_models/resnet50-v2-7.onnx
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Look at the files that tvmc compile created in the module:
%%bash
mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
mod.so
mod.json
mod.params
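Three files are listed:
mod.so is the model, represented as a C++ library that the TVM runtime can load.
mod.json is a text representation of the TVM Relay computation graph.
mod.params is a file containing the parameters of the pre-trained model.
The module can be loaded directly by your application, and the model can be run via the TVM runtime APIs.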
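Before loading the package from an application, its contents can be checked with Python's standard `tarfile` module. The sketch below assumes the three file names listed above (`mod.so`, `mod.json`, `mod.params`); the helper name `missing_package_files` is hypothetical and is not part of TVM.

```python
import os
import tarfile

# File names taken from the listing above; other TVM versions or executors
# may package different files, so treat this set as an assumption.
EXPECTED_FILES = {"mod.so", "mod.json", "mod.params"}


def missing_package_files(tar_path):
    """Return the expected files that are absent from a TVMC package."""
    with tarfile.open(tar_path) as tar:
        # Strip a leading "./" so "./mod.so" and "mod.so" compare equal.
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    return sorted(EXPECTED_FILES - names)


if __name__ == "__main__":
    package = "resnet50-v2-7-tvm.tar"
    if os.path.exists(package):
        print("missing files:", missing_package_files(package))
```

An empty list means all three expected files are present in the archive.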
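Defining the correct target
Specifying the correct target (the --target option) can have a large impact on the performance of the compiled module, because it lets the compiler take advantage of hardware features available on the target.
For more information, see Auto-tuning a convolutional network for x86 CPU. We recommend identifying which CPU you are running, along with its optional features, and setting the target accordingly.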
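To make the effect of `--target` concrete, the sketch below assembles a `tvmc compile` command for a more specific CPU. The value `llvm -mcpu=skylake` is only an illustration that assumes an Intel Skylake-class machine, and the helper `compile_command` is hypothetical. The command is printed rather than executed.

```python
import shlex


def compile_command(model_path, output_path, mcpu=None):
    """Build the argument list for `tvmc compile` with an optional -mcpu."""
    # With a bare "llvm" target no CPU family is named; appending -mcpu tells
    # LLVM which CPU's instruction set extensions it may use.
    target = "llvm" if mcpu is None else f"llvm -mcpu={mcpu}"
    return [
        "python", "-m", "tvm.driver.tvmc", "compile",
        "--target", target,
        "--output", output_path,
        model_path,
    ]


# "skylake" is an assumed example value; substitute the CPU you actually run on.
cmd = compile_command("resnet50-v2-7.onnx", "resnet50-v2-7-tvm.tar", mcpu="skylake")
print(shlex.join(cmd))
```

Code compiled for a specific CPU family may not run on older processors that lack the instructions it uses, so the target should describe the machine that runs the model, not the one that compiles it.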
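Running the model from the compiled module with TVMC#
Now that the model is compiled into a module, we can use the TVM runtime to make predictions with it.
TVMC has the TVM runtime built in, so it can run compiled TVM models. To run the model and make predictions with TVMC, we need two things:
The compiled module, which we just produced.
Valid input to the model to make predictions on.
Each model is particular when it comes to expected tensor shapes, formats, and data types. For this reason, most models need some pre- and post-processing to ensure the input is valid and to interpret the output. TVMC uses NumPy's .npz format for both input and output data. This is a well-supported NumPy format for serializing multiple arrays into a single file.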
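As a minimal illustration of the `.npz` container, assuming only NumPy: `np.savez` stores each keyword argument as a named array, and `np.load` returns a dictionary-like object keyed by those names. The array names `data` and `extra` below are placeholders; a real input file must use the input names your model expects.

```python
import os
import tempfile

import numpy as np

# Two arrays stored under the (placeholder) names "data" and "extra".
data = np.zeros((1, 3, 224, 224), dtype="float32")
extra = np.arange(4, dtype="int64")

path = os.path.join(tempfile.mkdtemp(), "example_inputs.npz")
np.savez(path, data=data, extra=extra)

# np.load on an .npz file gives name-based access to each stored array.
with np.load(path) as archive:
    names = sorted(archive.files)
    loaded_shape = archive["data"].shape

print(names, loaded_shape)
```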
As input for this tutorial we will use an image of a cat, but feel free to substitute any image of your choosing.
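Input preprocessing#
For the ResNet-50 v2 model, the input is expected to be in ImageNet format. Below is an example script that preprocesses an image for ResNet-50 v2.
You will need a supported version of the Python Image Library installed. You can use pip3 install --user pillow to satisfy this requirement for the script.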
#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")
# ONNX expects NCHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))
# Normalize according to ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)
# Save to .npz (outputs imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
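The per-channel loop in the script can also be written with NumPy broadcasting. The sketch below is an equivalent formulation on random stand-in data (not the real image), shown only to make the normalization step explicit: each channel `c` becomes `(x[c] / 255 - mean[c]) / std[c]`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a resized RGB image already transposed to CHW layout.
img_data = rng.integers(0, 256, size=(3, 224, 224)).astype("float32")

imagenet_mean = np.array([0.485, 0.456, 0.406], dtype="float32")
imagenet_stddev = np.array([0.229, 0.224, 0.225], dtype="float32")

# Reshape the statistics to (3, 1, 1) so they broadcast over height and width.
norm_img_data = (
    img_data / 255 - imagenet_mean[:, None, None]
) / imagenet_stddev[:, None, None]

# Add the batch dimension expected by the model: (1, 3, 224, 224).
batched = np.expand_dims(norm_img_data, axis=0)
print(batched.shape, batched.dtype)
```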
Running the compiled module#
With both the model and the input data in hand, we can now run TVMC to make a prediction:
!python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--outputs predictions.npz \
resnet50-v2-7-tvm.tar
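As a reminder, the .tar model file includes a C++ library, a description of the Relay model, and the parameters for the model. TVMC includes the TVM runtime, which can load the model and make predictions against the input. When the command above runs, TVMC writes a new file, predictions.npz, that contains the model output tensors in NumPy format.
In this example, we run the model on the same machine that was used for compilation. In some cases you may want to run it remotely via an RPC Tracker. To read more about these options, check: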
!python -m tvm.driver.tvmc run --help
usage: tvmc run [-h] [--device {cpu,cuda,cl,metal,vulkan,rocm,micro}]
[--fill-mode {zeros,ones,random}] [-i INPUTS] [-o OUTPUTS]
[--print-time] [--print-top N] [--profile] [--end-to-end]
[--repeat N] [--number N] [--rpc-key RPC_KEY]
[--rpc-tracker RPC_TRACKER] [--list-options]
PATH
positional arguments:
PATH path to the compiled module file or to the project
directory if '--device micro' is selected.
optional arguments:
-h, --help show this help message and exit
--device {cpu,cuda,cl,metal,vulkan,rocm,micro}
target device to run the compiled module. Defaults to
'cpu'
--fill-mode {zeros,ones,random}
fill all input tensors with values. In case
--inputs/-i is provided, they will take precedence
over --fill-mode. Any remaining inputs will be filled
using the chosen fill mode. Defaults to 'random'
-i INPUTS, --inputs INPUTS
path to the .npz input file
-o OUTPUTS, --outputs OUTPUTS
path to the .npz output file
--print-time record and print the execution time(s). (non-micro
devices only)
--print-top N print the top n values and indices of the output
tensor
--profile generate profiling data from the runtime execution.
Using --profile requires the Graph Executor Debug
enabled on TVM. Profiling may also have an impact on
inference time, making it take longer to be generated.
(non-micro devices only)
--end-to-end Measure data transfers as well as model execution.
This can provide a more realistic performance
measurement in many cases.
--repeat N run the model n times. Defaults to '1'
--number N repeat the run n times. Defaults to '1'
--rpc-key RPC_KEY the RPC tracker key of the target device. (non-micro
devices only)
--rpc-tracker RPC_TRACKER
hostname (required) and port (optional, defaults to
9090) of the RPC tracker, e.g. '192.168.0.100:9999'.
(non-micro devices only)
--list-options show all run options and option choices when '--device
micro' is selected. (micro devices only)
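Output post-processing#
As mentioned earlier, each model has its own particular way of providing output tensors.
In our case, we need some post-processing to render the output of ResNet-50 v2 into a more human-readable form, using the lookup table provided for the model.
The script below shows an example of post-processing that extracts labels from the output of the compiled module.
Running this script should produce the output listed after the code: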
#!python ./postprocess.py
import os.path
import numpy as np
from scipy.special import softmax
from tvm.contrib.download import download_testdata
# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")
with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]
output_file = "predictions.npz"
# Open the output and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]
        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
Try replacing the cat image with other images and see what sort of predictions the ResNet model makes.
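For readers without SciPy, the softmax and top-k ranking used in the post-processing script can be sketched with NumPy alone. The scores below are made-up logits, not model output, and `top_k` is a hypothetical helper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


def top_k(scores, k=5):
    """Return the indices of the k largest scores, highest first."""
    return np.argsort(scores)[::-1][:k]


logits = np.array([1.0, 3.0, 0.5, 2.0], dtype="float32")  # made-up values
probs = softmax(logits)
for idx in top_k(probs, k=2):
    print("class index %d with probability %f" % (idx, probs[idx]))
```

Softmax turns raw scores into values that sum to one, which is why the script can report them as probabilities.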
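Automatically tuning the ResNet model#
The previous model was compiled to work on the TVM runtime, but did not include any platform-specific optimization. In this section, we show how to build an optimized model targeting your working platform with TVMC.
In some cases, we might not get the expected performance when running inference with the compiled module. In such cases we can use the auto-tuner to find a better configuration for the model and improve performance. Tuning in TVM refers to the process of optimizing a model to run faster on a given target. This differs from training or fine-tuning in that it does not affect the accuracy of the model, only its runtime performance. As part of the tuning process, TVM tries running many different variants of operator implementations to see which perform best. The results of these runs are stored in a tuning records file, which is ultimately the output of the tune subcommand.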
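The idea behind tuning can be illustrated with a toy sketch that has nothing to do with TVM's internals: time several hand-written variants of the same computation, keep the fastest, and serialize the measurements as a record. The variant names, the `tune` helper, and the JSON layout are all hypothetical and do not match TVM's tuning-record format.

```python
import json
import timeit


def dot_loop(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_generator(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_map(a, b):
    return sum(map(float.__mul__, a, b))


def tune(variants, args, number=20):
    """Time each variant and return (name of the fastest, all records)."""
    records = []
    for name, fn in variants.items():
        seconds = timeit.timeit(lambda: fn(*args), number=number) / number
        records.append({"variant": name, "mean_seconds": seconds})
    best = min(records, key=lambda r: r["mean_seconds"])
    return best["variant"], records


a = [float(i) for i in range(1000)]
b = [float(i % 7) for i in range(1000)]
variants = {"loop": dot_loop, "generator": dot_generator, "map": dot_map}

best_name, records = tune(variants, (a, b))
records_json = json.dumps(records, indent=2)  # toy stand-in for a records file
print("fastest variant:", best_name)
```

All three variants return the same value; only their running time differs, which mirrors the point that tuning changes performance and not accuracy.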
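In its simplest form, tuning requires you to provide three things:
The target specification of the device you intend to run this model on.
The path to an output file in which the tuning records will be stored.
The path to the model to be tuned.
The default search algorithm requires xgboost; see below for details on tuning search algorithms:
pip install xgboost cloudpickle
The example below demonstrates how that works in practice: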
!python -m tvm.driver.tvmc tune --target "llvm" \
--output resnet50-v2-7-autotuner_records.json \
../../_models/resnet50-v2-7.onnx
/media/pc/data/4tb/lxw/anaconda3/envs/mx39/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
[Task 1/25] Current/Best: 139.87/ 252.51 GFLOPS | Progress: (40/40) | 20.88 s Done.
[Task 2/25] Current/Best: 42.44/ 183.76 GFLOPS | Progress: (40/40) | 11.12 s Done.
[Task 3/25] Current/Best: 176.21/ 215.65 GFLOPS | Progress: (40/40) | 11.55 s Done.
[Task 4/25] Current/Best: 113.94/ 160.83 GFLOPS | Progress: (40/40) | 13.36 s Done.
[Task 5/25] Current/Best: 120.38/ 164.05 GFLOPS | Progress: (40/40) | 12.15 s Done.
[Task 6/25] Current/Best: 103.44/ 188.69 GFLOPS | Progress: (40/40) | 12.60 s Done.
[Task 7/25] Current/Best: 137.09/ 204.00 GFLOPS | Progress: (40/40) | 11.36 s Done.
[Task 8/25] Current/Best: 99.24/ 195.34 GFLOPS | Progress: (40/40) | 18.87 s Done.
[Task 9/25] Current/Best: 70.21/ 189.30 GFLOPS | Progress: (40/40) | 19.84 s Done.
[Task 10/25] Current/Best: 139.57/ 150.27 GFLOPS | Progress: (40/40) | 11.81 s Done.
[Task 11/25] Current/Best: 136.51/ 192.55 GFLOPS | Progress: (40/40) | 11.38 s Done.
[Task 12/25] Current/Best: 127.62/ 216.62 GFLOPS | Progress: (40/40) | 15.05 s Done.
[Task 13/25] Current/Best: 76.30/ 237.37 GFLOPS | Progress: (40/40) | 12.29 s Done.
[Task 14/25] Current/Best: 67.69/ 197.50 GFLOPS | Progress: (40/40) | 17.04 s Done.
[Task 16/25] Current/Best: 57.91/ 200.78 GFLOPS | Progress: (40/40) | 12.76 s Done.
[Task 17/25] Current/Best: 172.88/ 267.60 GFLOPS | Progress: (40/40) | 12.21 s Done.
[Task 18/25] Current/Best: 164.30/ 195.15 GFLOPS | Progress: (40/40) | 18.82 s Done.
[Task 19/25] Current/Best: 122.30/ 209.99 GFLOPS | Progress: (40/40) | 14.50 s Done.
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/40) | 0.00 s s Done.
Done.
Done.
[Task 22/25] Current/Best: 69.31/ 177.25 GFLOPS | Progress: (40/40) | 12.39 s Done.
[Task 23/25] Current/Best: 92.92/ 185.29 GFLOPS | Progress: (40/40) | 13.99 s Done.
[Task 25/25] Current/Best: 18.40/ 84.62 GFLOPS | Progress: (40/40) | 20.26 s Done.
Done.
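The progress lines above follow a regular pattern, so they can be summarized with a small script. The sketch below assumes the exact line format shown in this log (it may differ in other TVM versions), and `best_gflops_per_task` is a hypothetical helper. The sample lines are copied from the output above.

```python
import re

# Matches e.g. "[Task 1/25] Current/Best: 139.87/ 252.51 GFLOPS".
TASK_LINE = re.compile(
    r"\[Task\s+(\d+)/\s*(\d+)\]\s+Current/Best:\s*([\d.]+)/\s*([\d.]+) GFLOPS"
)


def best_gflops_per_task(log_text):
    """Map each task number to the highest 'Best' GFLOPS reported for it."""
    best = {}
    for match in TASK_LINE.finditer(log_text):
        task = int(match.group(1))
        gflops = float(match.group(4))
        # A task can appear more than once; keep its highest reported value.
        best[task] = max(gflops, best.get(task, 0.0))
    return best


sample_log = """\
[Task 1/25] Current/Best: 139.87/ 252.51 GFLOPS | Progress: (40/40) | 20.88 s Done.
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/40) | 0.00 s s Done.
[Task 22/25] Current/Best: 69.31/ 177.25 GFLOPS | Progress: (40/40) | 12.39 s Done.
[Task 25/25] Current/Best: 18.40/ 84.62 GFLOPS | Progress: (40/40) | 20.26 s Done.
"""
summary = best_gflops_per_task(sample_log)
slowest = min(summary, key=summary.get)
print(summary, "lowest-throughput task:", slowest)
```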
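In this example, you will get better results if you give the --target flag a more specific target.
TVMC searches the parameter space of the model, trying different configurations for the operators and choosing the one that runs fastest on your platform. Although this is a guided search based on the CPU and model operations, it can still take several hours to complete. The output of this search is saved to resnet50-v2-7-autotuner_records.json, which is later used to compile an optimized model.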
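Defining the tuning search algorithm
By default, this search is guided by an XGBoost Grid algorithm. Depending on the complexity of your model and the amount of time available, you might want to choose a different algorithm. A full list is available by consulting: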
!python -m tvm.driver.tvmc tune --help
usage: tvmc tune [-h] [--early-stopping EARLY_STOPPING]
[--min-repeat-ms MIN_REPEAT_MS]
[--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
[--number NUMBER] -o OUTPUT [--parallel PARALLEL]
[--repeat REPEAT] [--rpc-key RPC_KEY]
[--rpc-tracker RPC_TRACKER] [--target TARGET]
[--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
[--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
[--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
[--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
[--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
[--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
[--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
[--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
[--target-ext_dev-model TARGET_EXT_DEV_MODEL]
[--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
[--target-ext_dev-tag TARGET_EXT_DEV_TAG]
[--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
[--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
[--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
[--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
[--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
[--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
[--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
[--target-llvm-mattr TARGET_LLVM_MATTR]
[--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
[--target-llvm-libs TARGET_LLVM_LIBS]
[--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
[--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
[--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
[--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
[--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
[--target-llvm-tag TARGET_LLVM_TAG]
[--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
[--target-llvm-model TARGET_LLVM_MODEL]
[--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
[--target-llvm-mcpu TARGET_LLVM_MCPU]
[--target-llvm-device TARGET_LLVM_DEVICE]
[--target-llvm-runtime TARGET_LLVM_RUNTIME]
[--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
[--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
[--target-llvm-mabi TARGET_LLVM_MABI]
[--target-llvm-keys TARGET_LLVM_KEYS]
[--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
[--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
[--target-hybrid-libs TARGET_HYBRID_LIBS]
[--target-hybrid-model TARGET_HYBRID_MODEL]
[--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
[--target-hybrid-tag TARGET_HYBRID_TAG]
[--target-hybrid-device TARGET_HYBRID_DEVICE]
[--target-hybrid-keys TARGET_HYBRID_KEYS]
[--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
[--target-aocl-libs TARGET_AOCL_LIBS]
[--target-aocl-model TARGET_AOCL_MODEL]
[--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
[--target-aocl-tag TARGET_AOCL_TAG]
[--target-aocl-device TARGET_AOCL_DEVICE]
[--target-aocl-keys TARGET_AOCL_KEYS]
[--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
[--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
[--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
[--target-nvptx-libs TARGET_NVPTX_LIBS]
[--target-nvptx-model TARGET_NVPTX_MODEL]
[--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
[--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
[--target-nvptx-tag TARGET_NVPTX_TAG]
[--target-nvptx-mcpu TARGET_NVPTX_MCPU]
[--target-nvptx-device TARGET_NVPTX_DEVICE]
[--target-nvptx-keys TARGET_NVPTX_KEYS]
[--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
[--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
[--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
[--target-opencl-libs TARGET_OPENCL_LIBS]
[--target-opencl-model TARGET_OPENCL_MODEL]
[--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
[--target-opencl-tag TARGET_OPENCL_TAG]
[--target-opencl-device TARGET_OPENCL_DEVICE]
[--target-opencl-keys TARGET_OPENCL_KEYS]
[--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
[--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
[--target-metal-from_device TARGET_METAL_FROM_DEVICE]
[--target-metal-libs TARGET_METAL_LIBS]
[--target-metal-keys TARGET_METAL_KEYS]
[--target-metal-model TARGET_METAL_MODEL]
[--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
[--target-metal-tag TARGET_METAL_TAG]
[--target-metal-device TARGET_METAL_DEVICE]
[--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
[--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
[--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
[--target-webgpu-libs TARGET_WEBGPU_LIBS]
[--target-webgpu-model TARGET_WEBGPU_MODEL]
[--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
[--target-webgpu-tag TARGET_WEBGPU_TAG]
[--target-webgpu-device TARGET_WEBGPU_DEVICE]
[--target-webgpu-keys TARGET_WEBGPU_KEYS]
[--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
[--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
[--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
[--target-rocm-libs TARGET_ROCM_LIBS]
[--target-rocm-mattr TARGET_ROCM_MATTR]
[--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-rocm-model TARGET_ROCM_MODEL]
[--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
[--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
[--target-rocm-tag TARGET_ROCM_TAG]
[--target-rocm-device TARGET_ROCM_DEVICE]
[--target-rocm-mcpu TARGET_ROCM_MCPU]
[--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
[--target-rocm-keys TARGET_ROCM_KEYS]
[--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
[--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
[--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
[--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
[--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
[--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
[--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
[--target-vulkan-libs TARGET_VULKAN_LIBS]
[--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
[--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
[--target-vulkan-mattr TARGET_VULKAN_MATTR]
[--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
[--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
[--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
[--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
[--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
[--target-vulkan-model TARGET_VULKAN_MODEL]
[--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
[--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
[--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
[--target-vulkan-tag TARGET_VULKAN_TAG]
[--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
[--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
[--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
[--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
[--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
[--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
[--target-vulkan-device TARGET_VULKAN_DEVICE]
[--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
[--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
[--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
[--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
[--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
[--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
[--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
[--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
[--target-vulkan-keys TARGET_VULKAN_KEYS]
[--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
[--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
[--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
[--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
[--target-cuda-arch TARGET_CUDA_ARCH]
[--target-cuda-libs TARGET_CUDA_LIBS]
[--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-cuda-model TARGET_CUDA_MODEL]
[--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
[--target-cuda-tag TARGET_CUDA_TAG]
[--target-cuda-device TARGET_CUDA_DEVICE]
[--target-cuda-mcpu TARGET_CUDA_MCPU]
[--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
[--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
[--target-cuda-keys TARGET_CUDA_KEYS]
[--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
[--target-sdaccel-libs TARGET_SDACCEL_LIBS]
[--target-sdaccel-model TARGET_SDACCEL_MODEL]
[--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
[--target-sdaccel-tag TARGET_SDACCEL_TAG]
[--target-sdaccel-device TARGET_SDACCEL_DEVICE]
[--target-sdaccel-keys TARGET_SDACCEL_KEYS]
[--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
[--target-composite-libs TARGET_COMPOSITE_LIBS]
[--target-composite-devices TARGET_COMPOSITE_DEVICES]
[--target-composite-model TARGET_COMPOSITE_MODEL]
[--target-composite-tag TARGET_COMPOSITE_TAG]
[--target-composite-device TARGET_COMPOSITE_DEVICE]
[--target-composite-keys TARGET_COMPOSITE_KEYS]
[--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
[--target-stackvm-libs TARGET_STACKVM_LIBS]
[--target-stackvm-model TARGET_STACKVM_MODEL]
[--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
[--target-stackvm-tag TARGET_STACKVM_TAG]
[--target-stackvm-device TARGET_STACKVM_DEVICE]
[--target-stackvm-keys TARGET_STACKVM_KEYS]
[--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
[--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
[--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
[--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
[--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
[--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
[--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
[--target-c-unpacked-api TARGET_C_UNPACKED_API]
[--target-c-from_device TARGET_C_FROM_DEVICE]
[--target-c-libs TARGET_C_LIBS]
[--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
[--target-c-executor TARGET_C_EXECUTOR]
[--target-c-link-params TARGET_C_LINK_PARAMS]
[--target-c-model TARGET_C_MODEL]
[--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
[--target-c-system-lib TARGET_C_SYSTEM_LIB]
[--target-c-tag TARGET_C_TAG]
[--target-c-interface-api TARGET_C_INTERFACE_API]
[--target-c-mcpu TARGET_C_MCPU]
[--target-c-device TARGET_C_DEVICE]
[--target-c-runtime TARGET_C_RUNTIME]
[--target-c-keys TARGET_C_KEYS]
[--target-c-march TARGET_C_MARCH]
[--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
[--target-hexagon-libs TARGET_HEXAGON_LIBS]
[--target-hexagon-mattr TARGET_HEXAGON_MATTR]
[--target-hexagon-model TARGET_HEXAGON_MODEL]
[--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
[--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
[--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
[--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
[--target-hexagon-device TARGET_HEXAGON_DEVICE]
[--target-hexagon-tag TARGET_HEXAGON_TAG]
[--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
[--target-hexagon-keys TARGET_HEXAGON_KEYS]
[--target-host TARGET_HOST] [--timeout TIMEOUT]
[--trials TRIALS] [--tuning-records PATH]
[--desired-layout {NCHW,NHWC}] [--enable-autoscheduler]
[--cache-line-bytes CACHE_LINE_BYTES] [--num-cores NUM_CORES]
[--vector-unit-bytes VECTOR_UNIT_BYTES]
[--max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK]
[--max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK]
[--max-threads-per-block MAX_THREADS_PER_BLOCK]
[--max-vthread-extent MAX_VTHREAD_EXTENT]
[--warp-size WARP_SIZE] [--include-simple-tasks]
[--log-estimated-latency]
[--tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}]
[--input-shapes INPUT_SHAPES]
FILE
positional arguments:
FILE path to the input model file
optional arguments:
-h, --help show this help message and exit
--early-stopping EARLY_STOPPING
minimum number of trials before early stopping
--min-repeat-ms MIN_REPEAT_MS
minimum time to run each trial, in milliseconds.
Defaults to 0 on x86 and 1000 on all other targets
--model-format {keras,onnx,pb,tflite,pytorch,paddle}
specify input model format
--number NUMBER number of runs a single repeat is made of. The final
number of tuning executions is: (1 + number * repeat)
-o OUTPUT, --output OUTPUT
output file to store the tuning records for the tuning
process
--parallel PARALLEL the maximum number of parallel devices to use when
tuning
--repeat REPEAT how many times to repeat each measurement
--rpc-key RPC_KEY the RPC tracker key of the target device. Required
when --rpc-tracker is provided.
--rpc-tracker RPC_TRACKER
hostname (required) and port (optional, defaults to
9090) of the RPC tracker, e.g. '192.168.0.100:9999'
--target TARGET compilation target as plain string, inline JSON or
path to a JSON file
--target-host TARGET_HOST
the host compilation target, defaults to None
--timeout TIMEOUT compilation timeout, in seconds
--trials TRIALS the maximum number of tuning trials to perform
--tuning-records PATH
path to an auto-tuning log file by AutoTVM.
--desired-layout {NCHW,NHWC}
change the data layout of the whole graph
--enable-autoscheduler
enable tuning the graph through the AutoScheduler
tuner
--input-shapes INPUT_SHAPES
specify non-generic shapes for model to run, format is
"input_name:[dim1,dim2,...,dimn]
input_name2:[dim1,dim2]"
target example_target_hook:
--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
target example_target_hook from_device
--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
target example_target_hook libs options
--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
target example_target_hook model string
--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
target example_target_hook tag string
--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
target example_target_hook device string
--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
target example_target_hook keys options
target ext_dev:
--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
target ext_dev from_device
--target-ext_dev-libs TARGET_EXT_DEV_LIBS
target ext_dev libs options
--target-ext_dev-model TARGET_EXT_DEV_MODEL
target ext_dev model string
--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
target ext_dev system-lib
--target-ext_dev-tag TARGET_EXT_DEV_TAG
target ext_dev tag string
--target-ext_dev-device TARGET_EXT_DEV_DEVICE
target ext_dev device string
--target-ext_dev-keys TARGET_EXT_DEV_KEYS
target ext_dev keys options
target llvm:
--target-llvm-fast-math TARGET_LLVM_FAST_MATH
target llvm fast-math
--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
target llvm opt-level
--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
target llvm unpacked-api
--target-llvm-from_device TARGET_LLVM_FROM_DEVICE
target llvm from_device
--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
target llvm fast-math-ninf
--target-llvm-mattr TARGET_LLVM_MATTR
target llvm mattr options
--target-llvm-num-cores TARGET_LLVM_NUM_CORES
target llvm num-cores
--target-llvm-libs TARGET_LLVM_LIBS
target llvm libs options
--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
target llvm fast-math-nsz
--target-llvm-link-params TARGET_LLVM_LINK_PARAMS
target llvm link-params
--target-llvm-interface-api TARGET_LLVM_INTERFACE_API
target llvm interface-api string
--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
target llvm fast-math-contract
--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
target llvm system-lib
--target-llvm-tag TARGET_LLVM_TAG
target llvm tag string
--target-llvm-mtriple TARGET_LLVM_MTRIPLE
target llvm mtriple string
--target-llvm-model TARGET_LLVM_MODEL
target llvm model string
--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
target llvm mfloat-abi string
--target-llvm-mcpu TARGET_LLVM_MCPU
target llvm mcpu string
--target-llvm-device TARGET_LLVM_DEVICE
target llvm device string
--target-llvm-runtime TARGET_LLVM_RUNTIME
target llvm runtime string
--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
target llvm fast-math-arcp
--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
target llvm fast-math-reassoc
--target-llvm-mabi TARGET_LLVM_MABI
target llvm mabi string
--target-llvm-keys TARGET_LLVM_KEYS
target llvm keys options
--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
target llvm fast-math-nnan
target hybrid:
--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
target hybrid from_device
--target-hybrid-libs TARGET_HYBRID_LIBS
target hybrid libs options
--target-hybrid-model TARGET_HYBRID_MODEL
target hybrid model string
--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
target hybrid system-lib
--target-hybrid-tag TARGET_HYBRID_TAG
target hybrid tag string
--target-hybrid-device TARGET_HYBRID_DEVICE
target hybrid device string
--target-hybrid-keys TARGET_HYBRID_KEYS
target hybrid keys options
target aocl:
--target-aocl-from_device TARGET_AOCL_FROM_DEVICE
target aocl from_device
--target-aocl-libs TARGET_AOCL_LIBS
target aocl libs options
--target-aocl-model TARGET_AOCL_MODEL
target aocl model string
--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
target aocl system-lib
--target-aocl-tag TARGET_AOCL_TAG
target aocl tag string
--target-aocl-device TARGET_AOCL_DEVICE
target aocl device string
--target-aocl-keys TARGET_AOCL_KEYS
target aocl keys options
target nvptx:
--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
target nvptx max_num_threads
--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
target nvptx thread_warp_size
--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
target nvptx from_device
--target-nvptx-libs TARGET_NVPTX_LIBS
target nvptx libs options
--target-nvptx-model TARGET_NVPTX_MODEL
target nvptx model string
--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
target nvptx system-lib
--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
target nvptx mtriple string
--target-nvptx-tag TARGET_NVPTX_TAG
target nvptx tag string
--target-nvptx-mcpu TARGET_NVPTX_MCPU
target nvptx mcpu string
--target-nvptx-device TARGET_NVPTX_DEVICE
target nvptx device string
--target-nvptx-keys TARGET_NVPTX_KEYS
target nvptx keys options
target opencl:
--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
target opencl max_num_threads
--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
target opencl thread_warp_size
--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
target opencl from_device
--target-opencl-libs TARGET_OPENCL_LIBS
target opencl libs options
--target-opencl-model TARGET_OPENCL_MODEL
target opencl model string
--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
target opencl system-lib
--target-opencl-tag TARGET_OPENCL_TAG
target opencl tag string
--target-opencl-device TARGET_OPENCL_DEVICE
target opencl device string
--target-opencl-keys TARGET_OPENCL_KEYS
target opencl keys options
target metal:
--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
target metal max_num_threads
--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
target metal thread_warp_size
--target-metal-from_device TARGET_METAL_FROM_DEVICE
target metal from_device
--target-metal-libs TARGET_METAL_LIBS
target metal libs options
--target-metal-keys TARGET_METAL_KEYS
target metal keys options
--target-metal-model TARGET_METAL_MODEL
target metal model string
--target-metal-system-lib TARGET_METAL_SYSTEM_LIB
target metal system-lib
--target-metal-tag TARGET_METAL_TAG
target metal tag string
--target-metal-device TARGET_METAL_DEVICE
target metal device string
--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
target metal max_function_args
target webgpu:
--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
target webgpu max_num_threads
--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
target webgpu from_device
--target-webgpu-libs TARGET_WEBGPU_LIBS
target webgpu libs options
--target-webgpu-model TARGET_WEBGPU_MODEL
target webgpu model string
--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
target webgpu system-lib
--target-webgpu-tag TARGET_WEBGPU_TAG
target webgpu tag string
--target-webgpu-device TARGET_WEBGPU_DEVICE
target webgpu device string
--target-webgpu-keys TARGET_WEBGPU_KEYS
target webgpu keys options
target rocm:
--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
target rocm max_num_threads
--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
target rocm thread_warp_size
--target-rocm-from_device TARGET_ROCM_FROM_DEVICE
target rocm from_device
--target-rocm-libs TARGET_ROCM_LIBS
target rocm libs options
--target-rocm-mattr TARGET_ROCM_MATTR
target rocm mattr options
--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
target rocm max_shared_memory_per_block
--target-rocm-model TARGET_ROCM_MODEL
target rocm model string
--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
target rocm system-lib
--target-rocm-mtriple TARGET_ROCM_MTRIPLE
target rocm mtriple string
--target-rocm-tag TARGET_ROCM_TAG
target rocm tag string
--target-rocm-device TARGET_ROCM_DEVICE
target rocm device string
--target-rocm-mcpu TARGET_ROCM_MCPU
target rocm mcpu string
--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
target rocm max_threads_per_block
--target-rocm-keys TARGET_ROCM_KEYS
target rocm keys options
target vulkan:
--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
target vulkan max_num_threads
--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
target vulkan thread_warp_size
--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
target vulkan from_device
--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
target vulkan max_per_stage_descriptor_storage_buffer
--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
target vulkan driver_version
--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
target vulkan supports_16bit_buffer
--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
target vulkan max_block_size_z
--target-vulkan-libs TARGET_VULKAN_LIBS
target vulkan libs options
--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
target vulkan supports_dedicated_allocation
--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
target vulkan supported_subgroup_operations
--target-vulkan-mattr TARGET_VULKAN_MATTR
target vulkan mattr options
--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
target vulkan max_storage_buffer_range
--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
target vulkan max_push_constants_size
--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
target vulkan supports_push_descriptor
--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
target vulkan supports_int64
--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
target vulkan supports_float32
--target-vulkan-model TARGET_VULKAN_MODEL
target vulkan model string
--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
target vulkan max_block_size_x
--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
target vulkan system-lib
--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
target vulkan max_block_size_y
--target-vulkan-tag TARGET_VULKAN_TAG
target vulkan tag string
--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
target vulkan supports_int8
--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
target vulkan max_spirv_version
--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
target vulkan vulkan_api_version
--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
target vulkan supports_8bit_buffer
--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
target vulkan device_type string
--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
target vulkan supports_int32
--target-vulkan-device TARGET_VULKAN_DEVICE
target vulkan device string
--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
target vulkan max_threads_per_block
--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
target vulkan max_uniform_buffer_range
--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
target vulkan driver_name string
--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
target vulkan supports_integer_dot_product
--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
target vulkan supports_storage_buffer_storage_class
--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
target vulkan supports_float16
--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
target vulkan device_name string
--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
target vulkan supports_float64
--target-vulkan-keys TARGET_VULKAN_KEYS
target vulkan keys options
--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
target vulkan max_shared_memory_per_block
--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
target vulkan supports_int16
target cuda:
--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
target cuda max_num_threads
--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
target cuda thread_warp_size
--target-cuda-from_device TARGET_CUDA_FROM_DEVICE
target cuda from_device
--target-cuda-arch TARGET_CUDA_ARCH
target cuda arch string
--target-cuda-libs TARGET_CUDA_LIBS
target cuda libs options
--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
target cuda max_shared_memory_per_block
--target-cuda-model TARGET_CUDA_MODEL
target cuda model string
--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
target cuda system-lib
--target-cuda-tag TARGET_CUDA_TAG
target cuda tag string
--target-cuda-device TARGET_CUDA_DEVICE
target cuda device string
--target-cuda-mcpu TARGET_CUDA_MCPU
target cuda mcpu string
--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
target cuda max_threads_per_block
--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
target cuda registers_per_block
--target-cuda-keys TARGET_CUDA_KEYS
target cuda keys options
target sdaccel:
--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
target sdaccel from_device
--target-sdaccel-libs TARGET_SDACCEL_LIBS
target sdaccel libs options
--target-sdaccel-model TARGET_SDACCEL_MODEL
target sdaccel model string
--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
target sdaccel system-lib
--target-sdaccel-tag TARGET_SDACCEL_TAG
target sdaccel tag string
--target-sdaccel-device TARGET_SDACCEL_DEVICE
target sdaccel device string
--target-sdaccel-keys TARGET_SDACCEL_KEYS
target sdaccel keys options
target composite:
--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
target composite from_device
--target-composite-libs TARGET_COMPOSITE_LIBS
target composite libs options
--target-composite-devices TARGET_COMPOSITE_DEVICES
target composite devices options
--target-composite-model TARGET_COMPOSITE_MODEL
target composite model string
--target-composite-tag TARGET_COMPOSITE_TAG
target composite tag string
--target-composite-device TARGET_COMPOSITE_DEVICE
target composite device string
--target-composite-keys TARGET_COMPOSITE_KEYS
target composite keys options
target stackvm:
--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
target stackvm from_device
--target-stackvm-libs TARGET_STACKVM_LIBS
target stackvm libs options
--target-stackvm-model TARGET_STACKVM_MODEL
target stackvm model string
--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
target stackvm system-lib
--target-stackvm-tag TARGET_STACKVM_TAG
target stackvm tag string
--target-stackvm-device TARGET_STACKVM_DEVICE
target stackvm device string
--target-stackvm-keys TARGET_STACKVM_KEYS
target stackvm keys options
target aocl_sw_emu:
--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
target aocl_sw_emu from_device
--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
target aocl_sw_emu libs options
--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
target aocl_sw_emu model string
--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
target aocl_sw_emu system-lib
--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
target aocl_sw_emu tag string
--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
target aocl_sw_emu device string
--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
target aocl_sw_emu keys options
target c:
--target-c-unpacked-api TARGET_C_UNPACKED_API
target c unpacked-api
--target-c-from_device TARGET_C_FROM_DEVICE
target c from_device
--target-c-libs TARGET_C_LIBS
target c libs options
--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
target c constants-byte-alignment
--target-c-executor TARGET_C_EXECUTOR
target c executor string
--target-c-link-params TARGET_C_LINK_PARAMS
target c link-params
--target-c-model TARGET_C_MODEL
target c model string
--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
target c workspace-byte-alignment
--target-c-system-lib TARGET_C_SYSTEM_LIB
target c system-lib
--target-c-tag TARGET_C_TAG
target c tag string
--target-c-interface-api TARGET_C_INTERFACE_API
target c interface-api string
--target-c-mcpu TARGET_C_MCPU
target c mcpu string
--target-c-device TARGET_C_DEVICE
target c device string
--target-c-runtime TARGET_C_RUNTIME
target c runtime string
--target-c-keys TARGET_C_KEYS
target c keys options
--target-c-march TARGET_C_MARCH
target c march string
target hexagon:
--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
target hexagon from_device
--target-hexagon-libs TARGET_HEXAGON_LIBS
target hexagon libs options
--target-hexagon-mattr TARGET_HEXAGON_MATTR
target hexagon mattr options
--target-hexagon-model TARGET_HEXAGON_MODEL
target hexagon model string
--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
target hexagon llvm-options options
--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
target hexagon mtriple string
--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
target hexagon system-lib
--target-hexagon-mcpu TARGET_HEXAGON_MCPU
target hexagon mcpu string
--target-hexagon-device TARGET_HEXAGON_DEVICE
target hexagon device string
--target-hexagon-tag TARGET_HEXAGON_TAG
target hexagon tag string
--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
target hexagon link-params
--target-hexagon-keys TARGET_HEXAGON_KEYS
target hexagon keys options
AutoScheduler options:
AutoScheduler options, used when --enable-autoscheduler is provided
--cache-line-bytes CACHE_LINE_BYTES
the size of cache line in bytes. If not specified, it
will be autoset for the current machine.
--num-cores NUM_CORES
the number of device cores. If not specified, it will
be autoset for the current machine.
--vector-unit-bytes VECTOR_UNIT_BYTES
the width of vector units in bytes. If not specified,
it will be autoset for the current machine.
--max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK
the max shared memory per block in bytes. If not
specified, it will be autoset for the current machine.
--max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK
the max local memory per block in bytes. If not
specified, it will be autoset for the current machine.
--max-threads-per-block MAX_THREADS_PER_BLOCK
the max number of threads per block. If not specified,
it will be autoset for the current machine.
--max-vthread-extent MAX_VTHREAD_EXTENT
the max vthread extent. If not specified, it will be
autoset for the current machine.
--warp-size WARP_SIZE
the thread numbers of a warp. If not specified, it
will be autoset for the current machine.
--include-simple-tasks
whether to extract simple tasks that do not include
complicated ops
--log-estimated-latency
whether to log the estimated latency to the file after
tuning a task
AutoTVM options:
AutoTVM options, used when the AutoScheduler is not enabled
--tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}
type of tuner to use when tuning with autotvm.
On a consumer-grade Skylake CPU, the output looks like this:
!python -m tvm.driver.tvmc tune \
--target "llvm -mcpu=broadwell" \
--output resnet50-v2-7-autotuner_records.json \
../../_models/resnet50-v2-7.onnx
/media/pc/data/4tb/lxw/anaconda3/envs/mx39/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
[Task 1/25] Current/Best: 135.54/ 444.49 GFLOPS | Progress: (40/40) | 16.09 s Done.
[Task 2/25] Current/Best: 91.39/ 426.70 GFLOPS | Progress: (40/40) | 10.33 s Done.
[Task 3/25] Current/Best: 147.25/ 516.21 GFLOPS | Progress: (40/40) | 11.55 s Done.
[Task 4/25] Current/Best: 561.81/ 561.81 GFLOPS | Progress: (40/40) | 12.99 s Done.
[Task 5/25] Current/Best: 182.70/ 570.25 GFLOPS | Progress: (40/40) | 11.12 s Done.
[Task 6/25] Current/Best: 79.82/ 459.29 GFLOPS | Progress: (40/40) | 12.03 s Done.
[Task 7/25] Current/Best: 152.79/ 300.64 GFLOPS | Progress: (40/40) | 11.16 s Done.
[Task 8/25] Current/Best: 155.29/ 310.77 GFLOPS | Progress: (40/40) | 14.68 s Done.
[Task 9/25] Current/Best: 126.56/ 561.24 GFLOPS | Progress: (40/40) | 13.93 s Done.
[Task 10/25] Current/Best: 41.68/ 517.18 GFLOPS | Progress: (40/40) | 10.91 s Done.
[Task 11/25] Current/Best: 311.13/ 528.67 GFLOPS | Progress: (40/40) | 10.89 s Done.
[Task 12/25] Current/Best: 265.13/ 525.74 GFLOPS | Progress: (40/40) | 11.19 s Done.
[Task 13/25] Current/Best: 107.09/ 426.10 GFLOPS | Progress: (40/40) | 11.29 s Done.
[Task 14/25] Current/Best: 119.32/ 373.60 GFLOPS | Progress: (40/40) | 12.38 s Done.
[Task 15/25] Current/Best: 101.58/ 439.72 GFLOPS | Progress: (40/40) | 14.41 s Done.
[Task 16/25] Current/Best: 177.78/ 427.98 GFLOPS | Progress: (40/40) | 10.23 s Done.
[Task 17/25] Current/Best: 72.04/ 349.15 GFLOPS | Progress: (40/40) | 11.50 s Done.
[Task 18/25] Current/Best: 124.41/ 500.93 GFLOPS | Progress: (40/40) | 12.07 s Done.
[Task 19/25] Current/Best: 243.37/ 371.27 GFLOPS | Progress: (40/40) | 12.88 s Done.
[Task 20/25] Current/Best: 137.63/ 343.57 GFLOPS | Progress: (40/40) | 21.29 s Done.
[Task 21/25] Current/Best: 59.02/ 330.98 GFLOPS | Progress: (40/40) | 12.88 s Done.
[Task 22/25] Current/Best: 273.71/ 457.41 GFLOPS | Progress: (40/40) | 11.04 s Done.
[Task 23/25] Current/Best: 166.89/ 430.39 GFLOPS | Progress: (40/40) | 13.46 s Done.
[Task 25/25] Current/Best: 28.01/ 59.42 GFLOPS | Progress: (40/40) | 20.24 s Done.
Done.
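Each progress line in the output above reports, for one tuning task, the throughput of the most recently measured configuration ("Current"), the best throughput found so far ("Best"), the number of trials completed, and the elapsed time. As a minimal sketch, assuming the log lines keep the layout shown above, the following Python snippet extracts those fields so they can be compared across tasks. The helper name parse_progress and the three sample lines (copied from the output above) are illustrative only and not part of TVMC.

```python
import re

# Three sample lines copied from the tuning output above.
LOG = """\
[Task 1/25] Current/Best: 135.54/ 444.49 GFLOPS | Progress: (40/40) | 16.09 s Done.
[Task 4/25] Current/Best: 561.81/ 561.81 GFLOPS | Progress: (40/40) | 12.99 s Done.
[Task 25/25] Current/Best: 28.01/ 59.42 GFLOPS | Progress: (40/40) | 20.24 s Done.
"""

# Assumed line layout: task index, current/best GFLOPS, trial counter, seconds.
LINE_RE = re.compile(
    r"\[Task\s+(\d+)/(\d+)\]\s+Current/Best:\s*([\d.]+)/\s*([\d.]+)\s+GFLOPS"
    r"\s*\|\s*Progress:\s*\((\d+)/(\d+)\)\s*\|\s*([\d.]+)\s*s"
)


def parse_progress(text):
    """Return one dict per progress line that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip warnings and other non-progress lines
        rows.append({
            "task": int(m.group(1)),
            "total_tasks": int(m.group(2)),
            "current_gflops": float(m.group(3)),
            "best_gflops": float(m.group(4)),
            "trials_done": int(m.group(5)),
            "trials_total": int(m.group(6)),
            "seconds": float(m.group(7)),
        })
    return rows


rows = parse_progress(LOG)
# The task with the lowest "Best" value is a candidate for more trials.
slowest = min(rows, key=lambda r: r["best_gflops"])
print(f"parsed {len(rows)} tasks; lowest best throughput: "
      f"task {slowest['task']} at {slowest['best_gflops']} GFLOPS")
```

Tasks whose best throughput stays far below the others, such as task 25 in this sample, are the ones where a longer tuning run is most likely to pay off.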
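Tuning sessions can take a long time, so tvmc tune offers many options to customize the tuning process, such as the number of repetitions (for example --repeat and --number) and the tuning algorithm to use.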
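To make these options concrete, here is a hedged sketch in Python that assembles a tvmc tune command line with --number, --repeat, and --tuner set explicitly. It only builds and prints the command and does not start a tuning run. The helper name build_tune_command, the model path, and the specific values (10, 1, xgb) are assumptions chosen for illustration, not recommended settings.

```python
import shlex


def build_tune_command(model_path, records_path, target="llvm",
                       number=None, repeat=None, tuner=None):
    """Assemble an argv list for `python -m tvm.driver.tvmc tune`."""
    cmd = ["python", "-m", "tvm.driver.tvmc", "tune",
           "--target", target,
           "--output", records_path]
    if number is not None:
        cmd += ["--number", str(number)]   # runs averaged per measurement
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]   # measurements per configuration
    if tuner is not None:
        cmd += ["--tuner", tuner]          # one of the choices in the help text
    cmd.append(model_path)                 # the model file is positional
    return cmd


# Illustrative values only; adjust the paths to your own setup.
cmd = build_tune_command(
    "resnet50-v2-7.onnx",
    "resnet50-v2-7-autotuner_records.json",
    number=10,
    repeat=1,
    tuner="xgb",
)
print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit, and the same list can be passed to subprocess.run once the settings look right.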
Compiling an Optimized Model with Tuning Data#
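The tuning process above produces tuning records, stored in resnet50-v2-7-autotuner_records.json. This file can be used in two ways:
As input to further tuning (via tvmc tune --tuning-records).
As input to the compiler.
When given these records, the compiler uses them to generate high-performance code for the model on the target you specify. To do this, use tvmc compile --tuning-records.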
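The following is a minimal sketch, in Python, of recompiling the model with the tuning records. It assembles the tvmc compile command, prints it, and runs it only when both input files are present. The helper name build_compile_command, the output archive name resnet50-v2-7-tvm_autotuned.tar, and the model path are assumptions for illustration. The target should match the one used for tuning.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_compile_command(model_path, output_path, target="llvm",
                          tuning_records=None):
    """Assemble an argv list for `python -m tvm.driver.tvmc compile`."""
    cmd = [sys.executable, "-m", "tvm.driver.tvmc", "compile",
           "--target", target,
           "--output", output_path]
    if tuning_records is not None:
        # Without this flag the compiler falls back to default schedules.
        cmd += ["--tuning-records", tuning_records]
    cmd.append(model_path)
    return cmd


# Assumed file names; adjust to where your model and records live.
model = "resnet50-v2-7.onnx"
records = "resnet50-v2-7-autotuner_records.json"
cmd = build_compile_command(model, "resnet50-v2-7-tvm_autotuned.tar",
                            tuning_records=records)
print(shlex.join(cmd))

if Path(model).exists() and Path(records).exists():
    subprocess.run(cmd, check=True)  # raises if the compile step fails
else:
    print("skipping compile: model or tuning records not found")
```

Writing the tuned build to a separate archive keeps the untuned module available, so the two can be run and compared afterwards.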
For more information, see the help output:
!python -m tvm.driver.tvmc compile --help
usage: tvmc compile [-h] [--cross-compiler CROSS_COMPILER]
[--cross-compiler-options CROSS_COMPILER_OPTIONS]
[--desired-layout {NCHW,NHWC}] [--dump-code FORMAT]
[--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
[-o OUTPUT] [-f {so,mlf}] [--pass-config name=value]
[--target TARGET]
[--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
[--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
[--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
[--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
[--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
[--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
[--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
[--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
[--target-ext_dev-model TARGET_EXT_DEV_MODEL]
[--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
[--target-ext_dev-tag TARGET_EXT_DEV_TAG]
[--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
[--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
[--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
[--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
[--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
[--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
[--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
[--target-llvm-mattr TARGET_LLVM_MATTR]
[--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
[--target-llvm-libs TARGET_LLVM_LIBS]
[--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
[--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
[--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
[--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
[--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
[--target-llvm-tag TARGET_LLVM_TAG]
[--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
[--target-llvm-model TARGET_LLVM_MODEL]
[--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
[--target-llvm-mcpu TARGET_LLVM_MCPU]
[--target-llvm-device TARGET_LLVM_DEVICE]
[--target-llvm-runtime TARGET_LLVM_RUNTIME]
[--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
[--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
[--target-llvm-mabi TARGET_LLVM_MABI]
[--target-llvm-keys TARGET_LLVM_KEYS]
[--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
[--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
[--target-hybrid-libs TARGET_HYBRID_LIBS]
[--target-hybrid-model TARGET_HYBRID_MODEL]
[--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
[--target-hybrid-tag TARGET_HYBRID_TAG]
[--target-hybrid-device TARGET_HYBRID_DEVICE]
[--target-hybrid-keys TARGET_HYBRID_KEYS]
[--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
[--target-aocl-libs TARGET_AOCL_LIBS]
[--target-aocl-model TARGET_AOCL_MODEL]
[--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
[--target-aocl-tag TARGET_AOCL_TAG]
[--target-aocl-device TARGET_AOCL_DEVICE]
[--target-aocl-keys TARGET_AOCL_KEYS]
[--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
[--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
[--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
[--target-nvptx-libs TARGET_NVPTX_LIBS]
[--target-nvptx-model TARGET_NVPTX_MODEL]
[--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
[--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
[--target-nvptx-tag TARGET_NVPTX_TAG]
[--target-nvptx-mcpu TARGET_NVPTX_MCPU]
[--target-nvptx-device TARGET_NVPTX_DEVICE]
[--target-nvptx-keys TARGET_NVPTX_KEYS]
[--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
[--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
[--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
[--target-opencl-libs TARGET_OPENCL_LIBS]
[--target-opencl-model TARGET_OPENCL_MODEL]
[--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
[--target-opencl-tag TARGET_OPENCL_TAG]
[--target-opencl-device TARGET_OPENCL_DEVICE]
[--target-opencl-keys TARGET_OPENCL_KEYS]
[--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
[--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
[--target-metal-from_device TARGET_METAL_FROM_DEVICE]
[--target-metal-libs TARGET_METAL_LIBS]
[--target-metal-keys TARGET_METAL_KEYS]
[--target-metal-model TARGET_METAL_MODEL]
[--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
[--target-metal-tag TARGET_METAL_TAG]
[--target-metal-device TARGET_METAL_DEVICE]
[--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
[--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
[--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
[--target-webgpu-libs TARGET_WEBGPU_LIBS]
[--target-webgpu-model TARGET_WEBGPU_MODEL]
[--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
[--target-webgpu-tag TARGET_WEBGPU_TAG]
[--target-webgpu-device TARGET_WEBGPU_DEVICE]
[--target-webgpu-keys TARGET_WEBGPU_KEYS]
[--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
[--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
[--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
[--target-rocm-libs TARGET_ROCM_LIBS]
[--target-rocm-mattr TARGET_ROCM_MATTR]
[--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-rocm-model TARGET_ROCM_MODEL]
[--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
[--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
[--target-rocm-tag TARGET_ROCM_TAG]
[--target-rocm-device TARGET_ROCM_DEVICE]
[--target-rocm-mcpu TARGET_ROCM_MCPU]
[--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
[--target-rocm-keys TARGET_ROCM_KEYS]
[--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
[--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
[--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
[--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
[--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
[--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
[--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
[--target-vulkan-libs TARGET_VULKAN_LIBS]
[--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
[--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
[--target-vulkan-mattr TARGET_VULKAN_MATTR]
[--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
[--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
[--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
[--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
[--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
[--target-vulkan-model TARGET_VULKAN_MODEL]
[--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
[--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
[--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
[--target-vulkan-tag TARGET_VULKAN_TAG]
[--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
[--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
[--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
[--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
[--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
[--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
[--target-vulkan-device TARGET_VULKAN_DEVICE]
[--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
[--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
[--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
[--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
[--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
[--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
[--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
[--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
[--target-vulkan-keys TARGET_VULKAN_KEYS]
[--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
[--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
[--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
[--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
[--target-cuda-arch TARGET_CUDA_ARCH]
[--target-cuda-libs TARGET_CUDA_LIBS]
[--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-cuda-model TARGET_CUDA_MODEL]
[--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
[--target-cuda-tag TARGET_CUDA_TAG]
[--target-cuda-device TARGET_CUDA_DEVICE]
[--target-cuda-mcpu TARGET_CUDA_MCPU]
[--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
[--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
[--target-cuda-keys TARGET_CUDA_KEYS]
[--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
[--target-sdaccel-libs TARGET_SDACCEL_LIBS]
[--target-sdaccel-model TARGET_SDACCEL_MODEL]
[--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
[--target-sdaccel-tag TARGET_SDACCEL_TAG]
[--target-sdaccel-device TARGET_SDACCEL_DEVICE]
[--target-sdaccel-keys TARGET_SDACCEL_KEYS]
[--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
[--target-composite-libs TARGET_COMPOSITE_LIBS]
[--target-composite-devices TARGET_COMPOSITE_DEVICES]
[--target-composite-model TARGET_COMPOSITE_MODEL]
[--target-composite-tag TARGET_COMPOSITE_TAG]
[--target-composite-device TARGET_COMPOSITE_DEVICE]
[--target-composite-keys TARGET_COMPOSITE_KEYS]
[--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
[--target-stackvm-libs TARGET_STACKVM_LIBS]
[--target-stackvm-model TARGET_STACKVM_MODEL]
[--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
[--target-stackvm-tag TARGET_STACKVM_TAG]
[--target-stackvm-device TARGET_STACKVM_DEVICE]
[--target-stackvm-keys TARGET_STACKVM_KEYS]
[--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
[--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
[--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
[--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
[--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
[--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
[--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
[--target-c-unpacked-api TARGET_C_UNPACKED_API]
[--target-c-from_device TARGET_C_FROM_DEVICE]
[--target-c-libs TARGET_C_LIBS]
[--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
[--target-c-executor TARGET_C_EXECUTOR]
[--target-c-link-params TARGET_C_LINK_PARAMS]
[--target-c-model TARGET_C_MODEL]
[--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
[--target-c-system-lib TARGET_C_SYSTEM_LIB]
[--target-c-tag TARGET_C_TAG]
[--target-c-interface-api TARGET_C_INTERFACE_API]
[--target-c-mcpu TARGET_C_MCPU]
[--target-c-device TARGET_C_DEVICE]
[--target-c-runtime TARGET_C_RUNTIME]
[--target-c-keys TARGET_C_KEYS]
[--target-c-march TARGET_C_MARCH]
[--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
[--target-hexagon-libs TARGET_HEXAGON_LIBS]
[--target-hexagon-mattr TARGET_HEXAGON_MATTR]
[--target-hexagon-model TARGET_HEXAGON_MODEL]
[--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
[--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
[--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
[--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
[--target-hexagon-device TARGET_HEXAGON_DEVICE]
[--target-hexagon-tag TARGET_HEXAGON_TAG]
[--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
[--target-hexagon-keys TARGET_HEXAGON_KEYS]
[--tuning-records PATH] [--executor EXECUTOR]
[--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS]
[--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT]
[--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API]
[--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API]
[--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS]
[--runtime RUNTIME]
[--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB]
[--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB] [-v]
[-O [0-3]] [--input-shapes INPUT_SHAPES]
[--disabled-pass DISABLED_PASS]
[--module-name MODULE_NAME]
FILE
positional arguments:
FILE path to the input model file.
optional arguments:
-h, --help show this help message and exit
--cross-compiler CROSS_COMPILER
the cross compiler to generate target libraries, e.g.
'aarch64-linux-gnu-gcc'.
--cross-compiler-options CROSS_COMPILER_OPTIONS
the cross compiler options to generate target
libraries, e.g. '-mfpu=neon-vfpv4'.
--desired-layout {NCHW,NHWC}
change the data layout of the whole graph.
--dump-code FORMAT comma separated list of formats to export the input
model, e.g. 'asm,ll,relay'.
--model-format {keras,onnx,pb,tflite,pytorch,paddle}
specify input model format.
-o OUTPUT, --output OUTPUT
output the compiled module to a specified archive.
Defaults to 'module.tar'.
-f {so,mlf}, --output-format {so,mlf}
output format. Use 'so' for shared object or 'mlf' for
Model Library Format (only for microTVM targets).
Defaults to 'so'.
--pass-config name=value
configurations to be used at compile time. This option
can be provided multiple times, each one to set one
configuration value, e.g. '--pass-config
relay.backend.use_auto_scheduler=0', e.g. '--pass-
config
tir.add_lower_pass=opt_level1,pass1,opt_level2,pass2'.
--target TARGET compilation target as plain string, inline JSON or
path to a JSON file
--tuning-records PATH
path to an auto-tuning log file by AutoTVM. If not
presented, the fallback/tophub configs will be used.
--executor EXECUTOR Executor to compile the model with
--runtime RUNTIME Runtime to compile the model with
-v, --verbose increase verbosity.
-O [0-3], --opt-level [0-3]
specify which optimization level to use. Defaults to
'3'.
--input-shapes INPUT_SHAPES
specify non-generic shapes for model to run, format is
"input_name:[dim1,dim2,...,dimn]
input_name2:[dim1,dim2]".
--disabled-pass DISABLED_PASS
disable specific passes, comma-separated list of pass
names.
--module-name MODULE_NAME
The output module name. Defaults to 'default'.
target example_target_hook:
--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
target example_target_hook from_device
--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
target example_target_hook libs options
--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
target example_target_hook model string
--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
target example_target_hook tag string
--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
target example_target_hook device string
--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
target example_target_hook keys options
target ext_dev:
--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
target ext_dev from_device
--target-ext_dev-libs TARGET_EXT_DEV_LIBS
target ext_dev libs options
--target-ext_dev-model TARGET_EXT_DEV_MODEL
target ext_dev model string
--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
target ext_dev system-lib
--target-ext_dev-tag TARGET_EXT_DEV_TAG
target ext_dev tag string
--target-ext_dev-device TARGET_EXT_DEV_DEVICE
target ext_dev device string
--target-ext_dev-keys TARGET_EXT_DEV_KEYS
target ext_dev keys options
target llvm:
--target-llvm-fast-math TARGET_LLVM_FAST_MATH
target llvm fast-math
--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
target llvm opt-level
--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
target llvm unpacked-api
--target-llvm-from_device TARGET_LLVM_FROM_DEVICE
target llvm from_device
--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
target llvm fast-math-ninf
--target-llvm-mattr TARGET_LLVM_MATTR
target llvm mattr options
--target-llvm-num-cores TARGET_LLVM_NUM_CORES
target llvm num-cores
--target-llvm-libs TARGET_LLVM_LIBS
target llvm libs options
--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
target llvm fast-math-nsz
--target-llvm-link-params TARGET_LLVM_LINK_PARAMS
target llvm link-params
--target-llvm-interface-api TARGET_LLVM_INTERFACE_API
target llvm interface-api string
--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
target llvm fast-math-contract
--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
target llvm system-lib
--target-llvm-tag TARGET_LLVM_TAG
target llvm tag string
--target-llvm-mtriple TARGET_LLVM_MTRIPLE
target llvm mtriple string
--target-llvm-model TARGET_LLVM_MODEL
target llvm model string
--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
target llvm mfloat-abi string
--target-llvm-mcpu TARGET_LLVM_MCPU
target llvm mcpu string
--target-llvm-device TARGET_LLVM_DEVICE
target llvm device string
--target-llvm-runtime TARGET_LLVM_RUNTIME
target llvm runtime string
--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
target llvm fast-math-arcp
--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
target llvm fast-math-reassoc
--target-llvm-mabi TARGET_LLVM_MABI
target llvm mabi string
--target-llvm-keys TARGET_LLVM_KEYS
target llvm keys options
--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
target llvm fast-math-nnan
target hybrid:
--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
target hybrid from_device
--target-hybrid-libs TARGET_HYBRID_LIBS
target hybrid libs options
--target-hybrid-model TARGET_HYBRID_MODEL
target hybrid model string
--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
target hybrid system-lib
--target-hybrid-tag TARGET_HYBRID_TAG
target hybrid tag string
--target-hybrid-device TARGET_HYBRID_DEVICE
target hybrid device string
--target-hybrid-keys TARGET_HYBRID_KEYS
target hybrid keys options
target aocl:
--target-aocl-from_device TARGET_AOCL_FROM_DEVICE
target aocl from_device
--target-aocl-libs TARGET_AOCL_LIBS
target aocl libs options
--target-aocl-model TARGET_AOCL_MODEL
target aocl model string
--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
target aocl system-lib
--target-aocl-tag TARGET_AOCL_TAG
target aocl tag string
--target-aocl-device TARGET_AOCL_DEVICE
target aocl device string
--target-aocl-keys TARGET_AOCL_KEYS
target aocl keys options
target nvptx:
--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
target nvptx max_num_threads
--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
target nvptx thread_warp_size
--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
target nvptx from_device
--target-nvptx-libs TARGET_NVPTX_LIBS
target nvptx libs options
--target-nvptx-model TARGET_NVPTX_MODEL
target nvptx model string
--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
target nvptx system-lib
--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
target nvptx mtriple string
--target-nvptx-tag TARGET_NVPTX_TAG
target nvptx tag string
--target-nvptx-mcpu TARGET_NVPTX_MCPU
target nvptx mcpu string
--target-nvptx-device TARGET_NVPTX_DEVICE
target nvptx device string
--target-nvptx-keys TARGET_NVPTX_KEYS
target nvptx keys options
target opencl:
--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
target opencl max_num_threads
--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
target opencl thread_warp_size
--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
target opencl from_device
--target-opencl-libs TARGET_OPENCL_LIBS
target opencl libs options
--target-opencl-model TARGET_OPENCL_MODEL
target opencl model string
--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
target opencl system-lib
--target-opencl-tag TARGET_OPENCL_TAG
target opencl tag string
--target-opencl-device TARGET_OPENCL_DEVICE
target opencl device string
--target-opencl-keys TARGET_OPENCL_KEYS
target opencl keys options
target metal:
--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
target metal max_num_threads
--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
target metal thread_warp_size
--target-metal-from_device TARGET_METAL_FROM_DEVICE
target metal from_device
--target-metal-libs TARGET_METAL_LIBS
target metal libs options
--target-metal-keys TARGET_METAL_KEYS
target metal keys options
--target-metal-model TARGET_METAL_MODEL
target metal model string
--target-metal-system-lib TARGET_METAL_SYSTEM_LIB
target metal system-lib
--target-metal-tag TARGET_METAL_TAG
target metal tag string
--target-metal-device TARGET_METAL_DEVICE
target metal device string
--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
target metal max_function_args
target webgpu:
--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
target webgpu max_num_threads
--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
target webgpu from_device
--target-webgpu-libs TARGET_WEBGPU_LIBS
target webgpu libs options
--target-webgpu-model TARGET_WEBGPU_MODEL
target webgpu model string
--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
target webgpu system-lib
--target-webgpu-tag TARGET_WEBGPU_TAG
target webgpu tag string
--target-webgpu-device TARGET_WEBGPU_DEVICE
target webgpu device string
--target-webgpu-keys TARGET_WEBGPU_KEYS
target webgpu keys options
target rocm:
--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
target rocm max_num_threads
--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
target rocm thread_warp_size
--target-rocm-from_device TARGET_ROCM_FROM_DEVICE
target rocm from_device
--target-rocm-libs TARGET_ROCM_LIBS
target rocm libs options
--target-rocm-mattr TARGET_ROCM_MATTR
target rocm mattr options
--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
target rocm max_shared_memory_per_block
--target-rocm-model TARGET_ROCM_MODEL
target rocm model string
--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
target rocm system-lib
--target-rocm-mtriple TARGET_ROCM_MTRIPLE
target rocm mtriple string
--target-rocm-tag TARGET_ROCM_TAG
target rocm tag string
--target-rocm-device TARGET_ROCM_DEVICE
target rocm device string
--target-rocm-mcpu TARGET_ROCM_MCPU
target rocm mcpu string
--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
target rocm max_threads_per_block
--target-rocm-keys TARGET_ROCM_KEYS
target rocm keys options
target vulkan:
--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
target vulkan max_num_threads
--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
target vulkan thread_warp_size
--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
target vulkan from_device
--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
target vulkan max_per_stage_descriptor_storage_buffer
--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
target vulkan driver_version
--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
target vulkan supports_16bit_buffer
--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
target vulkan max_block_size_z
--target-vulkan-libs TARGET_VULKAN_LIBS
target vulkan libs options
--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
target vulkan supports_dedicated_allocation
--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
target vulkan supported_subgroup_operations
--target-vulkan-mattr TARGET_VULKAN_MATTR
target vulkan mattr options
--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
target vulkan max_storage_buffer_range
--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
target vulkan max_push_constants_size
--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
target vulkan supports_push_descriptor
--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
target vulkan supports_int64
--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
target vulkan supports_float32
--target-vulkan-model TARGET_VULKAN_MODEL
target vulkan model string
--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
target vulkan max_block_size_x
--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
target vulkan system-lib
--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
target vulkan max_block_size_y
--target-vulkan-tag TARGET_VULKAN_TAG
target vulkan tag string
--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
target vulkan supports_int8
--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
target vulkan max_spirv_version
--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
target vulkan vulkan_api_version
--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
target vulkan supports_8bit_buffer
--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
target vulkan device_type string
--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
target vulkan supports_int32
--target-vulkan-device TARGET_VULKAN_DEVICE
target vulkan device string
--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
target vulkan max_threads_per_block
--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
target vulkan max_uniform_buffer_range
--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
target vulkan driver_name string
--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
target vulkan supports_integer_dot_product
--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
target vulkan supports_storage_buffer_storage_class
--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
target vulkan supports_float16
--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
target vulkan device_name string
--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
target vulkan supports_float64
--target-vulkan-keys TARGET_VULKAN_KEYS
target vulkan keys options
--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
target vulkan max_shared_memory_per_block
--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
target vulkan supports_int16
target cuda:
--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
target cuda max_num_threads
--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
target cuda thread_warp_size
--target-cuda-from_device TARGET_CUDA_FROM_DEVICE
target cuda from_device
--target-cuda-arch TARGET_CUDA_ARCH
target cuda arch string
--target-cuda-libs TARGET_CUDA_LIBS
target cuda libs options
--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
target cuda max_shared_memory_per_block
--target-cuda-model TARGET_CUDA_MODEL
target cuda model string
--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
target cuda system-lib
--target-cuda-tag TARGET_CUDA_TAG
target cuda tag string
--target-cuda-device TARGET_CUDA_DEVICE
target cuda device string
--target-cuda-mcpu TARGET_CUDA_MCPU
target cuda mcpu string
--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
target cuda max_threads_per_block
--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
target cuda registers_per_block
--target-cuda-keys TARGET_CUDA_KEYS
target cuda keys options
target sdaccel:
--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
target sdaccel from_device
--target-sdaccel-libs TARGET_SDACCEL_LIBS
target sdaccel libs options
--target-sdaccel-model TARGET_SDACCEL_MODEL
target sdaccel model string
--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
target sdaccel system-lib
--target-sdaccel-tag TARGET_SDACCEL_TAG
target sdaccel tag string
--target-sdaccel-device TARGET_SDACCEL_DEVICE
target sdaccel device string
--target-sdaccel-keys TARGET_SDACCEL_KEYS
target sdaccel keys options
target composite:
--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
target composite from_device
--target-composite-libs TARGET_COMPOSITE_LIBS
target composite libs options
--target-composite-devices TARGET_COMPOSITE_DEVICES
target composite devices options
--target-composite-model TARGET_COMPOSITE_MODEL
target composite model string
--target-composite-tag TARGET_COMPOSITE_TAG
target composite tag string
--target-composite-device TARGET_COMPOSITE_DEVICE
target composite device string
--target-composite-keys TARGET_COMPOSITE_KEYS
target composite keys options
target stackvm:
--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
target stackvm from_device
--target-stackvm-libs TARGET_STACKVM_LIBS
target stackvm libs options
--target-stackvm-model TARGET_STACKVM_MODEL
target stackvm model string
--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
target stackvm system-lib
--target-stackvm-tag TARGET_STACKVM_TAG
target stackvm tag string
--target-stackvm-device TARGET_STACKVM_DEVICE
target stackvm device string
--target-stackvm-keys TARGET_STACKVM_KEYS
target stackvm keys options
target aocl_sw_emu:
--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
target aocl_sw_emu from_device
--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
target aocl_sw_emu libs options
--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
target aocl_sw_emu model string
--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
target aocl_sw_emu system-lib
--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
target aocl_sw_emu tag string
--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
target aocl_sw_emu device string
--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
target aocl_sw_emu keys options
target c:
--target-c-unpacked-api TARGET_C_UNPACKED_API
target c unpacked-api
--target-c-from_device TARGET_C_FROM_DEVICE
target c from_device
--target-c-libs TARGET_C_LIBS
target c libs options
--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
target c constants-byte-alignment
--target-c-executor TARGET_C_EXECUTOR
target c executor string
--target-c-link-params TARGET_C_LINK_PARAMS
target c link-params
--target-c-model TARGET_C_MODEL
target c model string
--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
target c workspace-byte-alignment
--target-c-system-lib TARGET_C_SYSTEM_LIB
target c system-lib
--target-c-tag TARGET_C_TAG
target c tag string
--target-c-interface-api TARGET_C_INTERFACE_API
target c interface-api string
--target-c-mcpu TARGET_C_MCPU
target c mcpu string
--target-c-device TARGET_C_DEVICE
target c device string
--target-c-runtime TARGET_C_RUNTIME
target c runtime string
--target-c-keys TARGET_C_KEYS
target c keys options
--target-c-march TARGET_C_MARCH
target c march string
target hexagon:
--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
target hexagon from_device
--target-hexagon-libs TARGET_HEXAGON_LIBS
target hexagon libs options
--target-hexagon-mattr TARGET_HEXAGON_MATTR
target hexagon mattr options
--target-hexagon-model TARGET_HEXAGON_MODEL
target hexagon model string
--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
target hexagon llvm-options options
--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
target hexagon mtriple string
--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
target hexagon system-lib
--target-hexagon-mcpu TARGET_HEXAGON_MCPU
target hexagon mcpu string
--target-hexagon-device TARGET_HEXAGON_DEVICE
target hexagon device string
--target-hexagon-tag TARGET_HEXAGON_TAG
target hexagon tag string
--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
target hexagon link-params
--target-hexagon-keys TARGET_HEXAGON_KEYS
target hexagon keys options
executor graph:
--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS
Executor graph link-params
executor aot:
--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT
Executor aot workspace-byte-alignment
--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API
Executor aot unpacked-api
--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API
Executor aot interface-api string
--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS
Executor aot link-params
runtime cpp:
--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB
Runtime cpp system-lib
runtime crt:
--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB
Runtime crt system-lib
Now that the tuning data for the model has been collected, we can recompile the model using the optimized operators to speed up computation.
!python -m tvm.driver.tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json \
--output resnet50-v2-7-tvm_autotuned.tar \
../../_models/resnet50-v2-7.onnx
Verify that the optimized model runs and produces the same results:
!python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar
!python postprocess.py
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
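Comparing the printed predictions by eye works for five lines, but the check can also be scripted. The following is a minimal sketch, assuming the `class='...' with probability=...` line format shown above. The helper names `parse_predictions` and `same_predictions` and the tolerance of `1e-3` are illustrative choices, not part of TVMC or of postprocess.py.

```python
import math
import re

# Matches one line printed by postprocess.py in the format shown above, e.g.
# class='n02123045 tabby, tabby cat' with probability=0.621104
LINE_RE = re.compile(r"class='(.*)' with probability=([0-9.eE+-]+)")


def parse_predictions(text):
    """Return a list of (label, probability) pairs in printed order."""
    pairs = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def same_predictions(a, b, abs_tol=1e-3):
    """True if both runs rank the same labels with near-equal probabilities.

    abs_tol is a hypothetical tolerance; pick one that suits your model.
    """
    if len(a) != len(b):
        return False
    return all(
        label_a == label_b and math.isclose(prob_a, prob_b, abs_tol=abs_tol)
        for (label_a, prob_a), (label_b, prob_b) in zip(a, b)
    )


# Output of the tuned model, copied from the run above.
TUNED_OUTPUT = """\
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
"""

tuned = parse_predictions(TUNED_OUTPUT)
print(tuned[0])
```

To compare against the untuned model, capture the output of postprocess.py from that run, parse it the same way, and pass both lists to `same_predictions`. Small differences in the last digits are expected, because tuning can change the order of floating-point operations.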
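Comparing the tuned and untuned models#
TVMC provides basic tools for benchmarking models against each other. You can specify how many times to repeat the run and have TVMC report the model's execution time, excluding runtime startup. This gives a rough idea of how much tuning improved performance. For example, on a test Intel i7 system, the tuned model ran \(47\%\) faster than the untuned one. The exact gain depends on the hardware and the tuning budget: in the run recorded below, the mean latency falls from about 51.8 ms (untuned) to about 41.3 ms (tuned), roughly \(20\%\) lower.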
!python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
41.2506 40.8879 54.4469 36.7249 2.4430
!python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
51.8327 52.5906 67.5374 42.9440 4.4040
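The improvement can be computed directly from the two summaries. The sketch below assumes the numeric row always lists mean, median, max, min, and std in that order, as in the tables above. `parse_summary` is a hypothetical helper written for this example, not a TVMC function.

```python
# Column order of the "Execution time summary" table, all values in ms.
FIELDS = ("mean", "median", "max", "min", "std")


def parse_summary(values_line):
    """Map the numeric row of an execution time summary to field names."""
    numbers = [float(token) for token in values_line.split()]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(numbers)}")
    return dict(zip(FIELDS, numbers))


# Numeric rows copied from the two runs above.
tuned = parse_summary("41.2506 40.8879 54.4469 36.7249 2.4430")
untuned = parse_summary("51.8327 52.5906 67.5374 42.9440 4.4040")

results = {}
for field in ("mean", "median"):
    # Fraction of the untuned latency that tuning removed.
    reduction = (untuned[field] - tuned[field]) / untuned[field]
    # How many times faster the tuned model is.
    speedup = untuned[field] / tuned[field]
    results[field] = (reduction, speedup)
    print(f"{field}: {reduction:.1%} lower latency, {speedup:.2f}x speedup")
```

For the mean, this works out to about 20% lower latency, or a speedup of about 1.26x. Note that "percent faster" is ambiguous: a 1.26x speedup and a 20% latency reduction describe the same measurement, so state which one you report.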
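Summary#
This tutorial introduced TVMC, a command-line driver for TVM. It demonstrated how to compile, run, and tune a model, and discussed why inputs and outputs need pre-processing and post-processing. After tuning, it showed how to compare the performance of the unoptimized and optimized models.
This was a simple example that uses ResNet-50 v2 locally. TVMC, however, supports many more features, including cross-compilation, remote execution, and profiling/benchmarking.
To see which other options are available, run tvmc --help.
The tutorial Compiling and Optimizing a Model with the Python Interface covers the same compilation and optimization steps using the Python interface.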