Compiling and Optimizing a Model with TVMC

Original authors: Leandro Nunes, Matthew Barrett, Chris Hoge

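In this section, we will work with TVMC, the TVM command-line driver. TVMC is a tool that exposes TVM features such as auto-tuning, compiling, profiling, and model execution through a command-line interface.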

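After completing this section, you will have used TVMC to accomplish the following tasks:

  • Compile a pre-trained ResNet-50 v2 model for the TVM runtime.

  • Run a real image through the compiled model, and interpret the output and the model's performance.

  • Tune the model on a CPU using TVM.

  • Re-compile an optimized model using the tuning data collected by TVM.

  • Run the image through the optimized model, and compare the output and the model's performance.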

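The goal of this section is to give you an overview of the capabilities of TVM and TVMC, and to lay the groundwork for understanding how TVM works.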

Using TVMC

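TVMC is a Python application that is part of the TVM Python package. When you install TVM using a Python package, you get TVMC as a command-line application called tvmc. The location of this command depends on your platform and installation method.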

Alternatively, if you have TVM as a Python module on your $PYTHONPATH, you can access the command-line driver functionality through the executable Python module, python -m tvm.driver.tvmc.

For simplicity, this tutorial refers to the TVMC command line as tvmc <options>, but the same results can be obtained with python -m tvm.driver.tvmc <options>.
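The two invocation styles described above can be wrapped in a small helper. The following is only a sketch: the function name tvmc_command is hypothetical and not part of TVM, and it assumes that falling back to the module form is acceptable whenever no tvmc executable is found on PATH.

```python
import shutil
import sys


def tvmc_command(*options):
    """Return an argv list that invokes TVMC with the given options.

    Uses the `tvmc` executable when it is found on PATH; otherwise falls
    back to running the driver as a Python module.
    """
    executable = shutil.which("tvmc")
    if executable is not None:
        prefix = [executable]
    else:
        prefix = [sys.executable, "-m", "tvm.driver.tvmc"]
    return prefix + list(options)


if __name__ == "__main__":
    # Only print the command, so the sketch also works where TVM is not
    # installed. Pass the list to subprocess.run to actually execute it.
    print(tvmc_command("--help"))
```

Returning a list rather than a single string keeps the result directly usable with subprocess.run without shell quoting.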

You can view the help page with:

!python -m tvm.driver.tvmc --help
usage: tvmc [--config CONFIG] [-v] [--version] [-h]
            {micro,run,tune,compile} ...

TVM compiler driver

options:
  --config CONFIG       configuration json file
  -v, --verbose         increase verbosity
  --version             print the version and exit
  -h, --help            show this help message and exit.

commands:
  {micro,run,tune,compile}
    micro               select micro context.
    run                 run a compiled module
    tune                auto-tune a model
    compile             compile a model.

TVMC - TVM driver command-line interface

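The main TVM features available through tvmc come from the subcommands compile, run, and tune. To read about the specific options under a given subcommand, use tvmc <subcommand> --help. We will cover each of these commands in this tutorial, but first we need to download a pre-trained model to work with.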

Obtaining the Model

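For this tutorial, we will work with ResNet-50 v2. ResNet-50 is a convolutional neural network that is 50 layers deep and designed to classify images. The model we will use has been pre-trained on more than a million images with 1000 different classifications. The network has an input image size of 224x224. If you are interested in exploring how the ResNet-50 model is structured, we recommend downloading Netron, a freely available ML model viewer.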
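To make these numbers concrete, the sketch below works out the size of one input tensor and the shape of the matching output. Only the 224x224 image size and the 1000 classes come from the text above. The batch size of 1, the three color channels, the (batch, channels, height, width) layout, and the float32 element type are assumptions made for illustration.

```python
import math

# From the text: 224x224 input images and 1000 output classes.
HEIGHT, WIDTH = 224, 224
NUM_CLASSES = 1000

# Assumptions for illustration: one RGB image per batch, float32 values,
# and a (batch, channels, height, width) layout.
BATCH, CHANNELS = 1, 3
BYTES_PER_FLOAT32 = 4

input_shape = (BATCH, CHANNELS, HEIGHT, WIDTH)
output_shape = (BATCH, NUM_CLASSES)

input_elements = math.prod(input_shape)
input_bytes = input_elements * BYTES_PER_FLOAT32

if __name__ == "__main__":
    print("input shape:", input_shape)
    print("input elements:", input_elements)  # 150528
    print("input bytes:", input_bytes)        # 602112
    print("output shape:", output_shape)
```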

For this tutorial, we will use the model in ONNX format.

!wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx
--2022-04-26 13:07:52--  https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/onnx/models/main/vision/classification/resnet/model/resnet50-v2-7.onnx [following]
--2022-04-26 13:07:53--  https://media.githubusercontent.com/media/onnx/models/main/vision/classification/resnet/model/resnet50-v2-7.onnx
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102442450 (98M) [application/octet-stream]
Saving to: ‘resnet50-v2-7.onnx’

resnet50-v2-7.onnx  100%[===================>]  97.70M  4.51MB/s    in 25s     

2022-04-26 13:08:27 (3.89 MB/s) - ‘resnet50-v2-7.onnx’ saved [102442450/102442450]

To make the model available to the other tutorials, move it into the _models directory:

!mv resnet50-v2-7.onnx ../../_models/resnet50-v2-7.onnx
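The download and move steps above could also be scripted in Python with the standard library alone. This is a sketch, not part of the original tutorial: the helper names model_filename, destination_path, fetch_model, and size_matches are hypothetical, and the expected size is simply the byte count reported in the wget log.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse

MODEL_URL = (
    "https://github.com/onnx/models/raw/main/vision/classification/"
    "resnet/model/resnet50-v2-7.onnx"
)
EXPECTED_SIZE = 102442450  # bytes, as reported by the wget log


def model_filename(url):
    """Return the file name component of the URL path."""
    return posixpath.basename(urlparse(url).path)


def destination_path(url, dest_dir):
    """Return where the model would be stored inside dest_dir."""
    return os.path.join(dest_dir, model_filename(url))


def fetch_model(url, dest_dir):
    """Download the model into dest_dir unless it is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = destination_path(url, dest_dir)
    if not os.path.exists(dest):
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


def size_matches(path, expected=EXPECTED_SIZE):
    """Check the file size as a cheap sanity test of the download."""
    return os.path.getsize(path) == expected


if __name__ == "__main__":
    # Only show the planned destination; call fetch_model to download.
    print(destination_path(MODEL_URL, "../../_models"))
```

Calling fetch_model(MODEL_URL, "../../_models") performs the actual download, and size_matches on the returned path compares the result against the size from the log.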

Supported model formats

TVMC supports models created with Keras, ONNX, TensorFlow, TFLite, and Torch. Use the option --model-format if you need to explicitly provide the model format you are using.
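As an illustration of passing the format explicitly, the sketch below assembles a tvmc compile command line. The helper name compile_command is hypothetical. The flags --model-format, --target, and --output, the accepted format names, and the default output name module.tar are taken from the tvmc compile --help output in this section, while the choice of llvm as the target is an assumption for a CPU build.

```python
import shlex
import sys

# Values accepted by --model-format, copied from the compile help output.
MODEL_FORMATS = ("keras", "onnx", "pb", "tflite", "pytorch", "paddle")


def compile_command(model_path, model_format, target="llvm", output="module.tar"):
    """Build the argv list for compiling a model with an explicit format."""
    if model_format not in MODEL_FORMATS:
        raise ValueError(f"unsupported model format: {model_format!r}")
    return [
        sys.executable, "-m", "tvm.driver.tvmc", "compile",
        "--model-format", model_format,
        "--target", target,
        "--output", output,
        model_path,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so no TVM install is needed.
    print(shlex.join(compile_command("resnet50-v2-7.onnx", "onnx")))
```

Validating the format name before building the command surfaces a typo immediately instead of after the driver starts.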

For more information, see:

!python -m tvm.driver.tvmc compile --help
usage: tvmc compile [-h] [--cross-compiler CROSS_COMPILER]
                    [--cross-compiler-options CROSS_COMPILER_OPTIONS]
                    [--desired-layout {NCHW,NHWC}] [--dump-code FORMAT]
                    [--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
                    [-o OUTPUT] [-f {so,mlf}] [--pass-config name=value]
                    [--target TARGET]
                    [--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
                    [--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
                    [--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
                    [--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
                    [--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
                    [--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
                    [--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
                    [--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
                    [--target-ext_dev-model TARGET_EXT_DEV_MODEL]
                    [--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
                    [--target-ext_dev-tag TARGET_EXT_DEV_TAG]
                    [--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
                    [--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
                    [--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
                    [--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
                    [--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
                    [--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
                    [--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
                    [--target-llvm-mattr TARGET_LLVM_MATTR]
                    [--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
                    [--target-llvm-libs TARGET_LLVM_LIBS]
                    [--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
                    [--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
                    [--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
                    [--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
                    [--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
                    [--target-llvm-tag TARGET_LLVM_TAG]
                    [--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
                    [--target-llvm-model TARGET_LLVM_MODEL]
                    [--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
                    [--target-llvm-mcpu TARGET_LLVM_MCPU]
                    [--target-llvm-device TARGET_LLVM_DEVICE]
                    [--target-llvm-runtime TARGET_LLVM_RUNTIME]
                    [--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
                    [--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
                    [--target-llvm-mabi TARGET_LLVM_MABI]
                    [--target-llvm-keys TARGET_LLVM_KEYS]
                    [--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
                    [--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
                    [--target-hybrid-libs TARGET_HYBRID_LIBS]
                    [--target-hybrid-model TARGET_HYBRID_MODEL]
                    [--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
                    [--target-hybrid-tag TARGET_HYBRID_TAG]
                    [--target-hybrid-device TARGET_HYBRID_DEVICE]
                    [--target-hybrid-keys TARGET_HYBRID_KEYS]
                    [--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
                    [--target-aocl-libs TARGET_AOCL_LIBS]
                    [--target-aocl-model TARGET_AOCL_MODEL]
                    [--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
                    [--target-aocl-tag TARGET_AOCL_TAG]
                    [--target-aocl-device TARGET_AOCL_DEVICE]
                    [--target-aocl-keys TARGET_AOCL_KEYS]
                    [--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
                    [--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
                    [--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
                    [--target-nvptx-libs TARGET_NVPTX_LIBS]
                    [--target-nvptx-model TARGET_NVPTX_MODEL]
                    [--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
                    [--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
                    [--target-nvptx-tag TARGET_NVPTX_TAG]
                    [--target-nvptx-mcpu TARGET_NVPTX_MCPU]
                    [--target-nvptx-device TARGET_NVPTX_DEVICE]
                    [--target-nvptx-keys TARGET_NVPTX_KEYS]
                    [--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
                    [--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
                    [--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
                    [--target-opencl-libs TARGET_OPENCL_LIBS]
                    [--target-opencl-model TARGET_OPENCL_MODEL]
                    [--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
                    [--target-opencl-tag TARGET_OPENCL_TAG]
                    [--target-opencl-device TARGET_OPENCL_DEVICE]
                    [--target-opencl-keys TARGET_OPENCL_KEYS]
                    [--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
                    [--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
                    [--target-metal-from_device TARGET_METAL_FROM_DEVICE]
                    [--target-metal-libs TARGET_METAL_LIBS]
                    [--target-metal-keys TARGET_METAL_KEYS]
                    [--target-metal-model TARGET_METAL_MODEL]
                    [--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
                    [--target-metal-tag TARGET_METAL_TAG]
                    [--target-metal-device TARGET_METAL_DEVICE]
                    [--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
                    [--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
                    [--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
                    [--target-webgpu-libs TARGET_WEBGPU_LIBS]
                    [--target-webgpu-model TARGET_WEBGPU_MODEL]
                    [--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
                    [--target-webgpu-tag TARGET_WEBGPU_TAG]
                    [--target-webgpu-device TARGET_WEBGPU_DEVICE]
                    [--target-webgpu-keys TARGET_WEBGPU_KEYS]
                    [--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
                    [--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
                    [--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
                    [--target-rocm-libs TARGET_ROCM_LIBS]
                    [--target-rocm-mattr TARGET_ROCM_MATTR]
                    [--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-rocm-model TARGET_ROCM_MODEL]
                    [--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
                    [--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
                    [--target-rocm-tag TARGET_ROCM_TAG]
                    [--target-rocm-device TARGET_ROCM_DEVICE]
                    [--target-rocm-mcpu TARGET_ROCM_MCPU]
                    [--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
                    [--target-rocm-keys TARGET_ROCM_KEYS]
                    [--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
                    [--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
                    [--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
                    [--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
                    [--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
                    [--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
                    [--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
                    [--target-vulkan-libs TARGET_VULKAN_LIBS]
                    [--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
                    [--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
                    [--target-vulkan-mattr TARGET_VULKAN_MATTR]
                    [--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
                    [--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
                    [--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
                    [--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
                    [--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
                    [--target-vulkan-model TARGET_VULKAN_MODEL]
                    [--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
                    [--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
                    [--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
                    [--target-vulkan-tag TARGET_VULKAN_TAG]
                    [--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
                    [--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
                    [--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
                    [--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
                    [--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
                    [--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
                    [--target-vulkan-device TARGET_VULKAN_DEVICE]
                    [--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
                    [--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
                    [--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
                    [--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
                    [--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
                    [--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
                    [--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
                    [--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
                    [--target-vulkan-keys TARGET_VULKAN_KEYS]
                    [--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
                    [--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
                    [--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
                    [--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
                    [--target-cuda-arch TARGET_CUDA_ARCH]
                    [--target-cuda-libs TARGET_CUDA_LIBS]
                    [--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-cuda-model TARGET_CUDA_MODEL]
                    [--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
                    [--target-cuda-tag TARGET_CUDA_TAG]
                    [--target-cuda-device TARGET_CUDA_DEVICE]
                    [--target-cuda-mcpu TARGET_CUDA_MCPU]
                    [--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
                    [--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
                    [--target-cuda-keys TARGET_CUDA_KEYS]
                    [--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
                    [--target-sdaccel-libs TARGET_SDACCEL_LIBS]
                    [--target-sdaccel-model TARGET_SDACCEL_MODEL]
                    [--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
                    [--target-sdaccel-tag TARGET_SDACCEL_TAG]
                    [--target-sdaccel-device TARGET_SDACCEL_DEVICE]
                    [--target-sdaccel-keys TARGET_SDACCEL_KEYS]
                    [--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
                    [--target-composite-libs TARGET_COMPOSITE_LIBS]
                    [--target-composite-devices TARGET_COMPOSITE_DEVICES]
                    [--target-composite-model TARGET_COMPOSITE_MODEL]
                    [--target-composite-tag TARGET_COMPOSITE_TAG]
                    [--target-composite-device TARGET_COMPOSITE_DEVICE]
                    [--target-composite-keys TARGET_COMPOSITE_KEYS]
                    [--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
                    [--target-stackvm-libs TARGET_STACKVM_LIBS]
                    [--target-stackvm-model TARGET_STACKVM_MODEL]
                    [--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
                    [--target-stackvm-tag TARGET_STACKVM_TAG]
                    [--target-stackvm-device TARGET_STACKVM_DEVICE]
                    [--target-stackvm-keys TARGET_STACKVM_KEYS]
                    [--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
                    [--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
                    [--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
                    [--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
                    [--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
                    [--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
                    [--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
                    [--target-c-unpacked-api TARGET_C_UNPACKED_API]
                    [--target-c-from_device TARGET_C_FROM_DEVICE]
                    [--target-c-libs TARGET_C_LIBS]
                    [--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
                    [--target-c-executor TARGET_C_EXECUTOR]
                    [--target-c-link-params TARGET_C_LINK_PARAMS]
                    [--target-c-model TARGET_C_MODEL]
                    [--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
                    [--target-c-system-lib TARGET_C_SYSTEM_LIB]
                    [--target-c-tag TARGET_C_TAG]
                    [--target-c-interface-api TARGET_C_INTERFACE_API]
                    [--target-c-mcpu TARGET_C_MCPU]
                    [--target-c-device TARGET_C_DEVICE]
                    [--target-c-runtime TARGET_C_RUNTIME]
                    [--target-c-keys TARGET_C_KEYS]
                    [--target-c-march TARGET_C_MARCH]
                    [--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
                    [--target-hexagon-libs TARGET_HEXAGON_LIBS]
                    [--target-hexagon-mattr TARGET_HEXAGON_MATTR]
                    [--target-hexagon-model TARGET_HEXAGON_MODEL]
                    [--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
                    [--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
                    [--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
                    [--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
                    [--target-hexagon-device TARGET_HEXAGON_DEVICE]
                    [--target-hexagon-tag TARGET_HEXAGON_TAG]
                    [--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
                    [--target-hexagon-keys TARGET_HEXAGON_KEYS]
                    [--tuning-records PATH] [--executor EXECUTOR]
                    [--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS]
                    [--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT]
                    [--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API]
                    [--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API]
                    [--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS]
                    [--runtime RUNTIME]
                    [--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB]
                    [--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB] [-v]
                    [-O [0-3]] [--input-shapes INPUT_SHAPES]
                    [--disabled-pass DISABLED_PASS]
                    [--module-name MODULE_NAME]
                    FILE

positional arguments:
  FILE                  path to the input model file.

options:
  -h, --help            show this help message and exit
  --cross-compiler CROSS_COMPILER
                        the cross compiler to generate target libraries, e.g.
                        'aarch64-linux-gnu-gcc'.
  --cross-compiler-options CROSS_COMPILER_OPTIONS
                        the cross compiler options to generate target
                        libraries, e.g. '-mfpu=neon-vfpv4'.
  --desired-layout {NCHW,NHWC}
                        change the data layout of the whole graph.
  --dump-code FORMAT    comma separated list of formats to export the input
                        model, e.g. 'asm,ll,relay'.
  --model-format {keras,onnx,pb,tflite,pytorch,paddle}
                        specify input model format.
  -o OUTPUT, --output OUTPUT
                        output the compiled module to a specified archive.
                        Defaults to 'module.tar'.
  -f {so,mlf}, --output-format {so,mlf}
                        output format. Use 'so' for shared object or 'mlf' for
                        Model Library Format (only for microTVM targets).
                        Defaults to 'so'.
  --pass-config name=value
                        configurations to be used at compile time. This option
                        can be provided multiple times, each one to set one
                        configuration value, e.g. '--pass-config
                        relay.backend.use_auto_scheduler=0', e.g. '--pass-
                        config
                        tir.add_lower_pass=opt_level1,pass1,opt_level2,pass2'.
  --target TARGET       compilation target as plain string, inline JSON or
                        path to a JSON file
  --tuning-records PATH
                        path to an auto-tuning log file by AutoTVM. If not
                        presented, the fallback/tophub configs will be used.
  --executor EXECUTOR   Executor to compile the model with
  --runtime RUNTIME     Runtime to compile the model with
  -v, --verbose         increase verbosity.
  -O [0-3], --opt-level [0-3]
                        specify which optimization level to use. Defaults to
                        '3'.
  --input-shapes INPUT_SHAPES
                        specify non-generic shapes for model to run, format is
                        "input_name:[dim1,dim2,...,dimn]
                        input_name2:[dim1,dim2]".
  --disabled-pass DISABLED_PASS
                        disable specific passes, comma-separated list of pass
                        names.
  --module-name MODULE_NAME
                        The output module name. Defaults to 'default'.

target example_target_hook:
  --target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
                        target example_target_hook from_device
  --target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
                        target example_target_hook libs options
  --target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
                        target example_target_hook model string
  --target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
                        target example_target_hook tag string
  --target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
                        target example_target_hook device string
  --target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
                        target example_target_hook keys options

target ext_dev:
  --target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
                        target ext_dev from_device
  --target-ext_dev-libs TARGET_EXT_DEV_LIBS
                        target ext_dev libs options
  --target-ext_dev-model TARGET_EXT_DEV_MODEL
                        target ext_dev model string
  --target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
                        target ext_dev system-lib
  --target-ext_dev-tag TARGET_EXT_DEV_TAG
                        target ext_dev tag string
  --target-ext_dev-device TARGET_EXT_DEV_DEVICE
                        target ext_dev device string
  --target-ext_dev-keys TARGET_EXT_DEV_KEYS
                        target ext_dev keys options

target llvm:
  --target-llvm-fast-math TARGET_LLVM_FAST_MATH
                        target llvm fast-math
  --target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
                        target llvm opt-level
  --target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
                        target llvm unpacked-api
  --target-llvm-from_device TARGET_LLVM_FROM_DEVICE
                        target llvm from_device
  --target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
                        target llvm fast-math-ninf
  --target-llvm-mattr TARGET_LLVM_MATTR
                        target llvm mattr options
  --target-llvm-num-cores TARGET_LLVM_NUM_CORES
                        target llvm num-cores
  --target-llvm-libs TARGET_LLVM_LIBS
                        target llvm libs options
  --target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
                        target llvm fast-math-nsz
  --target-llvm-link-params TARGET_LLVM_LINK_PARAMS
                        target llvm link-params
  --target-llvm-interface-api TARGET_LLVM_INTERFACE_API
                        target llvm interface-api string
  --target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
                        target llvm fast-math-contract
  --target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
                        target llvm system-lib
  --target-llvm-tag TARGET_LLVM_TAG
                        target llvm tag string
  --target-llvm-mtriple TARGET_LLVM_MTRIPLE
                        target llvm mtriple string
  --target-llvm-model TARGET_LLVM_MODEL
                        target llvm model string
  --target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
                        target llvm mfloat-abi string
  --target-llvm-mcpu TARGET_LLVM_MCPU
                        target llvm mcpu string
  --target-llvm-device TARGET_LLVM_DEVICE
                        target llvm device string
  --target-llvm-runtime TARGET_LLVM_RUNTIME
                        target llvm runtime string
  --target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
                        target llvm fast-math-arcp
  --target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
                        target llvm fast-math-reassoc
  --target-llvm-mabi TARGET_LLVM_MABI
                        target llvm mabi string
  --target-llvm-keys TARGET_LLVM_KEYS
                        target llvm keys options
  --target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
                        target llvm fast-math-nnan

target hybrid:
  --target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
                        target hybrid from_device
  --target-hybrid-libs TARGET_HYBRID_LIBS
                        target hybrid libs options
  --target-hybrid-model TARGET_HYBRID_MODEL
                        target hybrid model string
  --target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
                        target hybrid system-lib
  --target-hybrid-tag TARGET_HYBRID_TAG
                        target hybrid tag string
  --target-hybrid-device TARGET_HYBRID_DEVICE
                        target hybrid device string
  --target-hybrid-keys TARGET_HYBRID_KEYS
                        target hybrid keys options

target aocl:
  --target-aocl-from_device TARGET_AOCL_FROM_DEVICE
                        target aocl from_device
  --target-aocl-libs TARGET_AOCL_LIBS
                        target aocl libs options
  --target-aocl-model TARGET_AOCL_MODEL
                        target aocl model string
  --target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
                        target aocl system-lib
  --target-aocl-tag TARGET_AOCL_TAG
                        target aocl tag string
  --target-aocl-device TARGET_AOCL_DEVICE
                        target aocl device string
  --target-aocl-keys TARGET_AOCL_KEYS
                        target aocl keys options

target nvptx:
  --target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
                        target nvptx max_num_threads
  --target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
                        target nvptx thread_warp_size
  --target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
                        target nvptx from_device
  --target-nvptx-libs TARGET_NVPTX_LIBS
                        target nvptx libs options
  --target-nvptx-model TARGET_NVPTX_MODEL
                        target nvptx model string
  --target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
                        target nvptx system-lib
  --target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
                        target nvptx mtriple string
  --target-nvptx-tag TARGET_NVPTX_TAG
                        target nvptx tag string
  --target-nvptx-mcpu TARGET_NVPTX_MCPU
                        target nvptx mcpu string
  --target-nvptx-device TARGET_NVPTX_DEVICE
                        target nvptx device string
  --target-nvptx-keys TARGET_NVPTX_KEYS
                        target nvptx keys options

target opencl:
  --target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
                        target opencl max_num_threads
  --target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
                        target opencl thread_warp_size
  --target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
                        target opencl from_device
  --target-opencl-libs TARGET_OPENCL_LIBS
                        target opencl libs options
  --target-opencl-model TARGET_OPENCL_MODEL
                        target opencl model string
  --target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
                        target opencl system-lib
  --target-opencl-tag TARGET_OPENCL_TAG
                        target opencl tag string
  --target-opencl-device TARGET_OPENCL_DEVICE
                        target opencl device string
  --target-opencl-keys TARGET_OPENCL_KEYS
                        target opencl keys options

target metal:
  --target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
                        target metal max_num_threads
  --target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
                        target metal thread_warp_size
  --target-metal-from_device TARGET_METAL_FROM_DEVICE
                        target metal from_device
  --target-metal-libs TARGET_METAL_LIBS
                        target metal libs options
  --target-metal-keys TARGET_METAL_KEYS
                        target metal keys options
  --target-metal-model TARGET_METAL_MODEL
                        target metal model string
  --target-metal-system-lib TARGET_METAL_SYSTEM_LIB
                        target metal system-lib
  --target-metal-tag TARGET_METAL_TAG
                        target metal tag string
  --target-metal-device TARGET_METAL_DEVICE
                        target metal device string
  --target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
                        target metal max_function_args

target webgpu:
  --target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
                        target webgpu max_num_threads
  --target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
                        target webgpu from_device
  --target-webgpu-libs TARGET_WEBGPU_LIBS
                        target webgpu libs options
  --target-webgpu-model TARGET_WEBGPU_MODEL
                        target webgpu model string
  --target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
                        target webgpu system-lib
  --target-webgpu-tag TARGET_WEBGPU_TAG
                        target webgpu tag string
  --target-webgpu-device TARGET_WEBGPU_DEVICE
                        target webgpu device string
  --target-webgpu-keys TARGET_WEBGPU_KEYS
                        target webgpu keys options

target rocm:
  --target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
                        target rocm max_num_threads
  --target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
                        target rocm thread_warp_size
  --target-rocm-from_device TARGET_ROCM_FROM_DEVICE
                        target rocm from_device
  --target-rocm-libs TARGET_ROCM_LIBS
                        target rocm libs options
  --target-rocm-mattr TARGET_ROCM_MATTR
                        target rocm mattr options
  --target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
                        target rocm max_shared_memory_per_block
  --target-rocm-model TARGET_ROCM_MODEL
                        target rocm model string
  --target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
                        target rocm system-lib
  --target-rocm-mtriple TARGET_ROCM_MTRIPLE
                        target rocm mtriple string
  --target-rocm-tag TARGET_ROCM_TAG
                        target rocm tag string
  --target-rocm-device TARGET_ROCM_DEVICE
                        target rocm device string
  --target-rocm-mcpu TARGET_ROCM_MCPU
                        target rocm mcpu string
  --target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
                        target rocm max_threads_per_block
  --target-rocm-keys TARGET_ROCM_KEYS
                        target rocm keys options

target vulkan:
  --target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
                        target vulkan max_num_threads
  --target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
                        target vulkan thread_warp_size
  --target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
                        target vulkan from_device
  --target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
                        target vulkan max_per_stage_descriptor_storage_buffer
  --target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
                        target vulkan driver_version
  --target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
                        target vulkan supports_16bit_buffer
  --target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
                        target vulkan max_block_size_z
  --target-vulkan-libs TARGET_VULKAN_LIBS
                        target vulkan libs options
  --target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
                        target vulkan supports_dedicated_allocation
  --target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
                        target vulkan supported_subgroup_operations
  --target-vulkan-mattr TARGET_VULKAN_MATTR
                        target vulkan mattr options
  --target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
                        target vulkan max_storage_buffer_range
  --target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
                        target vulkan max_push_constants_size
  --target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
                        target vulkan supports_push_descriptor
  --target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
                        target vulkan supports_int64
  --target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
                        target vulkan supports_float32
  --target-vulkan-model TARGET_VULKAN_MODEL
                        target vulkan model string
  --target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
                        target vulkan max_block_size_x
  --target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
                        target vulkan system-lib
  --target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
                        target vulkan max_block_size_y
  --target-vulkan-tag TARGET_VULKAN_TAG
                        target vulkan tag string
  --target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
                        target vulkan supports_int8
  --target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
                        target vulkan max_spirv_version
  --target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
                        target vulkan vulkan_api_version
  --target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
                        target vulkan supports_8bit_buffer
  --target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
                        target vulkan device_type string
  --target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
                        target vulkan supports_int32
  --target-vulkan-device TARGET_VULKAN_DEVICE
                        target vulkan device string
  --target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
                        target vulkan max_threads_per_block
  --target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
                        target vulkan max_uniform_buffer_range
  --target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
                        target vulkan driver_name string
  --target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
                        target vulkan supports_integer_dot_product
  --target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
                        target vulkan supports_storage_buffer_storage_class
  --target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
                        target vulkan supports_float16
  --target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
                        target vulkan device_name string
  --target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
                        target vulkan supports_float64
  --target-vulkan-keys TARGET_VULKAN_KEYS
                        target vulkan keys options
  --target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
                        target vulkan max_shared_memory_per_block
  --target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
                        target vulkan supports_int16

target cuda:
  --target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
                        target cuda max_num_threads
  --target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
                        target cuda thread_warp_size
  --target-cuda-from_device TARGET_CUDA_FROM_DEVICE
                        target cuda from_device
  --target-cuda-arch TARGET_CUDA_ARCH
                        target cuda arch string
  --target-cuda-libs TARGET_CUDA_LIBS
                        target cuda libs options
  --target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
                        target cuda max_shared_memory_per_block
  --target-cuda-model TARGET_CUDA_MODEL
                        target cuda model string
  --target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
                        target cuda system-lib
  --target-cuda-tag TARGET_CUDA_TAG
                        target cuda tag string
  --target-cuda-device TARGET_CUDA_DEVICE
                        target cuda device string
  --target-cuda-mcpu TARGET_CUDA_MCPU
                        target cuda mcpu string
  --target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
                        target cuda max_threads_per_block
  --target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
                        target cuda registers_per_block
  --target-cuda-keys TARGET_CUDA_KEYS
                        target cuda keys options

target sdaccel:
  --target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
                        target sdaccel from_device
  --target-sdaccel-libs TARGET_SDACCEL_LIBS
                        target sdaccel libs options
  --target-sdaccel-model TARGET_SDACCEL_MODEL
                        target sdaccel model string
  --target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
                        target sdaccel system-lib
  --target-sdaccel-tag TARGET_SDACCEL_TAG
                        target sdaccel tag string
  --target-sdaccel-device TARGET_SDACCEL_DEVICE
                        target sdaccel device string
  --target-sdaccel-keys TARGET_SDACCEL_KEYS
                        target sdaccel keys options

target composite:
  --target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
                        target composite from_device
  --target-composite-libs TARGET_COMPOSITE_LIBS
                        target composite libs options
  --target-composite-devices TARGET_COMPOSITE_DEVICES
                        target composite devices options
  --target-composite-model TARGET_COMPOSITE_MODEL
                        target composite model string
  --target-composite-tag TARGET_COMPOSITE_TAG
                        target composite tag string
  --target-composite-device TARGET_COMPOSITE_DEVICE
                        target composite device string
  --target-composite-keys TARGET_COMPOSITE_KEYS
                        target composite keys options

target stackvm:
  --target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
                        target stackvm from_device
  --target-stackvm-libs TARGET_STACKVM_LIBS
                        target stackvm libs options
  --target-stackvm-model TARGET_STACKVM_MODEL
                        target stackvm model string
  --target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
                        target stackvm system-lib
  --target-stackvm-tag TARGET_STACKVM_TAG
                        target stackvm tag string
  --target-stackvm-device TARGET_STACKVM_DEVICE
                        target stackvm device string
  --target-stackvm-keys TARGET_STACKVM_KEYS
                        target stackvm keys options

target aocl_sw_emu:
  --target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
                        target aocl_sw_emu from_device
  --target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
                        target aocl_sw_emu libs options
  --target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
                        target aocl_sw_emu model string
  --target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
                        target aocl_sw_emu system-lib
  --target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
                        target aocl_sw_emu tag string
  --target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
                        target aocl_sw_emu device string
  --target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
                        target aocl_sw_emu keys options

target c:
  --target-c-unpacked-api TARGET_C_UNPACKED_API
                        target c unpacked-api
  --target-c-from_device TARGET_C_FROM_DEVICE
                        target c from_device
  --target-c-libs TARGET_C_LIBS
                        target c libs options
  --target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
                        target c constants-byte-alignment
  --target-c-executor TARGET_C_EXECUTOR
                        target c executor string
  --target-c-link-params TARGET_C_LINK_PARAMS
                        target c link-params
  --target-c-model TARGET_C_MODEL
                        target c model string
  --target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
                        target c workspace-byte-alignment
  --target-c-system-lib TARGET_C_SYSTEM_LIB
                        target c system-lib
  --target-c-tag TARGET_C_TAG
                        target c tag string
  --target-c-interface-api TARGET_C_INTERFACE_API
                        target c interface-api string
  --target-c-mcpu TARGET_C_MCPU
                        target c mcpu string
  --target-c-device TARGET_C_DEVICE
                        target c device string
  --target-c-runtime TARGET_C_RUNTIME
                        target c runtime string
  --target-c-keys TARGET_C_KEYS
                        target c keys options
  --target-c-march TARGET_C_MARCH
                        target c march string

target hexagon:
  --target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
                        target hexagon from_device
  --target-hexagon-libs TARGET_HEXAGON_LIBS
                        target hexagon libs options
  --target-hexagon-mattr TARGET_HEXAGON_MATTR
                        target hexagon mattr options
  --target-hexagon-model TARGET_HEXAGON_MODEL
                        target hexagon model string
  --target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
                        target hexagon llvm-options options
  --target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
                        target hexagon mtriple string
  --target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
                        target hexagon system-lib
  --target-hexagon-mcpu TARGET_HEXAGON_MCPU
                        target hexagon mcpu string
  --target-hexagon-device TARGET_HEXAGON_DEVICE
                        target hexagon device string
  --target-hexagon-tag TARGET_HEXAGON_TAG
                        target hexagon tag string
  --target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
                        target hexagon link-params
  --target-hexagon-keys TARGET_HEXAGON_KEYS
                        target hexagon keys options

executor graph:
  --executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS
                        Executor graph link-params

executor aot:
  --executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT
                        Executor aot workspace-byte-alignment
  --executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API
                        Executor aot unpacked-api
  --executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API
                        Executor aot interface-api string
  --executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS
                        Executor aot link-params

runtime cpp:
  --runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB
                        Runtime cpp system-lib

runtime crt:
  --runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB
                        Runtime crt system-lib

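Adding ONNX Support to TVM

TVM relies on the ONNX Python library being available on your system. You can install ONNX with the command pip3 install --user onnx onnxoptimizer. You may drop the --user option if you have root access and want to install ONNX globally. The onnxoptimizer dependency is optional and is only used for onnx>=1.9.

Compiling an ONNX Model to the TVM Runtime#

Once the ResNet-50 model has been downloaded, the next step is to compile it. To do this, we use tvmc compile. The output of the compilation process is a TAR package of the model, compiled to a dynamic library for the target platform. The model can then be run on the target device using the TVM runtime.

# This may take several minutes, depending on your machine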
!python -m tvm.driver.tvmc compile --target "llvm" \
    --output resnet50-v2-7-tvm.tar \
        ../../_models/resnet50-v2-7.onnx
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.

Let's look at the files that tvmc compile creates in the module:

%%bash
mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
mod.so
mod.json
mod.params

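Three files are listed:

  • mod.so is the model, represented as a C++ library, that can be loaded by the TVM runtime.

  • mod.json is a text representation of the TVM Relay computation graph.

  • mod.params is a file containing the parameters for the pre-trained model.

This module can be loaded directly by your application, and the model can be run via the TVM runtime APIs.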
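
As a minimal sketch of the unpacking step shown above, the package can also be extracted from Python with the standard tarfile module. The helper name unpack_tvmc_package is hypothetical and not part of TVM; it assumes the package files sit at the top level of the archive, as in the listing above.

```python
import tarfile
from pathlib import Path


def unpack_tvmc_package(package_path, dest_dir):
    """Extract the regular files of a TVMC package; return their sorted names.

    Hypothetical helper for illustration only, not a TVM API.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    names = []
    with tarfile.open(package_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Keep only the base name so nothing is written outside dest.
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            names.append(target.name)
    return sorted(names)


if __name__ == "__main__":
    package = Path("resnet50-v2-7-tvm.tar")
    if package.exists():
        # For the package built above this should list mod.json, mod.params, mod.so.
        print(unpack_tvmc_package(package, "model"))
```

Writing each member by its base name avoids following any path stored inside the archive, which is a reasonable precaution when the package comes from another machine.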

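Defining the Correct Target

Specifying the correct target (option --target) can have a huge impact on the performance of the compiled module, because the compiler can then take advantage of hardware features available on the target.

For more information, please refer to Auto-tuning a convolutional network for x86 CPU. We recommend identifying which CPU you are running, along with its optional features, and setting the target accordingly.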
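
The following sketch shows one way a more specific target string could be assembled and passed to tvmc compile. The helper names llvm_target and compile_command are hypothetical, and "skylake" is only an example CPU model; substitute the value that matches your own hardware. The command is built as an argument list and printed, not executed.

```python
import platform
import shlex


def llvm_target(mcpu=None):
    """Return an LLVM target string, optionally narrowed to one CPU model."""
    return "llvm" if mcpu is None else f"llvm -mcpu={mcpu}"


def compile_command(model_path, output_path, target):
    """Build the argument list for `tvmc compile` without running it."""
    return [
        "python", "-m", "tvm.driver.tvmc", "compile",
        "--target", target,
        "--output", output_path,
        model_path,
    ]


if __name__ == "__main__":
    # platform.machine() reports the architecture (for example x86_64),
    # which is a starting point for choosing a target.
    print("machine:", platform.machine())
    cmd = compile_command(
        "resnet50-v2-7.onnx",
        "resnet50-v2-7-tvm.tar",
        llvm_target("skylake"),  # example value, an assumption
    )
    print(shlex.join(cmd))
```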

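Running the Model from the Compiled Module with TVMC#

Now that the model has been compiled to a module, we can use the TVM runtime to make predictions with it.

TVMC has the TVM runtime built in, allowing you to run compiled TVM models. To use TVMC to run the model and make predictions, we need two things:

  • The compiled module, which we just produced.

  • Valid input to the model to make predictions on.

Each model is particular when it comes to expected tensor shapes, formats and data types. For this reason, most models require some pre- and post-processing to ensure that the input is valid and to interpret the output. TVMC has adopted NumPy's .npz format for both input and output data. This is a well-supported NumPy format for serializing multiple arrays into a single file.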
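
To make the .npz format concrete, here is a small sketch that writes several named arrays to one file and reads them back. The helper name roundtrip_npz and the array names used in the demo are illustrative assumptions, not something TVMC defines.

```python
import tempfile
from pathlib import Path

import numpy as np


def roundtrip_npz(path, **arrays):
    """Save named arrays to an .npz file and load them back as a dict."""
    np.savez(path, **arrays)
    with np.load(path) as data:
        # data.files lists the array names stored in the archive.
        return {name: data[name] for name in data.files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loaded = roundtrip_npz(
            Path(tmp) / "example.npz",
            data=np.zeros((1, 3, 224, 224), dtype="float32"),
            extra=np.arange(3),
        )
        for name, array in loaded.items():
            print(name, array.shape, array.dtype)
```

Each keyword argument becomes one named entry in the file, which is how a single .npz file can carry every input or output tensor of a model.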

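As input for this tutorial, we will use an image of a cat, but feel free to substitute any image of your choosing.

Input Pre-Processing#

For the ResNet-50 v2 model, the input is expected to be in ImageNet format. Below is an example script that pre-processes an image for ResNet-50 v2.

You will need a supported version of the Python Image Library installed. You can use pip3 install --user pillow to satisfy this requirement for the script.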

#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# ONNX expects NCHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize according to ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)

# Save to .npz (outputs imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
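
As an aside, the per-channel loop in the script above can also be expressed with NumPy broadcasting. The sketch below assumes a channel-first (CHW) array with values in the 0 to 255 range; the function name normalize_chw is hypothetical.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
IMAGENET_STDDEV = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_chw(img_chw):
    """Scale a CHW image from [0, 255] to [0, 1], then normalize each channel."""
    scaled = img_chw.astype("float32") / 255
    # Reshape the statistics to (3, 1, 1) so they broadcast over height and width.
    return (scaled - IMAGENET_MEAN[:, None, None]) / IMAGENET_STDDEV[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_image = rng.integers(0, 256, size=(3, 224, 224)).astype("float32")
    batch = np.expand_dims(normalize_chw(fake_image), axis=0)
    print(batch.shape, batch.dtype)
```

The result should match the loop version up to floating-point rounding, since both apply the same formula per channel.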

Running the Compiled Module#

With both the model and the input data in hand, we can now run TVMC to make a prediction:

!python -m tvm.driver.tvmc run \
    --inputs imagenet_cat.npz \
    --output predictions.npz \
    resnet50-v2-7-tvm.tar

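Recall that the .tar model file includes a C++ library, a description of the Relay model, and the parameters for the model. TVMC includes the TVM runtime, which can load the model and make predictions against the input. When the command above is run, TVMC writes a new file, predictions.npz, that contains the model output tensors in NumPy format.

In this example, we run the model on the same machine that we used for compilation. In some cases we might want to run it remotely via an RPC Tracker. To read more about these options, check: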

!python -m tvm.driver.tvmc run --help
usage: tvmc run [-h] [--device {cpu,cuda,cl,metal,vulkan,rocm,micro}]
                [--fill-mode {zeros,ones,random}] [-i INPUTS] [-o OUTPUTS]
                [--print-time] [--print-top N] [--profile] [--end-to-end]
                [--repeat N] [--number N] [--rpc-key RPC_KEY]
                [--rpc-tracker RPC_TRACKER] [--list-options]
                PATH

positional arguments:
  PATH                  path to the compiled module file or to the project
                        directory if '--device micro' is selected.

optional arguments:
  -h, --help            show this help message and exit
  --device {cpu,cuda,cl,metal,vulkan,rocm,micro}
                        target device to run the compiled module. Defaults to
                        'cpu'
  --fill-mode {zeros,ones,random}
                        fill all input tensors with values. In case
                        --inputs/-i is provided, they will take precedence
                        over --fill-mode. Any remaining inputs will be filled
                        using the chosen fill mode. Defaults to 'random'
  -i INPUTS, --inputs INPUTS
                        path to the .npz input file
  -o OUTPUTS, --outputs OUTPUTS
                        path to the .npz output file
  --print-time          record and print the execution time(s). (non-micro
                        devices only)
  --print-top N         print the top n values and indices of the output
                        tensor
  --profile             generate profiling data from the runtime execution.
                        Using --profile requires the Graph Executor Debug
                        enabled on TVM. Profiling may also have an impact on
                        inference time, making it take longer to be generated.
                        (non-micro devices only)
  --end-to-end          Measure data transfers as well as model execution.
                        This can provide a more realistic performance
                        measurement in many cases.
  --repeat N            run the model n times. Defaults to '1'
  --number N            repeat the run n times. Defaults to '1'
  --rpc-key RPC_KEY     the RPC tracker key of the target device. (non-micro
                        devices only)
  --rpc-tracker RPC_TRACKER
                        hostname (required) and port (optional, defaults to
                        9090) of the RPC tracker, e.g. '192.168.0.100:9999'.
                        (non-micro devices only)
  --list-options        show all run options and option choices when '--device
                        micro' is selected. (micro devices only)

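Output Post-Processing#

As previously mentioned, each model has its own particular way of providing output tensors.

In our case, we need to run some post-processing to render the outputs of ResNet-50 v2 in a human-readable form, using the lookup table provided for the model.

The script below shows an example of the post-processing that extracts labels from the output of our compiled module.

Running this script should produce the output shown after it: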

#!python ./postprocess.py
import os.path
import numpy as np

from scipy.special import softmax

from tvm.contrib.download import download_testdata

# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

output_file = "predictions.npz"

# Open the output and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]

        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
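
If SciPy is not available, the softmax and ranking steps can be sketched with NumPy alone. The function name top_k is hypothetical, and the sketch assumes a single image, so that the output tensor can be flattened into one vector of class scores.

```python
import numpy as np


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    logits = np.asarray(logits, dtype="float64").ravel()
    # Subtracting the maximum keeps exp() from overflowing without changing the result.
    shifted = np.exp(logits - logits.max())
    probs = shifted / shifted.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]


if __name__ == "__main__":
    # Toy scores and labels, made up purely for illustration.
    for label, prob in top_k([[1.0, 3.0, 2.0]], ["a", "b", "c"], k=2):
        print("class='%s' with probability=%f" % (label, prob))
```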

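Try replacing the cat image with other images, and see what sort of predictions the ResNet model makes.

Automatically Tuning the ResNet Model#

The previous model was compiled to work on the TVM runtime, but did not include any platform-specific optimization. In this section, we show how to use TVMC to build an optimized model that targets your working platform.

In some cases, we might not get the expected performance when running inference with the compiled module. In cases like this, we can use the auto-tuner to find a better configuration for the model and gain a boost in performance. Tuning in TVM refers to the process by which a model is optimized to run faster on a given target. This differs from training or fine-tuning in that it does not affect the accuracy of the model, only its runtime performance. As part of the tuning process, TVM tries running many different operator implementation variants to see which perform best. The results of these runs are stored in a tuning records file, which is ultimately the output of the tune subcommand.

In its simplest form, tuning requires you to provide three things:

  • the target specification of the device you intend to run this model on

  • the path to an output file in which the tuning records will be stored

  • the path to the model to be tuned.

The default search algorithm requires xgboost; see below for more details on tuning search algorithms: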

pip install xgboost cloudpickle

The example below demonstrates how this works in practice:

!python -m tvm.driver.tvmc tune --target "llvm" \
    --output resnet50-v2-7-autotuner_records.json \
        ../../_models/resnet50-v2-7.onnx
/media/pc/data/4tb/lxw/anaconda3/envs/mx39/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
[Task  1/25]  Current/Best:  139.87/ 252.51 GFLOPS | Progress: (40/40) | 20.88 s Done.
[Task  2/25]  Current/Best:   42.44/ 183.76 GFLOPS | Progress: (40/40) | 11.12 s Done.
[Task  3/25]  Current/Best:  176.21/ 215.65 GFLOPS | Progress: (40/40) | 11.55 s Done.
[Task  4/25]  Current/Best:  113.94/ 160.83 GFLOPS | Progress: (40/40) | 13.36 s Done.
[Task  5/25]  Current/Best:  120.38/ 164.05 GFLOPS | Progress: (40/40) | 12.15 s Done.
[Task  6/25]  Current/Best:  103.44/ 188.69 GFLOPS | Progress: (40/40) | 12.60 s Done.
[Task  7/25]  Current/Best:  137.09/ 204.00 GFLOPS | Progress: (40/40) | 11.36 s Done.
[Task  8/25]  Current/Best:   99.24/ 195.34 GFLOPS | Progress: (40/40) | 18.87 s Done.
[Task  9/25]  Current/Best:   70.21/ 189.30 GFLOPS | Progress: (40/40) | 19.84 s Done.
[Task 10/25]  Current/Best:  139.57/ 150.27 GFLOPS | Progress: (40/40) | 11.81 s Done.
[Task 11/25]  Current/Best:  136.51/ 192.55 GFLOPS | Progress: (40/40) | 11.38 s Done.
[Task 12/25]  Current/Best:  127.62/ 216.62 GFLOPS | Progress: (40/40) | 15.05 s Done.
[Task 13/25]  Current/Best:   76.30/ 237.37 GFLOPS | Progress: (40/40) | 12.29 s Done.
[Task 14/25]  Current/Best:   67.69/ 197.50 GFLOPS | Progress: (40/40) | 17.04 s Done.
[Task 16/25]  Current/Best:   57.91/ 200.78 GFLOPS | Progress: (40/40) | 12.76 s Done.
[Task 17/25]  Current/Best:  172.88/ 267.60 GFLOPS | Progress: (40/40) | 12.21 s Done.
[Task 18/25]  Current/Best:  164.30/ 195.15 GFLOPS | Progress: (40/40) | 18.82 s Done.
[Task 19/25]  Current/Best:  122.30/ 209.99 GFLOPS | Progress: (40/40) | 14.50 s Done.
[Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/40) | 0.00 s s Done.
 Done.
 Done.
[Task 22/25]  Current/Best:   69.31/ 177.25 GFLOPS | Progress: (40/40) | 12.39 s Done.
[Task 23/25]  Current/Best:   92.92/ 185.29 GFLOPS | Progress: (40/40) | 13.99 s Done.
[Task 25/25]  Current/Best:   18.40/  84.62 GFLOPS | Progress: (40/40) | 20.26 s Done.
 Done.

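In this example, you will see better results if you indicate a more specific target for the --target flag.

TVMC performs a search over the parameter space of the model, trying out different configurations for the operators and choosing the one that runs fastest on your platform. Although this is a guided search based on the CPU and the model operations, it can still take several hours to complete. The output of this search is saved to the resnet50-v2-7-autotuner_records.json file, which will later be used to compile an optimized model.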
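
The per-task progress lines printed during tuning can be summarized with a few lines of Python. This is a sketch that assumes the line layout shown in the output above; the regular expression and the function name best_gflops_per_task are not part of TVM, and a different TVM version may print a different layout.

```python
import re

# Matches lines such as:
# [Task  1/25]  Current/Best:  139.87/ 252.51 GFLOPS | Progress: (40/40) | 20.88 s Done.
PROGRESS_RE = re.compile(
    r"\[Task\s+(\d+)/\s*(\d+)\]\s+Current/Best:\s*([\d.]+)/\s*([\d.]+) GFLOPS"
    r" \| Progress: \((\d+)/(\d+)\) \| ([\d.]+) s"
)


def best_gflops_per_task(log_text):
    """Map each task number to the highest 'Best' GFLOPS value seen in the log."""
    best = {}
    for match in PROGRESS_RE.finditer(log_text):
        task = int(match.group(1))
        gflops = float(match.group(4))
        # A task can appear more than once; keep its highest reported value.
        best[task] = max(best.get(task, 0.0), gflops)
    return best


if __name__ == "__main__":
    sample = (
        "[Task  1/25]  Current/Best:  139.87/ 252.51 GFLOPS | Progress: (40/40) | 20.88 s Done.\n"
        "[Task  2/25]  Current/Best:   42.44/ 183.76 GFLOPS | Progress: (40/40) | 11.12 s Done.\n"
    )
    for task, gflops in sorted(best_gflops_per_task(sample).items()):
        print(f"task {task}: best {gflops} GFLOPS")
```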

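Defining the Tuning Search Algorithm

By default, this search is guided using an XGBoost Grid algorithm. Depending on the complexity of your model and the amount of time available, you might want to choose a different algorithm. A full list is available by consulting: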

!python -m tvm.driver.tvmc tune --help
usage: tvmc tune [-h] [--early-stopping EARLY_STOPPING]
                 [--min-repeat-ms MIN_REPEAT_MS]
                 [--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
                 [--number NUMBER] -o OUTPUT [--parallel PARALLEL]
                 [--repeat REPEAT] [--rpc-key RPC_KEY]
                 [--rpc-tracker RPC_TRACKER] [--target TARGET]
                 [--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
                 [--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
                 [--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
                 [--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
                 [--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
                 [--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
                 [--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
                 [--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
                 [--target-ext_dev-model TARGET_EXT_DEV_MODEL]
                 [--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
                 [--target-ext_dev-tag TARGET_EXT_DEV_TAG]
                 [--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
                 [--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
                 [--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
                 [--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
                 [--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
                 [--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
                 [--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
                 [--target-llvm-mattr TARGET_LLVM_MATTR]
                 [--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
                 [--target-llvm-libs TARGET_LLVM_LIBS]
                 [--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
                 [--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
                 [--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
                 [--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
                 [--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
                 [--target-llvm-tag TARGET_LLVM_TAG]
                 [--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
                 [--target-llvm-model TARGET_LLVM_MODEL]
                 [--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
                 [--target-llvm-mcpu TARGET_LLVM_MCPU]
                 [--target-llvm-device TARGET_LLVM_DEVICE]
                 [--target-llvm-runtime TARGET_LLVM_RUNTIME]
                 [--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
                 [--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
                 [--target-llvm-mabi TARGET_LLVM_MABI]
                 [--target-llvm-keys TARGET_LLVM_KEYS]
                 [--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
                 [--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
                 [--target-hybrid-libs TARGET_HYBRID_LIBS]
                 [--target-hybrid-model TARGET_HYBRID_MODEL]
                 [--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
                 [--target-hybrid-tag TARGET_HYBRID_TAG]
                 [--target-hybrid-device TARGET_HYBRID_DEVICE]
                 [--target-hybrid-keys TARGET_HYBRID_KEYS]
                 [--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
                 [--target-aocl-libs TARGET_AOCL_LIBS]
                 [--target-aocl-model TARGET_AOCL_MODEL]
                 [--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
                 [--target-aocl-tag TARGET_AOCL_TAG]
                 [--target-aocl-device TARGET_AOCL_DEVICE]
                 [--target-aocl-keys TARGET_AOCL_KEYS]
                 [--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
                 [--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
                 [--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
                 [--target-nvptx-libs TARGET_NVPTX_LIBS]
                 [--target-nvptx-model TARGET_NVPTX_MODEL]
                 [--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
                 [--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
                 [--target-nvptx-tag TARGET_NVPTX_TAG]
                 [--target-nvptx-mcpu TARGET_NVPTX_MCPU]
                 [--target-nvptx-device TARGET_NVPTX_DEVICE]
                 [--target-nvptx-keys TARGET_NVPTX_KEYS]
                 [--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
                 [--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
                 [--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
                 [--target-opencl-libs TARGET_OPENCL_LIBS]
                 [--target-opencl-model TARGET_OPENCL_MODEL]
                 [--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
                 [--target-opencl-tag TARGET_OPENCL_TAG]
                 [--target-opencl-device TARGET_OPENCL_DEVICE]
                 [--target-opencl-keys TARGET_OPENCL_KEYS]
                 [--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
                 [--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
                 [--target-metal-from_device TARGET_METAL_FROM_DEVICE]
                 [--target-metal-libs TARGET_METAL_LIBS]
                 [--target-metal-keys TARGET_METAL_KEYS]
                 [--target-metal-model TARGET_METAL_MODEL]
                 [--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
                 [--target-metal-tag TARGET_METAL_TAG]
                 [--target-metal-device TARGET_METAL_DEVICE]
                 [--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
                 [--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
                 [--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
                 [--target-webgpu-libs TARGET_WEBGPU_LIBS]
                 [--target-webgpu-model TARGET_WEBGPU_MODEL]
                 [--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
                 [--target-webgpu-tag TARGET_WEBGPU_TAG]
                 [--target-webgpu-device TARGET_WEBGPU_DEVICE]
                 [--target-webgpu-keys TARGET_WEBGPU_KEYS]
                 [--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
                 [--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
                 [--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
                 [--target-rocm-libs TARGET_ROCM_LIBS]
                 [--target-rocm-mattr TARGET_ROCM_MATTR]
                 [--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
                 [--target-rocm-model TARGET_ROCM_MODEL]
                 [--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
                 [--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
                 [--target-rocm-tag TARGET_ROCM_TAG]
                 [--target-rocm-device TARGET_ROCM_DEVICE]
                 [--target-rocm-mcpu TARGET_ROCM_MCPU]
                 [--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
                 [--target-rocm-keys TARGET_ROCM_KEYS]
                 [--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
                 [--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
                 [--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
                 [--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
                 [--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
                 [--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
                 [--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
                 [--target-vulkan-libs TARGET_VULKAN_LIBS]
                 [--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
                 [--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
                 [--target-vulkan-mattr TARGET_VULKAN_MATTR]
                 [--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
                 [--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
                 [--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
                 [--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
                 [--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
                 [--target-vulkan-model TARGET_VULKAN_MODEL]
                 [--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
                 [--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
                 [--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
                 [--target-vulkan-tag TARGET_VULKAN_TAG]
                 [--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
                 [--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
                 [--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
                 [--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
                 [--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
                 [--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
                 [--target-vulkan-device TARGET_VULKAN_DEVICE]
                 [--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
                 [--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
                 [--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
                 [--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
                 [--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
                 [--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
                 [--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
                 [--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
                 [--target-vulkan-keys TARGET_VULKAN_KEYS]
                 [--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
                 [--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
                 [--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
                 [--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
                 [--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
                 [--target-cuda-arch TARGET_CUDA_ARCH]
                 [--target-cuda-libs TARGET_CUDA_LIBS]
                 [--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
                 [--target-cuda-model TARGET_CUDA_MODEL]
                 [--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
                 [--target-cuda-tag TARGET_CUDA_TAG]
                 [--target-cuda-device TARGET_CUDA_DEVICE]
                 [--target-cuda-mcpu TARGET_CUDA_MCPU]
                 [--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
                 [--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
                 [--target-cuda-keys TARGET_CUDA_KEYS]
                 [--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
                 [--target-sdaccel-libs TARGET_SDACCEL_LIBS]
                 [--target-sdaccel-model TARGET_SDACCEL_MODEL]
                 [--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
                 [--target-sdaccel-tag TARGET_SDACCEL_TAG]
                 [--target-sdaccel-device TARGET_SDACCEL_DEVICE]
                 [--target-sdaccel-keys TARGET_SDACCEL_KEYS]
                 [--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
                 [--target-composite-libs TARGET_COMPOSITE_LIBS]
                 [--target-composite-devices TARGET_COMPOSITE_DEVICES]
                 [--target-composite-model TARGET_COMPOSITE_MODEL]
                 [--target-composite-tag TARGET_COMPOSITE_TAG]
                 [--target-composite-device TARGET_COMPOSITE_DEVICE]
                 [--target-composite-keys TARGET_COMPOSITE_KEYS]
                 [--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
                 [--target-stackvm-libs TARGET_STACKVM_LIBS]
                 [--target-stackvm-model TARGET_STACKVM_MODEL]
                 [--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
                 [--target-stackvm-tag TARGET_STACKVM_TAG]
                 [--target-stackvm-device TARGET_STACKVM_DEVICE]
                 [--target-stackvm-keys TARGET_STACKVM_KEYS]
                 [--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
                 [--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
                 [--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
                 [--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
                 [--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
                 [--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
                 [--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
                 [--target-c-unpacked-api TARGET_C_UNPACKED_API]
                 [--target-c-from_device TARGET_C_FROM_DEVICE]
                 [--target-c-libs TARGET_C_LIBS]
                 [--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
                 [--target-c-executor TARGET_C_EXECUTOR]
                 [--target-c-link-params TARGET_C_LINK_PARAMS]
                 [--target-c-model TARGET_C_MODEL]
                 [--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
                 [--target-c-system-lib TARGET_C_SYSTEM_LIB]
                 [--target-c-tag TARGET_C_TAG]
                 [--target-c-interface-api TARGET_C_INTERFACE_API]
                 [--target-c-mcpu TARGET_C_MCPU]
                 [--target-c-device TARGET_C_DEVICE]
                 [--target-c-runtime TARGET_C_RUNTIME]
                 [--target-c-keys TARGET_C_KEYS]
                 [--target-c-march TARGET_C_MARCH]
                 [--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
                 [--target-hexagon-libs TARGET_HEXAGON_LIBS]
                 [--target-hexagon-mattr TARGET_HEXAGON_MATTR]
                 [--target-hexagon-model TARGET_HEXAGON_MODEL]
                 [--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
                 [--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
                 [--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
                 [--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
                 [--target-hexagon-device TARGET_HEXAGON_DEVICE]
                 [--target-hexagon-tag TARGET_HEXAGON_TAG]
                 [--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
                 [--target-hexagon-keys TARGET_HEXAGON_KEYS]
                 [--target-host TARGET_HOST] [--timeout TIMEOUT]
                 [--trials TRIALS] [--tuning-records PATH]
                 [--desired-layout {NCHW,NHWC}] [--enable-autoscheduler]
                 [--cache-line-bytes CACHE_LINE_BYTES] [--num-cores NUM_CORES]
                 [--vector-unit-bytes VECTOR_UNIT_BYTES]
                 [--max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK]
                 [--max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK]
                 [--max-threads-per-block MAX_THREADS_PER_BLOCK]
                 [--max-vthread-extent MAX_VTHREAD_EXTENT]
                 [--warp-size WARP_SIZE] [--include-simple-tasks]
                 [--log-estimated-latency]
                 [--tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}]
                 [--input-shapes INPUT_SHAPES]
                 FILE

positional arguments:
  FILE                  path to the input model file

optional arguments:
  -h, --help            show this help message and exit
  --early-stopping EARLY_STOPPING
                        minimum number of trials before early stopping
  --min-repeat-ms MIN_REPEAT_MS
                        minimum time to run each trial, in milliseconds.
                        Defaults to 0 on x86 and 1000 on all other targets
  --model-format {keras,onnx,pb,tflite,pytorch,paddle}
                        specify input model format
  --number NUMBER       number of runs a single repeat is made of. The final
                        number of tuning executions is: (1 + number * repeat)
  -o OUTPUT, --output OUTPUT
                        output file to store the tuning records for the tuning
                        process
  --parallel PARALLEL   the maximum number of parallel devices to use when
                        tuning
  --repeat REPEAT       how many times to repeat each measurement
  --rpc-key RPC_KEY     the RPC tracker key of the target device. Required
                        when --rpc-tracker is provided.
  --rpc-tracker RPC_TRACKER
                        hostname (required) and port (optional, defaults to
                        9090) of the RPC tracker, e.g. '192.168.0.100:9999'
  --target TARGET       compilation target as plain string, inline JSON or
                        path to a JSON file
  --target-host TARGET_HOST
                        the host compilation target, defaults to None
  --timeout TIMEOUT     compilation timeout, in seconds
  --trials TRIALS       the maximum number of tuning trials to perform
  --tuning-records PATH
                        path to an auto-tuning log file by AutoTVM.
  --desired-layout {NCHW,NHWC}
                        change the data layout of the whole graph
  --enable-autoscheduler
                        enable tuning the graph through the AutoScheduler
                        tuner
  --input-shapes INPUT_SHAPES
                        specify non-generic shapes for model to run, format is
                        "input_name:[dim1,dim2,...,dimn]
                        input_name2:[dim1,dim2]"

target example_target_hook:
  --target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
                        target example_target_hook from_device
  --target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
                        target example_target_hook libs options
  --target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
                        target example_target_hook model string
  --target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
                        target example_target_hook tag string
  --target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
                        target example_target_hook device string
  --target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
                        target example_target_hook keys options

target ext_dev:
  --target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
                        target ext_dev from_device
  --target-ext_dev-libs TARGET_EXT_DEV_LIBS
                        target ext_dev libs options
  --target-ext_dev-model TARGET_EXT_DEV_MODEL
                        target ext_dev model string
  --target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
                        target ext_dev system-lib
  --target-ext_dev-tag TARGET_EXT_DEV_TAG
                        target ext_dev tag string
  --target-ext_dev-device TARGET_EXT_DEV_DEVICE
                        target ext_dev device string
  --target-ext_dev-keys TARGET_EXT_DEV_KEYS
                        target ext_dev keys options

target llvm:
  --target-llvm-fast-math TARGET_LLVM_FAST_MATH
                        target llvm fast-math
  --target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
                        target llvm opt-level
  --target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
                        target llvm unpacked-api
  --target-llvm-from_device TARGET_LLVM_FROM_DEVICE
                        target llvm from_device
  --target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
                        target llvm fast-math-ninf
  --target-llvm-mattr TARGET_LLVM_MATTR
                        target llvm mattr options
  --target-llvm-num-cores TARGET_LLVM_NUM_CORES
                        target llvm num-cores
  --target-llvm-libs TARGET_LLVM_LIBS
                        target llvm libs options
  --target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
                        target llvm fast-math-nsz
  --target-llvm-link-params TARGET_LLVM_LINK_PARAMS
                        target llvm link-params
  --target-llvm-interface-api TARGET_LLVM_INTERFACE_API
                        target llvm interface-api string
  --target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
                        target llvm fast-math-contract
  --target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
                        target llvm system-lib
  --target-llvm-tag TARGET_LLVM_TAG
                        target llvm tag string
  --target-llvm-mtriple TARGET_LLVM_MTRIPLE
                        target llvm mtriple string
  --target-llvm-model TARGET_LLVM_MODEL
                        target llvm model string
  --target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
                        target llvm mfloat-abi string
  --target-llvm-mcpu TARGET_LLVM_MCPU
                        target llvm mcpu string
  --target-llvm-device TARGET_LLVM_DEVICE
                        target llvm device string
  --target-llvm-runtime TARGET_LLVM_RUNTIME
                        target llvm runtime string
  --target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
                        target llvm fast-math-arcp
  --target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
                        target llvm fast-math-reassoc
  --target-llvm-mabi TARGET_LLVM_MABI
                        target llvm mabi string
  --target-llvm-keys TARGET_LLVM_KEYS
                        target llvm keys options
  --target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
                        target llvm fast-math-nnan

target hybrid:
  --target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
                        target hybrid from_device
  --target-hybrid-libs TARGET_HYBRID_LIBS
                        target hybrid libs options
  --target-hybrid-model TARGET_HYBRID_MODEL
                        target hybrid model string
  --target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
                        target hybrid system-lib
  --target-hybrid-tag TARGET_HYBRID_TAG
                        target hybrid tag string
  --target-hybrid-device TARGET_HYBRID_DEVICE
                        target hybrid device string
  --target-hybrid-keys TARGET_HYBRID_KEYS
                        target hybrid keys options

target aocl:
  --target-aocl-from_device TARGET_AOCL_FROM_DEVICE
                        target aocl from_device
  --target-aocl-libs TARGET_AOCL_LIBS
                        target aocl libs options
  --target-aocl-model TARGET_AOCL_MODEL
                        target aocl model string
  --target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
                        target aocl system-lib
  --target-aocl-tag TARGET_AOCL_TAG
                        target aocl tag string
  --target-aocl-device TARGET_AOCL_DEVICE
                        target aocl device string
  --target-aocl-keys TARGET_AOCL_KEYS
                        target aocl keys options

target nvptx:
  --target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
                        target nvptx max_num_threads
  --target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
                        target nvptx thread_warp_size
  --target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
                        target nvptx from_device
  --target-nvptx-libs TARGET_NVPTX_LIBS
                        target nvptx libs options
  --target-nvptx-model TARGET_NVPTX_MODEL
                        target nvptx model string
  --target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
                        target nvptx system-lib
  --target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
                        target nvptx mtriple string
  --target-nvptx-tag TARGET_NVPTX_TAG
                        target nvptx tag string
  --target-nvptx-mcpu TARGET_NVPTX_MCPU
                        target nvptx mcpu string
  --target-nvptx-device TARGET_NVPTX_DEVICE
                        target nvptx device string
  --target-nvptx-keys TARGET_NVPTX_KEYS
                        target nvptx keys options

target opencl:
  --target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
                        target opencl max_num_threads
  --target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
                        target opencl thread_warp_size
  --target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
                        target opencl from_device
  --target-opencl-libs TARGET_OPENCL_LIBS
                        target opencl libs options
  --target-opencl-model TARGET_OPENCL_MODEL
                        target opencl model string
  --target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
                        target opencl system-lib
  --target-opencl-tag TARGET_OPENCL_TAG
                        target opencl tag string
  --target-opencl-device TARGET_OPENCL_DEVICE
                        target opencl device string
  --target-opencl-keys TARGET_OPENCL_KEYS
                        target opencl keys options

target metal:
  --target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
                        target metal max_num_threads
  --target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
                        target metal thread_warp_size
  --target-metal-from_device TARGET_METAL_FROM_DEVICE
                        target metal from_device
  --target-metal-libs TARGET_METAL_LIBS
                        target metal libs options
  --target-metal-keys TARGET_METAL_KEYS
                        target metal keys options
  --target-metal-model TARGET_METAL_MODEL
                        target metal model string
  --target-metal-system-lib TARGET_METAL_SYSTEM_LIB
                        target metal system-lib
  --target-metal-tag TARGET_METAL_TAG
                        target metal tag string
  --target-metal-device TARGET_METAL_DEVICE
                        target metal device string
  --target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
                        target metal max_function_args

target webgpu:
  --target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
                        target webgpu max_num_threads
  --target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
                        target webgpu from_device
  --target-webgpu-libs TARGET_WEBGPU_LIBS
                        target webgpu libs options
  --target-webgpu-model TARGET_WEBGPU_MODEL
                        target webgpu model string
  --target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
                        target webgpu system-lib
  --target-webgpu-tag TARGET_WEBGPU_TAG
                        target webgpu tag string
  --target-webgpu-device TARGET_WEBGPU_DEVICE
                        target webgpu device string
  --target-webgpu-keys TARGET_WEBGPU_KEYS
                        target webgpu keys options

target rocm:
  --target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
                        target rocm max_num_threads
  --target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
                        target rocm thread_warp_size
  --target-rocm-from_device TARGET_ROCM_FROM_DEVICE
                        target rocm from_device
  --target-rocm-libs TARGET_ROCM_LIBS
                        target rocm libs options
  --target-rocm-mattr TARGET_ROCM_MATTR
                        target rocm mattr options
  --target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
                        target rocm max_shared_memory_per_block
  --target-rocm-model TARGET_ROCM_MODEL
                        target rocm model string
  --target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
                        target rocm system-lib
  --target-rocm-mtriple TARGET_ROCM_MTRIPLE
                        target rocm mtriple string
  --target-rocm-tag TARGET_ROCM_TAG
                        target rocm tag string
  --target-rocm-device TARGET_ROCM_DEVICE
                        target rocm device string
  --target-rocm-mcpu TARGET_ROCM_MCPU
                        target rocm mcpu string
  --target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
                        target rocm max_threads_per_block
  --target-rocm-keys TARGET_ROCM_KEYS
                        target rocm keys options

target vulkan:
  --target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
                        target vulkan max_num_threads
  --target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
                        target vulkan thread_warp_size
  --target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
                        target vulkan from_device
  --target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
                        target vulkan max_per_stage_descriptor_storage_buffer
  --target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
                        target vulkan driver_version
  --target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
                        target vulkan supports_16bit_buffer
  --target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
                        target vulkan max_block_size_z
  --target-vulkan-libs TARGET_VULKAN_LIBS
                        target vulkan libs options
  --target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
                        target vulkan supports_dedicated_allocation
  --target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
                        target vulkan supported_subgroup_operations
  --target-vulkan-mattr TARGET_VULKAN_MATTR
                        target vulkan mattr options
  --target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
                        target vulkan max_storage_buffer_range
  --target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
                        target vulkan max_push_constants_size
  --target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
                        target vulkan supports_push_descriptor
  --target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
                        target vulkan supports_int64
  --target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
                        target vulkan supports_float32
  --target-vulkan-model TARGET_VULKAN_MODEL
                        target vulkan model string
  --target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
                        target vulkan max_block_size_x
  --target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
                        target vulkan system-lib
  --target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
                        target vulkan max_block_size_y
  --target-vulkan-tag TARGET_VULKAN_TAG
                        target vulkan tag string
  --target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
                        target vulkan supports_int8
  --target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
                        target vulkan max_spirv_version
  --target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
                        target vulkan vulkan_api_version
  --target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
                        target vulkan supports_8bit_buffer
  --target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
                        target vulkan device_type string
  --target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
                        target vulkan supports_int32
  --target-vulkan-device TARGET_VULKAN_DEVICE
                        target vulkan device string
  --target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
                        target vulkan max_threads_per_block
  --target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
                        target vulkan max_uniform_buffer_range
  --target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
                        target vulkan driver_name string
  --target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
                        target vulkan supports_integer_dot_product
  --target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
                        target vulkan supports_storage_buffer_storage_class
  --target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
                        target vulkan supports_float16
  --target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
                        target vulkan device_name string
  --target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
                        target vulkan supports_float64
  --target-vulkan-keys TARGET_VULKAN_KEYS
                        target vulkan keys options
  --target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
                        target vulkan max_shared_memory_per_block
  --target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
                        target vulkan supports_int16

target cuda:
  --target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
                        target cuda max_num_threads
  --target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
                        target cuda thread_warp_size
  --target-cuda-from_device TARGET_CUDA_FROM_DEVICE
                        target cuda from_device
  --target-cuda-arch TARGET_CUDA_ARCH
                        target cuda arch string
  --target-cuda-libs TARGET_CUDA_LIBS
                        target cuda libs options
  --target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
                        target cuda max_shared_memory_per_block
  --target-cuda-model TARGET_CUDA_MODEL
                        target cuda model string
  --target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
                        target cuda system-lib
  --target-cuda-tag TARGET_CUDA_TAG
                        target cuda tag string
  --target-cuda-device TARGET_CUDA_DEVICE
                        target cuda device string
  --target-cuda-mcpu TARGET_CUDA_MCPU
                        target cuda mcpu string
  --target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
                        target cuda max_threads_per_block
  --target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
                        target cuda registers_per_block
  --target-cuda-keys TARGET_CUDA_KEYS
                        target cuda keys options

target sdaccel:
  --target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
                        target sdaccel from_device
  --target-sdaccel-libs TARGET_SDACCEL_LIBS
                        target sdaccel libs options
  --target-sdaccel-model TARGET_SDACCEL_MODEL
                        target sdaccel model string
  --target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
                        target sdaccel system-lib
  --target-sdaccel-tag TARGET_SDACCEL_TAG
                        target sdaccel tag string
  --target-sdaccel-device TARGET_SDACCEL_DEVICE
                        target sdaccel device string
  --target-sdaccel-keys TARGET_SDACCEL_KEYS
                        target sdaccel keys options

target composite:
  --target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
                        target composite from_device
  --target-composite-libs TARGET_COMPOSITE_LIBS
                        target composite libs options
  --target-composite-devices TARGET_COMPOSITE_DEVICES
                        target composite devices options
  --target-composite-model TARGET_COMPOSITE_MODEL
                        target composite model string
  --target-composite-tag TARGET_COMPOSITE_TAG
                        target composite tag string
  --target-composite-device TARGET_COMPOSITE_DEVICE
                        target composite device string
  --target-composite-keys TARGET_COMPOSITE_KEYS
                        target composite keys options

target stackvm:
  --target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
                        target stackvm from_device
  --target-stackvm-libs TARGET_STACKVM_LIBS
                        target stackvm libs options
  --target-stackvm-model TARGET_STACKVM_MODEL
                        target stackvm model string
  --target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
                        target stackvm system-lib
  --target-stackvm-tag TARGET_STACKVM_TAG
                        target stackvm tag string
  --target-stackvm-device TARGET_STACKVM_DEVICE
                        target stackvm device string
  --target-stackvm-keys TARGET_STACKVM_KEYS
                        target stackvm keys options

target aocl_sw_emu:
  --target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
                        target aocl_sw_emu from_device
  --target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
                        target aocl_sw_emu libs options
  --target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
                        target aocl_sw_emu model string
  --target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
                        target aocl_sw_emu system-lib
  --target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
                        target aocl_sw_emu tag string
  --target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
                        target aocl_sw_emu device string
  --target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
                        target aocl_sw_emu keys options

target c:
  --target-c-unpacked-api TARGET_C_UNPACKED_API
                        target c unpacked-api
  --target-c-from_device TARGET_C_FROM_DEVICE
                        target c from_device
  --target-c-libs TARGET_C_LIBS
                        target c libs options
  --target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
                        target c constants-byte-alignment
  --target-c-executor TARGET_C_EXECUTOR
                        target c executor string
  --target-c-link-params TARGET_C_LINK_PARAMS
                        target c link-params
  --target-c-model TARGET_C_MODEL
                        target c model string
  --target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
                        target c workspace-byte-alignment
  --target-c-system-lib TARGET_C_SYSTEM_LIB
                        target c system-lib
  --target-c-tag TARGET_C_TAG
                        target c tag string
  --target-c-interface-api TARGET_C_INTERFACE_API
                        target c interface-api string
  --target-c-mcpu TARGET_C_MCPU
                        target c mcpu string
  --target-c-device TARGET_C_DEVICE
                        target c device string
  --target-c-runtime TARGET_C_RUNTIME
                        target c runtime string
  --target-c-keys TARGET_C_KEYS
                        target c keys options
  --target-c-march TARGET_C_MARCH
                        target c march string

target hexagon:
  --target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
                        target hexagon from_device
  --target-hexagon-libs TARGET_HEXAGON_LIBS
                        target hexagon libs options
  --target-hexagon-mattr TARGET_HEXAGON_MATTR
                        target hexagon mattr options
  --target-hexagon-model TARGET_HEXAGON_MODEL
                        target hexagon model string
  --target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
                        target hexagon llvm-options options
  --target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
                        target hexagon mtriple string
  --target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
                        target hexagon system-lib
  --target-hexagon-mcpu TARGET_HEXAGON_MCPU
                        target hexagon mcpu string
  --target-hexagon-device TARGET_HEXAGON_DEVICE
                        target hexagon device string
  --target-hexagon-tag TARGET_HEXAGON_TAG
                        target hexagon tag string
  --target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
                        target hexagon link-params
  --target-hexagon-keys TARGET_HEXAGON_KEYS
                        target hexagon keys options

AutoScheduler options:
  AutoScheduler options, used when --enable-autoscheduler is provided

  --cache-line-bytes CACHE_LINE_BYTES
                        the size of cache line in bytes. If not specified, it
                        will be autoset for the current machine.
  --num-cores NUM_CORES
                        the number of device cores. If not specified, it will
                        be autoset for the current machine.
  --vector-unit-bytes VECTOR_UNIT_BYTES
                        the width of vector units in bytes. If not specified,
                        it will be autoset for the current machine.
  --max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK
                        the max shared memory per block in bytes. If not
                        specified, it will be autoset for the current machine.
  --max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK
                        the max local memory per block in bytes. If not
                        specified, it will be autoset for the current machine.
  --max-threads-per-block MAX_THREADS_PER_BLOCK
                        the max number of threads per block. If not specified,
                        it will be autoset for the current machine.
  --max-vthread-extent MAX_VTHREAD_EXTENT
                        the max vthread extent. If not specified, it will be
                        autoset for the current machine.
  --warp-size WARP_SIZE
                        the thread numbers of a warp. If not specified, it
                        will be autoset for the current machine.
  --include-simple-tasks
                        whether to extract simple tasks that do not include
                        complicated ops
  --log-estimated-latency
                        whether to log the estimated latency to the file after
                        tuning a task

AutoTVM options:
  AutoTVM options, used when the AutoScheduler is not enabled

  --tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}
                        type of tuner to use when tuning with autotvm.

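On a consumer-grade Skylake CPU, the output looks similar to the following. Note that the command shown here passes -mcpu=broadwell; set this flag to match your own processor.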

!python -m tvm.driver.tvmc tune \
    --target "llvm -mcpu=broadwell" \
    --output resnet50-v2-7-autotuner_records.json \
    ../../_models/resnet50-v2-7.onnx
/media/pc/data/4tb/lxw/anaconda3/envs/mx39/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
[Task  1/25]  Current/Best:  135.54/ 444.49 GFLOPS | Progress: (40/40) | 16.09 s Done.
[Task  2/25]  Current/Best:   91.39/ 426.70 GFLOPS | Progress: (40/40) | 10.33 s Done.
[Task  3/25]  Current/Best:  147.25/ 516.21 GFLOPS | Progress: (40/40) | 11.55 s Done.
[Task  4/25]  Current/Best:  561.81/ 561.81 GFLOPS | Progress: (40/40) | 12.99 s Done.
[Task  5/25]  Current/Best:  182.70/ 570.25 GFLOPS | Progress: (40/40) | 11.12 s Done.
[Task  6/25]  Current/Best:   79.82/ 459.29 GFLOPS | Progress: (40/40) | 12.03 s Done.
[Task  7/25]  Current/Best:  152.79/ 300.64 GFLOPS | Progress: (40/40) | 11.16 s Done.
[Task  8/25]  Current/Best:  155.29/ 310.77 GFLOPS | Progress: (40/40) | 14.68 s Done.
[Task  9/25]  Current/Best:  126.56/ 561.24 GFLOPS | Progress: (40/40) | 13.93 s Done.
[Task 10/25]  Current/Best:   41.68/ 517.18 GFLOPS | Progress: (40/40) | 10.91 s Done.
[Task 11/25]  Current/Best:  311.13/ 528.67 GFLOPS | Progress: (40/40) | 10.89 s Done.
[Task 12/25]  Current/Best:  265.13/ 525.74 GFLOPS | Progress: (40/40) | 11.19 s Done.
[Task 13/25]  Current/Best:  107.09/ 426.10 GFLOPS | Progress: (40/40) | 11.29 s Done.
[Task 14/25]  Current/Best:  119.32/ 373.60 GFLOPS | Progress: (40/40) | 12.38 s Done.
[Task 15/25]  Current/Best:  101.58/ 439.72 GFLOPS | Progress: (40/40) | 14.41 s Done.
[Task 16/25]  Current/Best:  177.78/ 427.98 GFLOPS | Progress: (40/40) | 10.23 s Done.
[Task 17/25]  Current/Best:   72.04/ 349.15 GFLOPS | Progress: (40/40) | 11.50 s Done.
[Task 18/25]  Current/Best:  124.41/ 500.93 GFLOPS | Progress: (40/40) | 12.07 s Done.
[Task 19/25]  Current/Best:  243.37/ 371.27 GFLOPS | Progress: (40/40) | 12.88 s Done.
[Task 20/25]  Current/Best:  137.63/ 343.57 GFLOPS | Progress: (40/40) | 21.29 s Done.
[Task 21/25]  Current/Best:   59.02/ 330.98 GFLOPS | Progress: (40/40) | 12.88 s Done.
[Task 22/25]  Current/Best:  273.71/ 457.41 GFLOPS | Progress: (40/40) | 11.04 s Done.
[Task 23/25]  Current/Best:  166.89/ 430.39 GFLOPS | Progress: (40/40) | 13.46 s Done.
[Task 25/25]  Current/Best:   28.01/  59.42 GFLOPS | Progress: (40/40) | 20.24 s Done.
 Done.
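The progress lines above can also be read programmatically. The following is a minimal sketch, assuming the line layout stays exactly as printed in this log; the regular expression and the helper names parse_tuning_log and find_gaps are illustrative and not part of TVMC. It extracts the best GFLOPS per task and reports task numbers that have no line in the log (the output above, for instance, goes from Task 23 straight to Task 25), without attempting to explain why a line is absent.

```python
import re

# Matches progress lines in the layout printed above, for example:
# [Task  1/25]  Current/Best:  135.54/ 444.49 GFLOPS | Progress: (40/40) | 16.09 s Done.
LINE_RE = re.compile(
    r"\[Task\s+(\d+)/(\d+)\]\s+Current/Best:\s*([\d.]+)/\s*([\d.]+)\s+GFLOPS"
    r"\s*\|\s*Progress:\s*\((\d+)/(\d+)\)\s*\|\s*([\d.]+)\s*s"
)


def parse_tuning_log(text):
    """Map each task index to its parsed fields (the last line per task wins)."""
    tasks = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip warnings and any other unrelated output
        idx, total, current, best, done, trials, secs = match.groups()
        tasks[int(idx)] = {
            "total_tasks": int(total),
            "current_gflops": float(current),
            "best_gflops": float(best),
            "trials_done": int(done),
            "trials_total": int(trials),
            "seconds": float(secs),
        }
    return tasks


def find_gaps(tasks):
    """Return task indices between the first and last parsed task that have no line."""
    if not tasks:
        return []
    return [i for i in range(min(tasks), max(tasks) + 1) if i not in tasks]


# Three lines copied from the log above.
sample = """\
[Task 22/25]  Current/Best:  273.71/ 457.41 GFLOPS | Progress: (40/40) | 11.04 s Done.
[Task 23/25]  Current/Best:  166.89/ 430.39 GFLOPS | Progress: (40/40) | 13.46 s Done.
[Task 25/25]  Current/Best:   28.01/  59.42 GFLOPS | Progress: (40/40) | 20.24 s Done.
"""
tasks = parse_tuning_log(sample)
print(sorted(tasks))              # [22, 23, 25]
print(tasks[25]["best_gflops"])   # 59.42
print(find_gaps(tasks))           # [24]
```

A task whose best GFLOPS is far below the others, such as Task 25 here, may be worth a closer look when deciding where extra tuning trials could pay off.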

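A tuning session can take a long time, so tvmc tune offers many options to customize the process, such as the number of repetitions (for example, --repeat and --number) and the tuning algorithm to use.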
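As a rough illustration of these options, the sketch below assembles a tvmc tune command line in Python. It uses only flags that appear in this section (--target, --output, --tuner, --repeat, --number); the helper name build_tune_command and the values repeat=3 and number=10 are assumptions chosen for the example, not recommended settings.

```python
import shlex

# Tuner names as listed under "AutoTVM options" in the tune help output.
AUTOTVM_TUNERS = ("ga", "gridsearch", "random", "xgb", "xgb_knob", "xgb-rank")


def build_tune_command(model, output, target="llvm",
                       tuner=None, repeat=None, number=None):
    """Build the argument list for `python -m tvm.driver.tvmc tune`."""
    if tuner is not None and tuner not in AUTOTVM_TUNERS:
        raise ValueError(f"unknown tuner: {tuner!r}")
    cmd = ["python", "-m", "tvm.driver.tvmc", "tune",
           "--target", target, "--output", output]
    if tuner is not None:
        cmd += ["--tuner", tuner]
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    if number is not None:
        cmd += ["--number", str(number)]
    cmd.append(model)  # the model file is the positional argument
    return cmd


cmd = build_tune_command(
    "../../_models/resnet50-v2-7.onnx",
    "resnet50-v2-7-autotuner_records.json",
    target="llvm -mcpu=broadwell",
    tuner="xgb",   # illustrative choice
    repeat=3,      # illustrative value
    number=10,     # illustrative value
)
# Print a copy-pasteable shell command; quoting is handled by shlex.
print(shlex.join(cmd))
# To start tuning from Python (requires TVM to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps the target string, which contains a space, as a single argument without manual quoting.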

Compiling an Optimized Model with Tuning Data#

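The tuning process above produces tuning records, stored in resnet50-v2-7-autotuner_records.json. This file can be used in two ways:

  • As input to further tuning (via tvmc tune --tuning-records).

  • As input to the compiler.

The compiler uses these records to generate high-performance code for the model on your specified target. To do so, pass the file with tvmc compile --tuning-records.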
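Both uses of the records file can be sketched as command lines. The snippet below is a minimal example that only assembles the arguments; the helper tvmc_command and the output file names resnet50-v2-7-autotuner_records_2.json and resnet50-v2-7-tvm_autotuned.tar are assumptions made for illustration, while the flags themselves (--target, --tuning-records, --output) are the ones shown in this section.

```python
import shlex

RECORDS = "resnet50-v2-7-autotuner_records.json"
MODEL = "../../_models/resnet50-v2-7.onnx"


def tvmc_command(subcommand, *args):
    """Return the argument list for one `python -m tvm.driver.tvmc` call."""
    return ["python", "-m", "tvm.driver.tvmc", subcommand, *args]


# Use 1: continue tuning, starting from the existing records.
resume_tuning = tvmc_command(
    "tune",
    "--target", "llvm",
    "--tuning-records", RECORDS,
    "--output", "resnet50-v2-7-autotuner_records_2.json",  # hypothetical name
    MODEL,
)

# Use 2: compile the model with the tuned schedules applied.
compile_tuned = tvmc_command(
    "compile",
    "--target", "llvm",
    "--tuning-records", RECORDS,
    "--output", "resnet50-v2-7-tvm_autotuned.tar",  # hypothetical name
    MODEL,
)

for cmd in (resume_tuning, compile_tuned):
    print(shlex.join(cmd))
# To execute a command (requires TVM to be installed):
#   import subprocess; subprocess.run(compile_tuned, check=True)
```

The target passed to compile should normally match the one used while tuning, since the records describe measurements taken for that target.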

For more information, see the help output:

!python -m tvm.driver.tvmc compile --help
usage: tvmc compile [-h] [--cross-compiler CROSS_COMPILER]
                    [--cross-compiler-options CROSS_COMPILER_OPTIONS]
                    [--desired-layout {NCHW,NHWC}] [--dump-code FORMAT]
                    [--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
                    [-o OUTPUT] [-f {so,mlf}] [--pass-config name=value]
                    [--target TARGET]
                    [--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
                    [--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
                    [--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
                    [--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
                    [--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
                    [--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
                    [--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
                    [--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
                    [--target-ext_dev-model TARGET_EXT_DEV_MODEL]
                    [--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
                    [--target-ext_dev-tag TARGET_EXT_DEV_TAG]
                    [--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
                    [--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
                    [--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
                    [--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
                    [--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
                    [--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
                    [--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
                    [--target-llvm-mattr TARGET_LLVM_MATTR]
                    [--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
                    [--target-llvm-libs TARGET_LLVM_LIBS]
                    [--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
                    [--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
                    [--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
                    [--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
                    [--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
                    [--target-llvm-tag TARGET_LLVM_TAG]
                    [--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
                    [--target-llvm-model TARGET_LLVM_MODEL]
                    [--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
                    [--target-llvm-mcpu TARGET_LLVM_MCPU]
                    [--target-llvm-device TARGET_LLVM_DEVICE]
                    [--target-llvm-runtime TARGET_LLVM_RUNTIME]
                    [--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
                    [--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
                    [--target-llvm-mabi TARGET_LLVM_MABI]
                    [--target-llvm-keys TARGET_LLVM_KEYS]
                    [--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
                    [--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
                    [--target-hybrid-libs TARGET_HYBRID_LIBS]
                    [--target-hybrid-model TARGET_HYBRID_MODEL]
                    [--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
                    [--target-hybrid-tag TARGET_HYBRID_TAG]
                    [--target-hybrid-device TARGET_HYBRID_DEVICE]
                    [--target-hybrid-keys TARGET_HYBRID_KEYS]
                    [--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
                    [--target-aocl-libs TARGET_AOCL_LIBS]
                    [--target-aocl-model TARGET_AOCL_MODEL]
                    [--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
                    [--target-aocl-tag TARGET_AOCL_TAG]
                    [--target-aocl-device TARGET_AOCL_DEVICE]
                    [--target-aocl-keys TARGET_AOCL_KEYS]
                    [--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
                    [--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
                    [--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
                    [--target-nvptx-libs TARGET_NVPTX_LIBS]
                    [--target-nvptx-model TARGET_NVPTX_MODEL]
                    [--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
                    [--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
                    [--target-nvptx-tag TARGET_NVPTX_TAG]
                    [--target-nvptx-mcpu TARGET_NVPTX_MCPU]
                    [--target-nvptx-device TARGET_NVPTX_DEVICE]
                    [--target-nvptx-keys TARGET_NVPTX_KEYS]
                    [--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
                    [--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
                    [--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
                    [--target-opencl-libs TARGET_OPENCL_LIBS]
                    [--target-opencl-model TARGET_OPENCL_MODEL]
                    [--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
                    [--target-opencl-tag TARGET_OPENCL_TAG]
                    [--target-opencl-device TARGET_OPENCL_DEVICE]
                    [--target-opencl-keys TARGET_OPENCL_KEYS]
                    [--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
                    [--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
                    [--target-metal-from_device TARGET_METAL_FROM_DEVICE]
                    [--target-metal-libs TARGET_METAL_LIBS]
                    [--target-metal-keys TARGET_METAL_KEYS]
                    [--target-metal-model TARGET_METAL_MODEL]
                    [--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
                    [--target-metal-tag TARGET_METAL_TAG]
                    [--target-metal-device TARGET_METAL_DEVICE]
                    [--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
                    [--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
                    [--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
                    [--target-webgpu-libs TARGET_WEBGPU_LIBS]
                    [--target-webgpu-model TARGET_WEBGPU_MODEL]
                    [--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
                    [--target-webgpu-tag TARGET_WEBGPU_TAG]
                    [--target-webgpu-device TARGET_WEBGPU_DEVICE]
                    [--target-webgpu-keys TARGET_WEBGPU_KEYS]
                    [--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
                    [--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
                    [--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
                    [--target-rocm-libs TARGET_ROCM_LIBS]
                    [--target-rocm-mattr TARGET_ROCM_MATTR]
                    [--target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-rocm-model TARGET_ROCM_MODEL]
                    [--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
                    [--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
                    [--target-rocm-tag TARGET_ROCM_TAG]
                    [--target-rocm-device TARGET_ROCM_DEVICE]
                    [--target-rocm-mcpu TARGET_ROCM_MCPU]
                    [--target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK]
                    [--target-rocm-keys TARGET_ROCM_KEYS]
                    [--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
                    [--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
                    [--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
                    [--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
                    [--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
                    [--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
                    [--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
                    [--target-vulkan-libs TARGET_VULKAN_LIBS]
                    [--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
                    [--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
                    [--target-vulkan-mattr TARGET_VULKAN_MATTR]
                    [--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
                    [--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
                    [--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
                    [--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
                    [--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
                    [--target-vulkan-model TARGET_VULKAN_MODEL]
                    [--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
                    [--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
                    [--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
                    [--target-vulkan-tag TARGET_VULKAN_TAG]
                    [--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
                    [--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
                    [--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
                    [--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
                    [--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
                    [--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
                    [--target-vulkan-device TARGET_VULKAN_DEVICE]
                    [--target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK]
                    [--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
                    [--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
                    [--target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT]
                    [--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
                    [--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
                    [--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
                    [--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
                    [--target-vulkan-keys TARGET_VULKAN_KEYS]
                    [--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
                    [--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
                    [--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
                    [--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
                    [--target-cuda-arch TARGET_CUDA_ARCH]
                    [--target-cuda-libs TARGET_CUDA_LIBS]
                    [--target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK]
                    [--target-cuda-model TARGET_CUDA_MODEL]
                    [--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
                    [--target-cuda-tag TARGET_CUDA_TAG]
                    [--target-cuda-device TARGET_CUDA_DEVICE]
                    [--target-cuda-mcpu TARGET_CUDA_MCPU]
                    [--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
                    [--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
                    [--target-cuda-keys TARGET_CUDA_KEYS]
                    [--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
                    [--target-sdaccel-libs TARGET_SDACCEL_LIBS]
                    [--target-sdaccel-model TARGET_SDACCEL_MODEL]
                    [--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
                    [--target-sdaccel-tag TARGET_SDACCEL_TAG]
                    [--target-sdaccel-device TARGET_SDACCEL_DEVICE]
                    [--target-sdaccel-keys TARGET_SDACCEL_KEYS]
                    [--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
                    [--target-composite-libs TARGET_COMPOSITE_LIBS]
                    [--target-composite-devices TARGET_COMPOSITE_DEVICES]
                    [--target-composite-model TARGET_COMPOSITE_MODEL]
                    [--target-composite-tag TARGET_COMPOSITE_TAG]
                    [--target-composite-device TARGET_COMPOSITE_DEVICE]
                    [--target-composite-keys TARGET_COMPOSITE_KEYS]
                    [--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
                    [--target-stackvm-libs TARGET_STACKVM_LIBS]
                    [--target-stackvm-model TARGET_STACKVM_MODEL]
                    [--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
                    [--target-stackvm-tag TARGET_STACKVM_TAG]
                    [--target-stackvm-device TARGET_STACKVM_DEVICE]
                    [--target-stackvm-keys TARGET_STACKVM_KEYS]
                    [--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
                    [--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
                    [--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
                    [--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
                    [--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
                    [--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
                    [--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
                    [--target-c-unpacked-api TARGET_C_UNPACKED_API]
                    [--target-c-from_device TARGET_C_FROM_DEVICE]
                    [--target-c-libs TARGET_C_LIBS]
                    [--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
                    [--target-c-executor TARGET_C_EXECUTOR]
                    [--target-c-link-params TARGET_C_LINK_PARAMS]
                    [--target-c-model TARGET_C_MODEL]
                    [--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
                    [--target-c-system-lib TARGET_C_SYSTEM_LIB]
                    [--target-c-tag TARGET_C_TAG]
                    [--target-c-interface-api TARGET_C_INTERFACE_API]
                    [--target-c-mcpu TARGET_C_MCPU]
                    [--target-c-device TARGET_C_DEVICE]
                    [--target-c-runtime TARGET_C_RUNTIME]
                    [--target-c-keys TARGET_C_KEYS]
                    [--target-c-march TARGET_C_MARCH]
                    [--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
                    [--target-hexagon-libs TARGET_HEXAGON_LIBS]
                    [--target-hexagon-mattr TARGET_HEXAGON_MATTR]
                    [--target-hexagon-model TARGET_HEXAGON_MODEL]
                    [--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
                    [--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
                    [--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
                    [--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
                    [--target-hexagon-device TARGET_HEXAGON_DEVICE]
                    [--target-hexagon-tag TARGET_HEXAGON_TAG]
                    [--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
                    [--target-hexagon-keys TARGET_HEXAGON_KEYS]
                    [--tuning-records PATH] [--executor EXECUTOR]
                    [--executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS]
                    [--executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT]
                    [--executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API]
                    [--executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API]
                    [--executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS]
                    [--runtime RUNTIME]
                    [--runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB]
                    [--runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB] [-v]
                    [-O [0-3]] [--input-shapes INPUT_SHAPES]
                    [--disabled-pass DISABLED_PASS]
                    [--module-name MODULE_NAME]
                    FILE

positional arguments:
  FILE                  path to the input model file.

optional arguments:
  -h, --help            show this help message and exit
  --cross-compiler CROSS_COMPILER
                        the cross compiler to generate target libraries, e.g.
                        'aarch64-linux-gnu-gcc'.
  --cross-compiler-options CROSS_COMPILER_OPTIONS
                        the cross compiler options to generate target
                        libraries, e.g. '-mfpu=neon-vfpv4'.
  --desired-layout {NCHW,NHWC}
                        change the data layout of the whole graph.
  --dump-code FORMAT    comma separated list of formats to export the input
                        model, e.g. 'asm,ll,relay'.
  --model-format {keras,onnx,pb,tflite,pytorch,paddle}
                        specify input model format.
  -o OUTPUT, --output OUTPUT
                        output the compiled module to a specified archive.
                        Defaults to 'module.tar'.
  -f {so,mlf}, --output-format {so,mlf}
                        output format. Use 'so' for shared object or 'mlf' for
                        Model Library Format (only for microTVM targets).
                        Defaults to 'so'.
  --pass-config name=value
                        configurations to be used at compile time. This option
                        can be provided multiple times, each one to set one
                        configuration value, e.g. '--pass-config
                        relay.backend.use_auto_scheduler=0', e.g. '--pass-
                        config
                        tir.add_lower_pass=opt_level1,pass1,opt_level2,pass2'.
  --target TARGET       compilation target as plain string, inline JSON or
                        path to a JSON file
  --tuning-records PATH
                        path to an auto-tuning log file by AutoTVM. If not
                        presented, the fallback/tophub configs will be used.
  --executor EXECUTOR   Executor to compile the model with
  --runtime RUNTIME     Runtime to compile the model with
  -v, --verbose         increase verbosity.
  -O [0-3], --opt-level [0-3]
                        specify which optimization level to use. Defaults to
                        '3'.
  --input-shapes INPUT_SHAPES
                        specify non-generic shapes for model to run, format is
                        "input_name:[dim1,dim2,...,dimn]
                        input_name2:[dim1,dim2]".
  --disabled-pass DISABLED_PASS
                        disable specific passes, comma-separated list of pass
                        names.
  --module-name MODULE_NAME
                        The output module name. Defaults to 'default'.

target example_target_hook:
  --target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
                        target example_target_hook from_device
  --target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
                        target example_target_hook libs options
  --target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
                        target example_target_hook model string
  --target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
                        target example_target_hook tag string
  --target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
                        target example_target_hook device string
  --target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
                        target example_target_hook keys options

target ext_dev:
  --target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
                        target ext_dev from_device
  --target-ext_dev-libs TARGET_EXT_DEV_LIBS
                        target ext_dev libs options
  --target-ext_dev-model TARGET_EXT_DEV_MODEL
                        target ext_dev model string
  --target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
                        target ext_dev system-lib
  --target-ext_dev-tag TARGET_EXT_DEV_TAG
                        target ext_dev tag string
  --target-ext_dev-device TARGET_EXT_DEV_DEVICE
                        target ext_dev device string
  --target-ext_dev-keys TARGET_EXT_DEV_KEYS
                        target ext_dev keys options

target llvm:
  --target-llvm-fast-math TARGET_LLVM_FAST_MATH
                        target llvm fast-math
  --target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
                        target llvm opt-level
  --target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
                        target llvm unpacked-api
  --target-llvm-from_device TARGET_LLVM_FROM_DEVICE
                        target llvm from_device
  --target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
                        target llvm fast-math-ninf
  --target-llvm-mattr TARGET_LLVM_MATTR
                        target llvm mattr options
  --target-llvm-num-cores TARGET_LLVM_NUM_CORES
                        target llvm num-cores
  --target-llvm-libs TARGET_LLVM_LIBS
                        target llvm libs options
  --target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
                        target llvm fast-math-nsz
  --target-llvm-link-params TARGET_LLVM_LINK_PARAMS
                        target llvm link-params
  --target-llvm-interface-api TARGET_LLVM_INTERFACE_API
                        target llvm interface-api string
  --target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
                        target llvm fast-math-contract
  --target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
                        target llvm system-lib
  --target-llvm-tag TARGET_LLVM_TAG
                        target llvm tag string
  --target-llvm-mtriple TARGET_LLVM_MTRIPLE
                        target llvm mtriple string
  --target-llvm-model TARGET_LLVM_MODEL
                        target llvm model string
  --target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
                        target llvm mfloat-abi string
  --target-llvm-mcpu TARGET_LLVM_MCPU
                        target llvm mcpu string
  --target-llvm-device TARGET_LLVM_DEVICE
                        target llvm device string
  --target-llvm-runtime TARGET_LLVM_RUNTIME
                        target llvm runtime string
  --target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
                        target llvm fast-math-arcp
  --target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
                        target llvm fast-math-reassoc
  --target-llvm-mabi TARGET_LLVM_MABI
                        target llvm mabi string
  --target-llvm-keys TARGET_LLVM_KEYS
                        target llvm keys options
  --target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
                        target llvm fast-math-nnan

target hybrid:
  --target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
                        target hybrid from_device
  --target-hybrid-libs TARGET_HYBRID_LIBS
                        target hybrid libs options
  --target-hybrid-model TARGET_HYBRID_MODEL
                        target hybrid model string
  --target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
                        target hybrid system-lib
  --target-hybrid-tag TARGET_HYBRID_TAG
                        target hybrid tag string
  --target-hybrid-device TARGET_HYBRID_DEVICE
                        target hybrid device string
  --target-hybrid-keys TARGET_HYBRID_KEYS
                        target hybrid keys options

target aocl:
  --target-aocl-from_device TARGET_AOCL_FROM_DEVICE
                        target aocl from_device
  --target-aocl-libs TARGET_AOCL_LIBS
                        target aocl libs options
  --target-aocl-model TARGET_AOCL_MODEL
                        target aocl model string
  --target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
                        target aocl system-lib
  --target-aocl-tag TARGET_AOCL_TAG
                        target aocl tag string
  --target-aocl-device TARGET_AOCL_DEVICE
                        target aocl device string
  --target-aocl-keys TARGET_AOCL_KEYS
                        target aocl keys options

target nvptx:
  --target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
                        target nvptx max_num_threads
  --target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
                        target nvptx thread_warp_size
  --target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
                        target nvptx from_device
  --target-nvptx-libs TARGET_NVPTX_LIBS
                        target nvptx libs options
  --target-nvptx-model TARGET_NVPTX_MODEL
                        target nvptx model string
  --target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
                        target nvptx system-lib
  --target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
                        target nvptx mtriple string
  --target-nvptx-tag TARGET_NVPTX_TAG
                        target nvptx tag string
  --target-nvptx-mcpu TARGET_NVPTX_MCPU
                        target nvptx mcpu string
  --target-nvptx-device TARGET_NVPTX_DEVICE
                        target nvptx device string
  --target-nvptx-keys TARGET_NVPTX_KEYS
                        target nvptx keys options

target opencl:
  --target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
                        target opencl max_num_threads
  --target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
                        target opencl thread_warp_size
  --target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
                        target opencl from_device
  --target-opencl-libs TARGET_OPENCL_LIBS
                        target opencl libs options
  --target-opencl-model TARGET_OPENCL_MODEL
                        target opencl model string
  --target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
                        target opencl system-lib
  --target-opencl-tag TARGET_OPENCL_TAG
                        target opencl tag string
  --target-opencl-device TARGET_OPENCL_DEVICE
                        target opencl device string
  --target-opencl-keys TARGET_OPENCL_KEYS
                        target opencl keys options

target metal:
  --target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
                        target metal max_num_threads
  --target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
                        target metal thread_warp_size
  --target-metal-from_device TARGET_METAL_FROM_DEVICE
                        target metal from_device
  --target-metal-libs TARGET_METAL_LIBS
                        target metal libs options
  --target-metal-keys TARGET_METAL_KEYS
                        target metal keys options
  --target-metal-model TARGET_METAL_MODEL
                        target metal model string
  --target-metal-system-lib TARGET_METAL_SYSTEM_LIB
                        target metal system-lib
  --target-metal-tag TARGET_METAL_TAG
                        target metal tag string
  --target-metal-device TARGET_METAL_DEVICE
                        target metal device string
  --target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
                        target metal max_function_args

target webgpu:
  --target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
                        target webgpu max_num_threads
  --target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
                        target webgpu from_device
  --target-webgpu-libs TARGET_WEBGPU_LIBS
                        target webgpu libs options
  --target-webgpu-model TARGET_WEBGPU_MODEL
                        target webgpu model string
  --target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
                        target webgpu system-lib
  --target-webgpu-tag TARGET_WEBGPU_TAG
                        target webgpu tag string
  --target-webgpu-device TARGET_WEBGPU_DEVICE
                        target webgpu device string
  --target-webgpu-keys TARGET_WEBGPU_KEYS
                        target webgpu keys options

target rocm:
  --target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
                        target rocm max_num_threads
  --target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
                        target rocm thread_warp_size
  --target-rocm-from_device TARGET_ROCM_FROM_DEVICE
                        target rocm from_device
  --target-rocm-libs TARGET_ROCM_LIBS
                        target rocm libs options
  --target-rocm-mattr TARGET_ROCM_MATTR
                        target rocm mattr options
  --target-rocm-max_shared_memory_per_block TARGET_ROCM_MAX_SHARED_MEMORY_PER_BLOCK
                        target rocm max_shared_memory_per_block
  --target-rocm-model TARGET_ROCM_MODEL
                        target rocm model string
  --target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
                        target rocm system-lib
  --target-rocm-mtriple TARGET_ROCM_MTRIPLE
                        target rocm mtriple string
  --target-rocm-tag TARGET_ROCM_TAG
                        target rocm tag string
  --target-rocm-device TARGET_ROCM_DEVICE
                        target rocm device string
  --target-rocm-mcpu TARGET_ROCM_MCPU
                        target rocm mcpu string
  --target-rocm-max_threads_per_block TARGET_ROCM_MAX_THREADS_PER_BLOCK
                        target rocm max_threads_per_block
  --target-rocm-keys TARGET_ROCM_KEYS
                        target rocm keys options

target vulkan:
  --target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
                        target vulkan max_num_threads
  --target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
                        target vulkan thread_warp_size
  --target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
                        target vulkan from_device
  --target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
                        target vulkan max_per_stage_descriptor_storage_buffer
  --target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
                        target vulkan driver_version
  --target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
                        target vulkan supports_16bit_buffer
  --target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
                        target vulkan max_block_size_z
  --target-vulkan-libs TARGET_VULKAN_LIBS
                        target vulkan libs options
  --target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
                        target vulkan supports_dedicated_allocation
  --target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
                        target vulkan supported_subgroup_operations
  --target-vulkan-mattr TARGET_VULKAN_MATTR
                        target vulkan mattr options
  --target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
                        target vulkan max_storage_buffer_range
  --target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
                        target vulkan max_push_constants_size
  --target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
                        target vulkan supports_push_descriptor
  --target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
                        target vulkan supports_int64
  --target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
                        target vulkan supports_float32
  --target-vulkan-model TARGET_VULKAN_MODEL
                        target vulkan model string
  --target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
                        target vulkan max_block_size_x
  --target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
                        target vulkan system-lib
  --target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
                        target vulkan max_block_size_y
  --target-vulkan-tag TARGET_VULKAN_TAG
                        target vulkan tag string
  --target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
                        target vulkan supports_int8
  --target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
                        target vulkan max_spirv_version
  --target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
                        target vulkan vulkan_api_version
  --target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
                        target vulkan supports_8bit_buffer
  --target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
                        target vulkan device_type string
  --target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
                        target vulkan supports_int32
  --target-vulkan-device TARGET_VULKAN_DEVICE
                        target vulkan device string
  --target-vulkan-max_threads_per_block TARGET_VULKAN_MAX_THREADS_PER_BLOCK
                        target vulkan max_threads_per_block
  --target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
                        target vulkan max_uniform_buffer_range
  --target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
                        target vulkan driver_name string
  --target-vulkan-supports_integer_dot_product TARGET_VULKAN_SUPPORTS_INTEGER_DOT_PRODUCT
                        target vulkan supports_integer_dot_product
  --target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
                        target vulkan supports_storage_buffer_storage_class
  --target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
                        target vulkan supports_float16
  --target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
                        target vulkan device_name string
  --target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
                        target vulkan supports_float64
  --target-vulkan-keys TARGET_VULKAN_KEYS
                        target vulkan keys options
  --target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
                        target vulkan max_shared_memory_per_block
  --target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
                        target vulkan supports_int16

target cuda:
  --target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
                        target cuda max_num_threads
  --target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
                        target cuda thread_warp_size
  --target-cuda-from_device TARGET_CUDA_FROM_DEVICE
                        target cuda from_device
  --target-cuda-arch TARGET_CUDA_ARCH
                        target cuda arch string
  --target-cuda-libs TARGET_CUDA_LIBS
                        target cuda libs options
  --target-cuda-max_shared_memory_per_block TARGET_CUDA_MAX_SHARED_MEMORY_PER_BLOCK
                        target cuda max_shared_memory_per_block
  --target-cuda-model TARGET_CUDA_MODEL
                        target cuda model string
  --target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
                        target cuda system-lib
  --target-cuda-tag TARGET_CUDA_TAG
                        target cuda tag string
  --target-cuda-device TARGET_CUDA_DEVICE
                        target cuda device string
  --target-cuda-mcpu TARGET_CUDA_MCPU
                        target cuda mcpu string
  --target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
                        target cuda max_threads_per_block
  --target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
                        target cuda registers_per_block
  --target-cuda-keys TARGET_CUDA_KEYS
                        target cuda keys options

target sdaccel:
  --target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
                        target sdaccel from_device
  --target-sdaccel-libs TARGET_SDACCEL_LIBS
                        target sdaccel libs options
  --target-sdaccel-model TARGET_SDACCEL_MODEL
                        target sdaccel model string
  --target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
                        target sdaccel system-lib
  --target-sdaccel-tag TARGET_SDACCEL_TAG
                        target sdaccel tag string
  --target-sdaccel-device TARGET_SDACCEL_DEVICE
                        target sdaccel device string
  --target-sdaccel-keys TARGET_SDACCEL_KEYS
                        target sdaccel keys options

target composite:
  --target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
                        target composite from_device
  --target-composite-libs TARGET_COMPOSITE_LIBS
                        target composite libs options
  --target-composite-devices TARGET_COMPOSITE_DEVICES
                        target composite devices options
  --target-composite-model TARGET_COMPOSITE_MODEL
                        target composite model string
  --target-composite-tag TARGET_COMPOSITE_TAG
                        target composite tag string
  --target-composite-device TARGET_COMPOSITE_DEVICE
                        target composite device string
  --target-composite-keys TARGET_COMPOSITE_KEYS
                        target composite keys options

target stackvm:
  --target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
                        target stackvm from_device
  --target-stackvm-libs TARGET_STACKVM_LIBS
                        target stackvm libs options
  --target-stackvm-model TARGET_STACKVM_MODEL
                        target stackvm model string
  --target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
                        target stackvm system-lib
  --target-stackvm-tag TARGET_STACKVM_TAG
                        target stackvm tag string
  --target-stackvm-device TARGET_STACKVM_DEVICE
                        target stackvm device string
  --target-stackvm-keys TARGET_STACKVM_KEYS
                        target stackvm keys options

target aocl_sw_emu:
  --target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
                        target aocl_sw_emu from_device
  --target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
                        target aocl_sw_emu libs options
  --target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
                        target aocl_sw_emu model string
  --target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
                        target aocl_sw_emu system-lib
  --target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
                        target aocl_sw_emu tag string
  --target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
                        target aocl_sw_emu device string
  --target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
                        target aocl_sw_emu keys options

target c:
  --target-c-unpacked-api TARGET_C_UNPACKED_API
                        target c unpacked-api
  --target-c-from_device TARGET_C_FROM_DEVICE
                        target c from_device
  --target-c-libs TARGET_C_LIBS
                        target c libs options
  --target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
                        target c constants-byte-alignment
  --target-c-executor TARGET_C_EXECUTOR
                        target c executor string
  --target-c-link-params TARGET_C_LINK_PARAMS
                        target c link-params
  --target-c-model TARGET_C_MODEL
                        target c model string
  --target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
                        target c workspace-byte-alignment
  --target-c-system-lib TARGET_C_SYSTEM_LIB
                        target c system-lib
  --target-c-tag TARGET_C_TAG
                        target c tag string
  --target-c-interface-api TARGET_C_INTERFACE_API
                        target c interface-api string
  --target-c-mcpu TARGET_C_MCPU
                        target c mcpu string
  --target-c-device TARGET_C_DEVICE
                        target c device string
  --target-c-runtime TARGET_C_RUNTIME
                        target c runtime string
  --target-c-keys TARGET_C_KEYS
                        target c keys options
  --target-c-march TARGET_C_MARCH
                        target c march string

target hexagon:
  --target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
                        target hexagon from_device
  --target-hexagon-libs TARGET_HEXAGON_LIBS
                        target hexagon libs options
  --target-hexagon-mattr TARGET_HEXAGON_MATTR
                        target hexagon mattr options
  --target-hexagon-model TARGET_HEXAGON_MODEL
                        target hexagon model string
  --target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
                        target hexagon llvm-options options
  --target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
                        target hexagon mtriple string
  --target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
                        target hexagon system-lib
  --target-hexagon-mcpu TARGET_HEXAGON_MCPU
                        target hexagon mcpu string
  --target-hexagon-device TARGET_HEXAGON_DEVICE
                        target hexagon device string
  --target-hexagon-tag TARGET_HEXAGON_TAG
                        target hexagon tag string
  --target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
                        target hexagon link-params
  --target-hexagon-keys TARGET_HEXAGON_KEYS
                        target hexagon keys options

executor graph:
  --executor-graph-link-params EXECUTOR_GRAPH_LINK_PARAMS
                        Executor graph link-params

executor aot:
  --executor-aot-workspace-byte-alignment EXECUTOR_AOT_WORKSPACE_BYTE_ALIGNMENT
                        Executor aot workspace-byte-alignment
  --executor-aot-unpacked-api EXECUTOR_AOT_UNPACKED_API
                        Executor aot unpacked-api
  --executor-aot-interface-api EXECUTOR_AOT_INTERFACE_API
                        Executor aot interface-api string
  --executor-aot-link-params EXECUTOR_AOT_LINK_PARAMS
                        Executor aot link-params

runtime cpp:
  --runtime-cpp-system-lib RUNTIME_CPP_SYSTEM_LIB
                        Runtime cpp system-lib

runtime crt:
  --runtime-crt-system-lib RUNTIME_CRT_SYSTEM_LIB
                        Runtime crt system-lib

Now that the tuning data for the model has been collected, the model can be recompiled with the optimized operators to speed up computation.

!python -m tvm.driver.tvmc compile \
    --target "llvm" \
    --tuning-records resnet50-v2-7-autotuner_records.json \
    --output resnet50-v2-7-tvm_autotuned.tar \
    ../../_models/resnet50-v2-7.onnx
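
When the same recompilation has to be scripted, for example to build both the untuned and the tuned package from one place, the command line can be assembled in Python. The following is a minimal sketch rather than part of TVMC: the helper name tvmc_compile_argv is hypothetical, and it only uses the flags that appear in the command above (--target, --tuning-records, --output).

```python
import sys


def tvmc_compile_argv(model, output, target="llvm", tuning_records=None):
    """Build the argument list for `python -m tvm.driver.tvmc compile`.

    Hypothetical helper: it only assembles the command shown in the tutorial.
    """
    argv = [sys.executable, "-m", "tvm.driver.tvmc", "compile", "--target", target]
    if tuning_records is not None:
        # Pass the flag only when tuning records were actually collected.
        argv += ["--tuning-records", str(tuning_records)]
    argv += ["--output", str(output), str(model)]
    return argv


argv = tvmc_compile_argv(
    model="../../_models/resnet50-v2-7.onnx",
    output="resnet50-v2-7-tvm_autotuned.tar",
    tuning_records="resnet50-v2-7-autotuner_records.json",
)
# Show the command without the interpreter path.
print(" ".join(argv[1:]))
```

Passing argv to subprocess.run(argv, check=True) would execute the compilation, provided TVM is installed in the same Python environment. Omitting tuning_records yields the untuned compile command.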

Verify that the optimized model runs and produces the same results:

!python -m tvm.driver.tvmc run \
    --inputs imagenet_cat.npz \
    --output predictions.npz \
    resnet50-v2-7-tvm_autotuned.tar

!python postprocess.py
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
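
To make the "same results" check less dependent on reading the output by eye, the lines printed by postprocess.py can be parsed and compared programmatically. The following is a minimal sketch, not part of TVMC: the helper names parse_predictions and same_predictions, the regular expression, and the 1e-3 tolerance are all assumptions made for illustration. The sample text is the tuned model's output shown above; the comparison partner is a synthetic, slightly perturbed copy standing in for the untuned model's output.

```python
import re

# Output of postprocess.py for the tuned model, copied from the run above.
TUNED_OUTPUT = """\
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
"""

LINE_RE = re.compile(r"class='(?P<label>.+)' with probability=(?P<prob>[0-9.]+)")


def parse_predictions(text):
    """Return [(label, probability), ...] in the order the lines were printed."""
    return [(m["label"], float(m["prob"])) for m in LINE_RE.finditer(text)]


def same_predictions(a, b, tol=1e-3):
    """True if both runs list the same labels in the same order and the
    probabilities agree within tol (an assumed tolerance)."""
    return len(a) == len(b) and all(
        la == lb and abs(pa - pb) <= tol for (la, pa), (lb, pb) in zip(a, b)
    )


tuned = parse_predictions(TUNED_OUTPUT)
# Synthetic stand-in for a second run: same labels, slightly shifted values.
other = [(label, prob + 5e-4) for label, prob in tuned]

print(tuned[0][0])
print(same_predictions(tuned, other))
```

In practice, the second list would come from parsing the postprocess.py output of the untuned package instead of the synthetic copy.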

Comparing the Tuned and Untuned Models#

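TVMC provides tools for basic performance benchmarking between models. You can specify a number of repetitions, and TVMC reports the model run time (independent of runtime startup). This gives a rough idea of how much tuning has improved model performance. For example, on the Intel i7 system used for testing, the tuned model ran 47% faster than the untuned model. Results vary with hardware and tuning settings; in the runs shown below, the mean time drops from about 51.8 ms to about 41.3 ms.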

!python -m tvm.driver.tvmc run \
    --inputs imagenet_cat.npz \
    --output predictions.npz \
    --print-time \
    --repeat 100 \
    resnet50-v2-7-tvm_autotuned.tar
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  41.2506      40.8879      54.4469      36.7249       2.4430   
               
!python -m tvm.driver.tvmc run \
    --inputs imagenet_cat.npz \
    --output predictions.npz \
    --print-time \
    --repeat 100 \
    resnet50-v2-7-tvm.tar
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  51.8327      52.5906      67.5374      42.9440       4.4040   
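
The improvement can be quantified directly from the two summaries. The sketch below parses the tables printed above and derives a speedup from the mean times; the function name parse_summary and the choice of the mean as the comparison metric are assumptions for illustration, not something TVMC prescribes.

```python
import re

# Benchmark summaries copied from the two runs above.
TUNED_SUMMARY = """\
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
  41.2506      40.8879      54.4469      36.7249       2.4430
"""
UNTUNED_SUMMARY = """\
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
  51.8327      52.5906      67.5374      42.9440       4.4040
"""


def parse_summary(text):
    """Map each column name (mean, median, ...) to its value in milliseconds."""
    lines = [line for line in text.splitlines() if line.strip()]
    # lines[0] is the title, lines[1] the column headers, lines[2] the values.
    names = re.findall(r"(\w+) \(ms\)", lines[1])
    values = [float(v) for v in lines[2].split()]
    return dict(zip(names, values))


tuned = parse_summary(TUNED_SUMMARY)
untuned = parse_summary(UNTUNED_SUMMARY)

# Ratio of mean run times: how many times faster the tuned model is.
speedup = untuned["mean"] / tuned["mean"]
# Fraction of the untuned mean run time that tuning removed.
time_saved = 1.0 - tuned["mean"] / untuned["mean"]
print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
```

For these two runs the mean-based ratio comes out at roughly 1.26x, or about 20% less time per inference, which is a smaller gain than the 47% quoted for the original test system. Comparing medians instead of means is a reasonable alternative when a few slow outliers dominate.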
               

Summary#

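This tutorial introduced TVMC, the command-line driver for TVM. It demonstrated how to compile, run, and tune a model, and discussed the need for pre-processing and post-processing of inputs and outputs. After the tuning process, it showed how to compare the performance of the unoptimized and optimized models.

The example presented here is a simple one that uses ResNet-50 v2 locally. However, TVMC supports many more features, including cross-compilation, remote execution, and profiling/benchmarking.

To see what other options are available, run tvmc --help

The tutorial "Compiling and Optimizing a Model with the Python Interface" covers the same compilation and optimization steps using the Python interface.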