Making your Hardware Accelerator TVM-ready with UMA#

Authors: Michael J. Klaiber, Christoph Gerum, Paul Palomero Bernardo

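This is an introductory tutorial to the Universal Modular Accelerator Interface (UMA). UMA provides an easy-to-use API for integrating new hardware accelerators into TVM.

This tutorial guides you step by step through using UMA to make your hardware accelerator TVM-ready.

While there is no one-size-fits-all solution to this problem, UMA aims to provide a stable, Python-only API for integrating a number of hardware accelerator classes into TVM.

In this tutorial you will get to know the UMA API through three use cases of increasing complexity. In these use cases, the three mock accelerators Vanilla, Strawberry and Chocolate are introduced and integrated into TVM using UMA.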

Vanilla#

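Vanilla is a simple accelerator that consists of a MAC array and has no internal memory. It can only process Conv2D layers; all other layers are executed on a CPU, which also orchestrates Vanilla. The CPU and Vanilla use a shared memory.

Vanilla has a C interface vanilla_conv2dnchw(...) for carrying out a Conv2D operation (including same-padding). It accepts pointers to the input feature map, the weights and the result, as well as the dimensions of the Conv2D: oc, iw, ih, ic, kh, kw.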

int vanilla_conv2dnchw(float* ifmap, float*  weights, float*  result, int oc, int iw, int ih, int ic, int kh, int kw);
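To make the pointer-plus-dimensions interface above more concrete, here is a minimal pure-Python sketch of what such a call could be expected to compute. It is not Vanilla's implementation. The function name conv2d_nchw_same is hypothetical, and the sketch assumes batch size 1, stride 1, zero padding, odd kernel sizes, and weights stored in (oc, ic, kh, kw) order; none of these details are stated by the interface itself.

```python
def conv2d_nchw_same(ifmap, weights, oc, iw, ih, ic, kh, kw):
    """Reference Conv2D on flat NCHW buffers (hypothetical helper).

    Assumptions: batch size 1, stride 1, zero "same" padding,
    weights laid out as (oc, ic, kh, kw).
    """
    pad_h, pad_w = kh // 2, kw // 2
    result = [0.0] * (oc * ih * iw)
    for o in range(oc):
        for y in range(ih):
            for x in range(iw):
                acc = 0.0
                for c in range(ic):
                    for ky in range(kh):
                        for kx in range(kw):
                            in_y = y + ky - pad_h
                            in_x = x + kx - pad_w
                            # Positions outside the input count as zero (padding).
                            if 0 <= in_y < ih and 0 <= in_x < iw:
                                acc += (
                                    ifmap[(c * ih + in_y) * iw + in_x]
                                    * weights[((o * ic + c) * kh + ky) * kw + kx]
                                )
                result[(o * ih + y) * iw + x] = acc
    return result


# A 3x3 input of ones convolved with a 3x3 kernel of ones:
# each output counts the valid neighbours of that position.
out = conv2d_nchw_same([1.0] * 9, [1.0] * 9, oc=1, iw=3, ih=3, ic=1, kh=3, kw=3)
print(out)  # [4.0, 6.0, 4.0, 6.0, 9.0, 6.0, 4.0, 6.0, 4.0]
```

Because of the same-padding, the result buffer has the same spatial size (ih x iw) as the input, with oc channels.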

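The script uma_cli creates code skeletons with calls into the UMA API for new accelerators.

For Vanilla we use it as follows (--tutorial vanilla adds all the additional files required for this part of the tutorial):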

pip install inflection
cd $TVM_HOME/apps/uma
python uma_cli.py --add_hardware vanilla_accelerator --tutorial vanilla

uma_cli.py generates the following files in the directory vanilla_accelerator, which we will revisit below.

backend.py
codegen.py
conv2dnchw.cc
passes.py
patterns.py
run.py
strategies.py

Vanilla backend#

The generated backend for Vanilla can be found in vanilla_accelerator/backend.py:

class VanillaAcceleratorBackend(UMABackend):
    """UMA backend for VanillaAccelerator."""

    def __init__(self):
        super().__init__()

        self._register_pattern("conv2d", conv2d_pattern())
        self._register_tir_pass(PassPhase.TIR_PHASE_0, VanillaAcceleratorConv2DPass())
        self._register_codegen(fmt="c", includes=gen_includes)

    @property
    def target_name(self):
        return "vanilla_accelerator"

Define offloaded patterns#

To specify that Conv2D is offloaded to Vanilla, it is described as a Relay dataflow pattern ([DFPattern]) in vanilla_accelerator/patterns.py:

def conv2d_pattern():
    pattern = is_op("nn.conv2d")(wildcard(), wildcard())
    pattern = pattern.has_attr({"strides": [1, 1]})
    return pattern

To map Conv2D operations from the input graph to Vanilla's low-level function call vanilla_conv2dnchw(...), the TIR pass VanillaAcceleratorConv2DPass (discussed later in this tutorial) is registered in VanillaAcceleratorBackend.

Codegen#

The file vanilla_accelerator/codegen.py defines, in gen_includes, static C code that is added to the C code generated by TVM's C codegen.

Here, C code is added to include Vanilla's low-level library vanilla_conv2dnchw().

def gen_includes() -> str:
    topdir = pathlib.Path(__file__).parent.absolute()

    includes = ""
    includes += f'#include "{topdir}/conv2dnchw.cc"'
    return includes
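If a backend needed to include more than one source file, each directive would have to sit on its own line, which the single-string concatenation above does not do by itself. The following is a minimal sketch of one way to handle that. The helper name build_includes, the example directory, and the second file name are hypothetical and not part of the generated skeleton.

```python
import pathlib


def build_includes(topdir, source_files):
    """Return one #include directive per file, each on its own line.

    Hypothetical helper: topdir is passed in explicitly instead of being
    derived from __file__, so the function is easy to try out in isolation.
    """
    topdir = pathlib.PurePosixPath(topdir)
    lines = [f'#include "{topdir / name}"' for name in source_files]
    return "\n".join(lines) + "\n"


# "vanilla_helpers.h" is an invented file name used only for illustration.
print(build_includes("/opt/vanilla_accelerator", ["conv2dnchw.cc", "vanilla_helpers.h"]), end="")
```

A function of this shape could be wrapped in a zero-argument gen_includes so that it still fits the includes=gen_includes registration used by the backend.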

As shown above, gen_includes is registered with UMA in VanillaAcceleratorBackend via self._register_codegen:

self._register_codegen(fmt="c", includes=gen_includes)

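Building the neural network and running it on Vanilla#

To demonstrate UMA's functionality, we will generate C code for a single Conv2D layer and run it on the Vanilla accelerator. The file vanilla_accelerator/run.py provides a demo that runs a Conv2D layer using Vanilla's C API.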

def main():
    mod, inputs, output_list, runner = create_conv2d()

    uma_backend = VanillaAcceleratorBackend()
    uma_backend.register()
    mod = uma_backend.partition(mod)
    target = tvm.target.Target("vanilla_accelerator", host=tvm.target.Target("c"))

    export_directory = tvm.contrib.utils.tempdir(keep_for_debug=True).path
    print(f"Generated files are in {export_directory}")
    compile_and_run(
        AOTModel(module=mod, inputs=inputs, outputs=output_list),
        runner,
        interface_api="c",
        use_unpacked_api=True,
        target=target,
        test_dir=str(export_directory),
    )


main()

Running vanilla_accelerator/run.py generates the output files in the model library format (MLF).

Output#

Generated files are in /tmp/tvm-debug-mode-tempdirs/2022-07-13T13-26-22___x5u76h0p/00000

Let's examine the generated files:

Output:

cd /tmp/tvm-debug-mode-tempdirs/2022-07-13T13-26-22___x5u76h0p/00000
cd build/
ls -1

codegen
lib.tar
metadata.json
parameters
runtime
src

To evaluate the generated C code, go to codegen/host/src/default_lib2.c:

cd codegen/host/src/
ls -1

default_lib0.c
default_lib1.c
default_lib2.c

In default_lib2.c you can now see that the generated code calls into Vanilla's C API and executes a Conv2D layer:

TVM_DLL int32_t tvmgen_default_vanilla_accelerator_main_0(float* placeholder, float* placeholder1, float* conv2d_nchw, uint8_t* global_workspace_1_var) {
     vanilla_accelerator_conv2dnchw(placeholder, placeholder1, conv2d_nchw, 32, 14, 14, 32, 3, 3);
     return 0;
}
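The six integer arguments in the generated call can be read against the C interface declared earlier (oc, iw, ih, ic, kh, kw). As a rough sanity check, the sketch below derives how many float elements each of the three buffers would hold for these values. It assumes batch size 1 and a same-padded, stride-1 convolution, so the output keeps the input's spatial size; the helper name conv2d_buffer_elements is hypothetical.

```python
def conv2d_buffer_elements(oc, iw, ih, ic, kh, kw):
    """Element counts of the three buffers (hypothetical helper).

    Assumes batch size 1 and same-padding with stride 1, so the
    result has the same height and width as the input feature map.
    """
    return {
        "ifmap": ic * ih * iw,
        "weights": oc * ic * kh * kw,
        "result": oc * ih * iw,
    }


# Values taken from the generated call: 32, 14, 14, 32, 3, 3
sizes = conv2d_buffer_elements(oc=32, iw=14, ih=14, ic=32, kh=3, kw=3)
print(sizes)  # {'ifmap': 6272, 'weights': 9216, 'result': 6272}
```

With 4-byte floats, these counts correspond to 25088 bytes each for the input and the result, and 36864 bytes for the weights.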

Strawberry#

Coming soon …

Chocolate#

Coming soon …

Request for Community Input#

If this tutorial does not fit your accelerator, please add your requirements to the UMA thread in the TVM discuss forum: Link. We are eager to extend this tutorial to provide guidance on making further classes of AI hardware accelerators TVM-ready using the UMA interface.

References#

[UMA-RFC] UMA: Universal Modular Accelerator Interface, TVM RFC, June 2022.

[DFPattern] Pattern Matching in Relay
