Deploying yolort on TVM

This is an introductory tutorial on deploying a PyTorch YOLOv5 model with the Relay VM.

PyTorch should be installed first. TorchVision is also required, since it is used as the model zoo.

A quick solution is:

pip install torch==1.10.1
pip install torchvision==0.11.2

Or refer to the official installation guide: https://pytorch.org/get-started/locally/

PyTorch versions should be backwards compatible, but each should be used with the matching TorchVision version.

Currently, TVM has only been tested with PyTorch 1.7.x and 1.10.x; other versions may be unstable.
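
To confirm that the installed combination matches, the versions can be checked at runtime (a small sketch; the pins above are the combination tested here):

[ ]:
import torch
import torchvision

# Tested combination: torch 1.10.1 + torchvision 0.11.2
print(torch.__version__, torchvision.__version__)
assert torch.__version__.split("+")[0].startswith(("1.7.", "1.10.")), \
    "only PyTorch 1.7.x and 1.10.x have been tested with TVM here"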

This notebook was run on macOS with an Apple M1.


Note: most of the code below is adapted from the TVM tutorials.

[1]:
# Set up the local TVM environment (the paths below are machine-specific)
from tvm_book.tvm.env import set_mxnet, set_tvm, set_cudnn
# TVM_ROOT = "/media/pc/data/4tb/zzy/zzy/npu/tvm"
# TVM_ROOT = "/media/pc/data/4tb/lxw/tvm"
TVM_ROOT = "/media/pc/data/4tb/lxw/books/tvm"
set_tvm(TVM_ROOT)
set_mxnet()
set_cudnn()
[2]:
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

import numpy as np
import cv2

# PyTorch imports
import torch
from torch import nn
import torchvision

Load the pre-trained yolov5n model and trace it

[3]:
in_size = 640
input_shape = (in_size, in_size)
[4]:
from yolort.models import yolov5n
from yolort.relay import get_trace_module
[ ]:
model_func = yolov5n(pretrained=True, size=(in_size, in_size))
script_module = get_trace_module(model_func, input_shape=input_shape)
[6]:
script_module.graph
[6]:
graph(%self.1 : __torch__.yolort.relay.trace_wrapper.TraceWrapper,
      %images : Float(1, 3, 640, 640, strides=[1228800, 409600, 640, 1], requires_grad=0, device=cpu)):
  %4653 : __torch__.yolort.models.yolov5.YOLOv5 = prim::GetAttr[name="model"](%self.1)
  %5021 : (Tensor, Tensor, Tensor) = prim::CallMethod[name="forward"](%4653, %images)
  %5018 : Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), %5019 : Float(0, strides=[1], requires_grad=0, device=cpu), %5020 : Long(0, strides=[1], requires_grad=0, device=cpu) = prim::TupleUnpack(%5021)
  %3799 : (Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), Float(0, strides=[1], requires_grad=0, device=cpu), Long(0, strides=[1], requires_grad=0, device=cpu)) = prim::TupleConstruct(%5018, %5019, %5020)
  return (%3799)
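
As the printed graph shows, `get_trace_module` wraps the model in a `TraceWrapper` so that it returns a plain tuple, then traces it with `torch.jit.trace`. A rough, hypothetical sketch of that idea (the real implementation lives in `yolort.relay`):

[ ]:
# Hypothetical sketch of a tracing helper; the actual one is
# yolort.relay.get_trace_module.
class TraceWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)
        # Flatten the detection dict into a (boxes, scores, labels) tuple
        return out[0]["boxes"], out[0]["scores"], out[0]["labels"]

def trace_model(model, shape):
    model.eval()
    dummy = torch.rand(1, 3, *shape)
    return torch.jit.trace(TraceWrapper(model), dummy)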

Download a test image and preprocess it

[7]:
from yolort.utils import get_image_from_url

img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/bus.jpg"
# img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/zidane.jpg"
img = get_image_from_url(img_source)

img = img.astype("float32")
img = cv2.resize(img, (in_size, in_size))

# HWC -> CHW, scale pixels to [0, 1], and add a batch dimension
img = np.transpose(img / 255.0, [2, 0, 1])
img = np.expand_dims(img, axis=0)
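
The traced model expects a float32 NCHW batch with values in [0, 1]; a quick sanity check on the preprocessed array:

[ ]:
# One float32 image, CHW layout, batch dimension in front
assert img.shape == (1, 3, in_size, in_size)
assert img.dtype == np.float32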

Import the graph into Relay

[ ]:
input_name = "input0"
shape_list = [(input_name, (1, 3, *input_shape))]
mod, params = relay.frontend.from_pytorch(script_module, shape_list)
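
Before compiling, the imported Relay IR can be inspected by printing the module's `main` function (the full printout is long, so only the head is shown here):

[ ]:
# Peek at the converted Relay program
print(str(mod["main"])[:1000])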

Compile with the Relay VM

Note: currently only CPU targets are supported. For x86 targets, building TVM with Intel MKL and Intel OpenMP is highly recommended for best performance, because the torchvision rcnn models contain large dense operators.

[ ]:
# Add "-libs=mkl" to get best performance on x86 target.
# For x86 machine supports AVX512, the complete target is
# "llvm -mcpu=skylake-avx512 -libs=mkl"
target = "llvm"

with tvm.transform.PassContext(opt_level=3):
    vm_exec = relay.vm.compile(mod, target=target, params=params)
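
For deployment, the compiled VM executable can be serialized into a bytecode blob plus a kernel library and loaded back later, following TVM's VM serialization API (a sketch; the file names are placeholders):

[ ]:
# Serialize: VM bytecode + compiled kernel library
code, lib = vm_exec.save()
lib.export_library("yolov5n_vm.so")
with open("yolov5n_vm.code", "wb") as f:
    f.write(code)

# Deserialize into a fresh Executable
loaded_lib = tvm.runtime.load_module("yolov5n_vm.so")
loaded_code = bytearray(open("yolov5n_vm.code", "rb").read())
loaded_exec = tvm.runtime.vm.Executable.load_exec(loaded_code, loaded_lib)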

Inference with the Relay VM

[10]:
ctx = tvm.cpu()
vm = VirtualMachine(vm_exec, ctx)
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()
[11]:
%%timeit
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()
52.7 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Get boxes with a score greater than 0.6

[12]:
score_threshold = 0.6
boxes = tvm_res[0].asnumpy().tolist()
valid_boxes = []
# NMS returns scores in descending order, so we can stop at the
# first score below the threshold.
for i, score in enumerate(tvm_res[1].asnumpy().tolist()):
    if score > score_threshold:
        valid_boxes.append(boxes[i])
    else:
        break

print(f"Get {len(valid_boxes)} valid boxes")
Get 4 valid boxes
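
Since the early `break` only works because NMS returns scores in descending order, an equivalent filter that does not depend on ordering is:

[ ]:
scores = tvm_res[1].asnumpy()
mask = scores > score_threshold
valid_boxes_np = tvm_res[0].asnumpy()[mask]
print(f"Get {len(valid_boxes_np)} valid boxes")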

Verify the inference output of the TVM backend

[13]:
with torch.no_grad():
    torch_res = script_module(torch.from_numpy(img))
[14]:
for i in range(len(torch_res)):
    torch.testing.assert_allclose(torch_res[i], tvm_res[i].asnumpy(), rtol=1e-4, atol=1e-4)

print("Exported model has been tested with TVM Runtime, and the result looks good!")
Exported model has been tested with TVM Runtime, and the result looks good!
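
As a final visual check, the kept boxes can be drawn back onto the resized image (a minimal sketch; it assumes the boxes are (x1, y1, x2, y2) pixel coordinates in the 640x640 input, and the output path is a placeholder):

[ ]:
# Draw the filtered detections on the resized test image
vis = cv2.resize(get_image_from_url(img_source), (in_size, in_size))
for x1, y1, x2, y2 in valid_boxes:
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
cv2.imwrite("bus_pred.jpg", vis)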

View this document as a notebook: https://github.com/zhiqwang/yolov5-rt-stack/blob/main/notebooks/export-relay-inference-tvm.ipynb