Deploying yolort on TVM

This is an introductory tutorial on deploying a PyTorch YOLOv5 model with the Relay VM.

PyTorch should be installed first. TorchVision is also required, since it is used as the model zoo.

A quick solution is:

pip install torch==1.10.1
pip install torchvision==0.11.2

Or refer to the official installation guide: https://pytorch.org/get-started/locally/

PyTorch versions should be backwards compatible, but each should be used with the matching TorchVision version.

Currently, TVM has only been tested with PyTorch 1.7.x and 1.10.x; other versions may be unstable.
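
To confirm that the installed combination matches, the versions can be checked at runtime (a small sketch; the pins above are the combination tested here):

[ ]:
import torch
import torchvision

# Tested combination: torch 1.10.1 + torchvision 0.11.2
print(torch.__version__, torchvision.__version__)
assert torch.__version__.split("+")[0].startswith(("1.7.", "1.10.")), \
    "only PyTorch 1.7.x and 1.10.x have been tested with TVM here"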

This notebook was run on macOS with an Apple M1.


Note: most of the code below is adapted from the TVM tutorials.

[1]:
# Set up the local TVM environment (the paths below are machine-specific)
from tvm_book.tvm.env import set_mxnet, set_tvm, set_cudnn
# TVM_ROOT = "/media/pc/data/4tb/zzy/zzy/npu/tvm"
# TVM_ROOT = "/media/pc/data/4tb/lxw/tvm"
TVM_ROOT = "/media/pc/data/4tb/lxw/books/tvm"
set_tvm(TVM_ROOT)
set_mxnet()
set_cudnn()
[2]:
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

import numpy as np
import cv2

# PyTorch imports
import torch
from torch import nn
import torchvision

Load the pre-trained yolov5n model and trace it

[3]:
in_size = 640
input_shape = (in_size, in_size)
[4]:
from yolort.models import yolov5n
from yolort.relay import get_trace_module
[ ]:
model_func = yolov5n(pretrained=True, size=(in_size, in_size))
script_module = get_trace_module(model_func, input_shape=input_shape)
[6]:
script_module.graph
[6]:
graph(%self.1 : __torch__.yolort.relay.trace_wrapper.TraceWrapper,
      %images : Float(1, 3, 640, 640, strides=[1228800, 409600, 640, 1], requires_grad=0, device=cpu)):
  %4653 : __torch__.yolort.models.yolov5.YOLOv5 = prim::GetAttr[name="model"](%self.1)
  %5021 : (Tensor, Tensor, Tensor) = prim::CallMethod[name="forward"](%4653, %images)
  %5018 : Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), %5019 : Float(0, strides=[1], requires_grad=0, device=cpu), %5020 : Long(0, strides=[1], requires_grad=0, device=cpu) = prim::TupleUnpack(%5021)
  %3799 : (Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), Float(0, strides=[1], requires_grad=0, device=cpu), Long(0, strides=[1], requires_grad=0, device=cpu)) = prim::TupleConstruct(%5018, %5019, %5020)
  return (%3799)
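
As the printed graph shows, `get_trace_module` wraps the model in a `TraceWrapper` so that it returns a plain tuple, then traces it with `torch.jit.trace`. A rough, hypothetical sketch of that idea (the real implementation lives in `yolort.relay`):

[ ]:
# Hypothetical sketch of a tracing helper; the actual one is
# yolort.relay.get_trace_module.
class TraceWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)
        # Flatten the detection dict into a (boxes, scores, labels) tuple
        return out[0]["boxes"], out[0]["scores"], out[0]["labels"]

def trace_model(model, shape):
    model.eval()
    dummy = torch.rand(1, 3, *shape)
    return torch.jit.trace(TraceWrapper(model), dummy)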

Download a test image and preprocess it

[7]:
from yolort.utils import get_image_from_url

img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/bus.jpg"
# img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/zidane.jpg"
img = get_image_from_url(img_source)

img = img.astype("float32")
img = cv2.resize(img, (in_size, in_size))

# HWC -> CHW, scale pixels to [0, 1], and add a batch dimension
img = np.transpose(img / 255.0, [2, 0, 1])
img = np.expand_dims(img, axis=0)
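
The traced model expects a float32 NCHW batch with values in [0, 1]; a quick sanity check on the preprocessed array:

[ ]:
# One float32 image, CHW layout, batch dimension in front
assert img.shape == (1, 3, in_size, in_size)
assert img.dtype == np.float32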

Import the graph into Relay

[ ]:
input_name = "input0"
shape_list = [(input_name, (1, 3, *input_shape))]
mod, params = relay.frontend.from_pytorch(script_module, shape_list)
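
Before compiling, the imported Relay IR can be inspected by printing the module's `main` function (the full printout is long, so only the head is shown here):

[ ]:
# Peek at the converted Relay program
print(str(mod["main"])[:1000])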

Compile with the Relay VM

Note: currently only CPU targets are supported. For x86 targets, building TVM with Intel MKL and Intel OpenMP is highly recommended for best performance, because the torchvision rcnn models contain large dense operators.

[ ]:
# Add "-libs=mkl" to get best performance on x86 target.
# For x86 machine supports AVX512, the complete target is
# "llvm -mcpu=skylake-avx512 -libs=mkl"
target = "llvm"

with tvm.transform.PassContext(opt_level=3):
    vm_exec = relay.vm.compile(mod, target=target, params=params)
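
For deployment, the compiled VM executable can be serialized into a bytecode blob plus a kernel library and loaded back later, following TVM's VM serialization API (a sketch; the file names are placeholders):

[ ]:
# Serialize: VM bytecode + compiled kernel library
code, lib = vm_exec.save()
lib.export_library("yolov5n_vm.so")
with open("yolov5n_vm.code", "wb") as f:
    f.write(code)

# Deserialize into a fresh Executable
loaded_lib = tvm.runtime.load_module("yolov5n_vm.so")
loaded_code = bytearray(open("yolov5n_vm.code", "rb").read())
loaded_exec = tvm.runtime.vm.Executable.load_exec(loaded_code, loaded_lib)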

Inference with the Relay VM

[10]:
ctx = tvm.cpu()
vm = VirtualMachine(vm_exec, ctx)
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()
[11]:
%%timeit
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()
52.7 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Get boxes with a score greater than 0.6

[12]:
score_threshold = 0.6
boxes = tvm_res[0].asnumpy().tolist()
valid_boxes = []
# NMS returns scores in descending order, so we can stop at the
# first score below the threshold.
for i, score in enumerate(tvm_res[1].asnumpy().tolist()):
    if score > score_threshold:
        valid_boxes.append(boxes[i])
    else:
        break

print(f"Get {len(valid_boxes)} valid boxes")
Get 4 valid boxes
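
Since the early `break` only works because NMS returns scores in descending order, an equivalent filter that does not depend on ordering is:

[ ]:
scores = tvm_res[1].asnumpy()
mask = scores > score_threshold
valid_boxes_np = tvm_res[0].asnumpy()[mask]
print(f"Get {len(valid_boxes_np)} valid boxes")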

Verify the inference output of the TVM backend

[13]:
with torch.no_grad():
    torch_res = script_module(torch.from_numpy(img))
[14]:
for i in range(len(torch_res)):
    torch.testing.assert_allclose(torch_res[i], tvm_res[i].asnumpy(), rtol=1e-4, atol=1e-4)

print("Exported model has been tested with TVM Runtime, and the result looks good!")
Exported model has been tested with TVM Runtime, and the result looks good!
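
As a final visual check, the kept boxes can be drawn back onto the resized image (a minimal sketch; it assumes the boxes are (x1, y1, x2, y2) pixel coordinates in the 640x640 input, and the output path is a placeholder):

[ ]:
# Draw the filtered detections on the resized test image
vis = cv2.resize(get_image_from_url(img_source), (in_size, in_size))
for x1, y1, x2, y2 in valid_boxes:
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
cv2.imwrite("bus_pred.jpg", vis)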

View this document as a notebook: https://github.com/zhiqwang/yolov5-rt-stack/blob/main/notebooks/export-relay-inference-tvm.ipynb