{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 开始使用 TVMC Python:TVM 的高级 API\n", "\n", "**原作者**: [Jocelyn Shiue](https://github.com/CircleSpin)\n", "\n", "## Step 0: 导入\n", "\n", "导入本地 TVM 环境:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "from tvm.driver import tvmc\n", "\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: 加载模型\n", "\n", "将模型导入到 tvmc 中。这一步将机器学习模型从受支持的框架转换为 TVM 的高级图表示语言 Relay。这将为 TVM 中的所有模型提供一个统一的起点。目前支持的框架有:Keras、ONNX、Tensorflow、TFLite 和 PyTorch。\n", "\n", "```sh\n", "!wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import onnx\n", "\n", "model_path = 'resnet50-v2-7.onnx'\n", "onnx_model = onnx.load(model_path)\n", "\n", "model = tvmc.load(model_path, shape_dict={\"data\": [1, 3, 224, 224]}) # Step 1: Load" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "所有框架都支持使用 `shape_dict` 参数覆盖输入 shape。对于大多数框架来说,这是可选的,但对于 Pytorch 来说,这是必要的,因为 TVM 不能自动搜索它。\n", "\n", "```python\n", "#Step 1: Load + shape_dict\n", "model = tvmc.load(my_model,\n", " shape_dict={'input1': [1, 2, 3, 4],\n", " 'input2': [1, 2, 3, 4]})\n", "```\n", "\n", "```{tip}\n", "查看模型的 input/shape_dict 的推荐方法是通过 [netron](https://netron.app/)。打开模型后,单击第一个节点,在 inputs 部分查看名称和形状。\n", "```\n", "\n", "如果你想看 Relay,你可以运行:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.summary() # 输出内容太多,此处已省略" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: 编译\n", "\n", "既然模型已经在 Relay 中,下一步就是将它编译到需要运行的硬件上。这个硬件称为目标(target)。此编译过程将模型从 Relay 转换为目标机器可以理解的较低级语言。\n", "\n", "为了编译模型 ``tvm.target`` 字符串是必需的。查看[文档](https://tvm.apache.org/docs/api/python/target.html),了解更多关于 `tvm.target` 的信息及其选项。一些例子包括:\n", "\n", "1. cuda (Nvidia GPU)\n", "2. llvm (CPU)\n", "3. llvm -mcpu=cascadelake (Intel CPU)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.\n" ] } ], "source": [ "# Step 2: Compile\n", "package = tvmc.compile(model, target=\"llvm\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "编译步骤返回 `package`。\n", "\n", "## Step 3: 运行\n", "\n", "编译后的包现在可以在硬件目标上运行。设备输入选项有:CPU、Cuda、CL、Metal 和 Vulkan。\n", "\n", "使用 CUDA,需要:\n", "\n", "```bash\n", "conda install -c conda-forge py-xgboost-gpu\n", "pip install cloudpickle\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-08-30 09:43:06.688 INFO load_module /tmp/tmpmv4iao74/mod.so\n" ] } ], "source": [ "#Step 3: Run\n", "results = tvmc.run(package, device=\"cpu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "也可以打印结果:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Execution time summary:\n", " mean (ms) median (ms) max (ms) min (ms) std (ms) \n", " 97.9411 93.8668 127.5822 84.3479 12.3553 \n", " \n", "Output Names:\n", " ['output_0']\n" ] } ], "source": [ "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tune [可选 && 推荐]\n", "\n", "通过调优可以进一步提高运行速度。这个可选步骤使用机器学习来查看模型(函数)中的每个运算,并试图找到更快的方法来运行它。通过成本模型来做到这一点,并对可能的调度进行基准测试。\n", "\n", "此处 `target` 与编译相同。\n", "\n", "```python\n", "# Step 1.5: Optional Tune\n", "# 可以是 \"cuda\"\n", "tvmc.tune(model, target=\"llvm\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这将使最终结果更快,但可能需要数小时来调优。\n", "\n", "请参阅下面的 [保存调优结果](保存调优结果)。如果希望应用调优结果,请确保将调优结果传递到 `compile` 中。\n", "\n", "```python\n", "# Step 2: Compile\n", "tvmc.compile(model,\n", " target=\"cuda\",\n", " tuning_records=\"records.log\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 额外的 TVMC 功能\n", "\n", "### 保存模型\n", "\n", "为了以后更快,加载模型(Step 1)后保存 Relay 版本。然后,模型将出现在您为稍后转换语法保存它的地方。" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "model = tvmc.load(model_path, shape_dict={\"data\": [1, 3, 224, 224]}) #Step 1: Load\n", "desired_model_path = 'new_model.onnx'\n", "model.save(desired_model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 保存包\n", "\n", "在模型被编译(Step 2)之后,包也可以被保存。" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-08-30 09:45:18.629 INFO load_module /tmp/tmp6isdkeif/mod.so\n" ] } ], "source": [ "tvmc.compile(model, target=\"llvm\", package_path=\"whatever\")\n", "\n", "new_package = tvmc.TVMCPackage(package_path=\"whatever\")\n", "#Step 3: Run\n", "result = tvmc.run(new_package, device='cpu') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 使用 Autoscheduler\n", "\n", "使用下一代 tvm 来启用可能更快的运行速度结果。调度的搜索空间是自动生成的,不像之前需要手写。\n", "\n", "```{seealso}\n", "1. 博文:[引入 Auto-scheduler TVM](https://tvm.apache.org/2021/03/03/intro-auto-scheduler)\n", "2. 论文:[Ansor : Generating High-Performance Tensor Programs for Deep Learning](https://arxiv.org/abs/2006.06762)\n", "```\n", "\n", "```python\n", "tvmc.tune(model,\n", " target=\"llvm\",\n", " enable_autoscheduler=True)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 保存调优结果\n", "\n", "调优结果可以保存在文件中,以便以后重用。\n", "\n", "````{tabbed} 方式1\n", "```python\n", "log_file = \"hello.json\"\n", "\n", "# Run tuning\n", "tvmc.tune(model, target=\"llvm\", tuning_records=log_file)\n", "\n", "...\n", "\n", "# Later run tuning and reuse tuning results\n", "tvmc.tune(model, target=\"llvm\", tuning_records=log_file)\n", "```\n", "````\n", "````{tabbed} 方式2\n", "```python\n", "# Run tuning\n", "tuning_records = tvmc.tune(model, target=\"llvm\")\n", "\n", "...\n", "\n", "# Later run tuning and reuse tuning results\n", "tvmc.tune(model, target=\"llvm\", tuning_records=tuning_records)\n", "```\n", "````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 调优更多复杂模型\n", "\n", "你可能注意到 T 的打印像 ``.........T.T..T..T..T.T.T.T.T.T.`` 增加了搜索时间范围:\n", "\n", "```python\n", "tvmc.tune(model,\n", " target='cpu',\n", " trials=10000,\n", " timeout=10)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 为远程设备编译模型\n", "\n", "当您希望为不在本地机器上的硬件进行编译时,远程过程调用(remote procedural call,简称 RPC)非常有用。`tvmc` 方法支持这一点。要设置 RPC 服务器,请查看[交叉编译和 RPC 文档](cross_compilation_and_rpc)中的“在设备上设置 RPC 服务器”一节。\n", "\n", "在 TVMC 脚本中包括以下内容并进行相应调整:\n", "\n", "```python\n", "tvmc.tune(\n", " model,\n", " target=target, # Compilation target as string // Device to compile for\n", " target_host=target_host, # Host processor\n", " hostname=host_ip_address, # The IP address of an RPC tracker, used when benchmarking remotely.\n", " port=port_number, # The port of the RPC tracker to connect to. Defaults to 9090.\n", " rpc_key=your_key, # The RPC tracker key of the target device. Required when rpc_tracker is provided\n", ")\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('py38': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "28558e8daad512806f5c536a1a04c119185f99f65b79002708a12162d02a79c7" } } }, "nbformat": 4, "nbformat_minor": 0 }