{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 使用 TEDD 进行可视化\n", "\n", "**原作者**: [Yongfeng Gu](https://github.com/yongfeng-nv)\n", "\n", "这是关于使用 TEDD (Tensor Expression Debug Display) 可视化张量表达式的介绍。\n", "\n", "张量表达式使用原语(primitive)调度。尽管单个原语通常很容易理解,但当您将它们放在一起时,它们很快就变得复杂了。在张量表达式中引入了调度原语的运算模型 (operational model)。\n", "\n", "- 不同调度原语之间的交互,\n", "- 调度原语对最终代码生成(code generation)的影响。\n", "\n", "运算模型基于数据流图(Dataflow Graph)、调度树(Schedule Tree)和迭代关系图(IterVar Relationship Graph)。调度原语对这些图执行运算。\n", "\n", "TEDD 根据给定的调度呈现这三个图。本教程演示了如何使用 TEDD 以及如何解释呈现的图。" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import tvm\n", "from tvm import te\n", "from tvm import topi\n", "from tvm.contrib import tedd\n", "from IPython.display import display_svg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 定义和调度带有 bias 和 ReLU 的卷积\n", "\n", "建立包含 Bias 和 ReLU 的卷积张量表达式的例子。首先连接 conv2d、add 和 relu TOPIs。然后,创建 TOPI 通用调度。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "batch = 1\n", "in_channel = 256\n", "in_size = 32\n", "num_filter = 256\n", "kernel = 3\n", "stride = 1\n", "padding = \"SAME\"\n", "dilation = 1\n", "\n", "A = te.placeholder((in_size, in_size, in_channel, batch), name=\"A\")\n", "W = te.placeholder((kernel, kernel, in_channel, num_filter), name=\"W\")\n", "B = te.placeholder((1, num_filter, 1), name=\"bias\")\n", "\n", "with tvm.target.Target(\"llvm\"):\n", " t_conv = topi.nn.conv2d_hwcn(A, W, stride, padding, dilation)\n", " t_bias = topi.add(t_conv, B)\n", " t_relu = topi.nn.relu(t_bias)\n", " s = topi.generic.schedule_conv2d_hwcn([t_relu])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用 TEDD 渲染 Graph\n", "\n", "渲染图来查看计算过程以及如何调度它。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": "\n\nDataflow Graph\n\n\n\nStage_0\n\n\n\n\nA\nScope: \n\n\n0\n\n\n\nTensor_0_0\n\n[32, 32, 256, 1]\nfloat32\n\n\n\nStage_0:O_0->Tensor_0_0\n\n\n\n\n\nStage_1\n\n\n0\n\n\npad_temp\nScope: \n\n\n0\n\n\n\nTensor_0_0->Stage_1:I_0\n\n\n\n\n\nTensor_1_0\n\n[34, 34, 256, 1]\nfloat32\n\n\n\nStage_1:O_0->Tensor_1_0\n\n\n\n\n\nStage_3\n\n\n0\n\n\nconv2d_hwcn\nScope: \n\n\n0\n\n\n1\n\n\n\n\n\nTensor_1_0->Stage_3:I_0\n\n\n\n\n\nStage_2\n\n\n\n\nW\nScope: \n\n\n0\n\n\n\nTensor_2_0\n\n[3, 3, 256, 256]\nfloat32\n\n\n\nStage_2:O_0->Tensor_2_0\n\n\n\n\n\nTensor_2_0->Stage_3:I_1\n\n\n\n\n\nTensor_3_0\n\n[32, 32, 256, 1]\nfloat32\n\n\n\nStage_3:O_0->Tensor_3_0\n\n\n\n\n\nStage_5\n\n\n0\n\n\nT_add\nScope: \n\n\n0\n\n\n1\n\n\n\n\n\nTensor_3_0->Stage_5:I_0\n\n\n\n\n\nStage_4\n\n\n\n\nbias\nScope: \n\n\n0\n\n\n\nTensor_4_0\n\n[1, 256, 1]\nfloat32\n\n\n\nStage_4:O_0->Tensor_4_0\n\n\n\n\n\nTensor_4_0->Stage_5:I_1\n\n\n\n\n\nTensor_5_0\n\n[32, 32, 256, 1]\nfloat32\n\n\n\nStage_5:O_0->Tensor_5_0\n\n\n\n\n\nStage_6\n\n\n0\n\n\ncompute\nScope: \n\n\n0\n\n\n\nTensor_5_0->Stage_6:I_0\n\n\n\n\n\nTensor_6_0\n\n[32, 32, 256, 1]\nfloat32\n\n\n\nStage_6:O_0->Tensor_6_0\n\n\n\n\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "graph = tedd.viz_dataflow_graph(s, show_svg=True)\n", "display_svg(graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "也可保存图到本地:\n", "\n", "```python\n", "tedd.viz_dataflow_graph(s, dot_file_path=\"/tmp/dfg.dot\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "第一个是数据流图。每个节点表示一个阶段,中间显示名称和 memory scope,两侧显示 
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[TEDD Schedule Tree SVG rendered here]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "tree = tedd.viz_schedule_tree(s, show_svg=True)\n",
    "# tedd.viz_schedule_tree(s, dot_file_path=\"/tmp/scheduletree.dot\")\n",
    "display_svg(tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We just rendered the schedule tree graph. You may notice a warning about ranges being unavailable.\n",
    "\n",
    "The message also suggests calling `normalize()` to infer the range information. You are encouraged to compare the graphs before and after `normalize()` to understand its impact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[TEDD Schedule Tree SVG (after normalize) rendered here]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = s.normalize()\n",
    "# tedd.viz_schedule_tree(s, dot_file_path=\"/tmp/scheduletree2.dot\")\n",
    "tree = tedd.viz_schedule_tree(s, show_svg=True)\n",
    "display_svg(tree)"
   ]
  },
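  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to observe what the schedule (and `normalize()`) does is to lower it to TIR and read the concrete loop nests that the Schedule Tree summarizes. A short sketch, independent of TEDD, assuming the tensors defined earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Sketch: lower the schedule to TIR to inspect the generated loop nests.\n",
    "# Running this before and after normalize() shows the inferred ranges.\n",
    "print(tvm.lower(s, [A, W, B, t_relu], simple_mode=True))"
   ]
  },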
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's take a close look at the second schedule tree. Every block under ROOT represents a stage. The stage name is shown in the top row and the computation in the bottom row. The rows in the middle are IterVars: the higher, the more outer; the lower, the more inner.\n",
    "\n",
    "An IterVar row contains its index, name, type, and other optional information. Take W.shared as an example. The first row tells its name, \"W.shared\", and memory scope, \"Shared\". Its computation is `W(ax0, ax1, ax2, ax3)`. Its outermost loop IterVar is ax0.ax1.fused.ax2.fused.ax3.fused.outer, indexed with 0, of type kDataPar, bound to threadIdx.y, and with range(min=0, ext=8).\n",
    "\n",
    "You can also tell an IterVar's type from the color of its index box, as shown in the legend.\n",
    "\n",
    "If a stage doesn't compute_at any other stage, it has an edge directly to the ROOT node. Otherwise, it has an edge pointing to the IterVar it is attached to, such as W.shared attaching to rx.outer in the middle compute stage.\n",
    "\n",
    "```{note}\n",
    ":class: alert alert-info\n",
    "\n",
    "By definition, IterVars are internal nodes and computations are leaf nodes in a schedule tree. The edges among IterVars and computations within one stage are omitted, making each stage a block, for better readability.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[TEDD IterVar Relationship Graph SVG rendered here]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from graphviz import Source\n",
    "\n",
    "# tedd.viz_itervar_relationship_graph(s, dot_file_path=\"/tmp/itervar.dot\")\n",
    "dot_string = tedd.viz_itervar_relationship_graph(s, output_dot_string=True)\n",
    "src = Source(dot_string)\n",
    "display_svg(src)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last one is the IterVar Relationship Graph. Every subgraph represents a stage and contains IterVar nodes and transformation nodes. For example, W.shared has three split nodes and three fuse nodes. The rest are IterVar nodes, in the same format as the IterVar rows in the Schedule Tree. Root IterVars are those not driven by any transformation node, such as ax0; leaf IterVars do not drive any transformation node and have non-negative indices, such as ax0.ax1.fused.ax2.fused.ax3.fused.outer with index 0."
   ]
  },
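  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The IterVars and split/fuse relations that this graph draws are also stored on each stage and can be inspected directly. A small sketch using the stage attributes `all_iter_vars`, `leaf_iter_vars`, and `relations`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Sketch: print the IterVars and transformation relations per stage.\n",
    "# all_iter_vars holds both original and derived IterVars, while\n",
    "# leaf_iter_vars are the loops remaining after splits and fuses;\n",
    "# relations holds the Split/Fuse nodes drawn in the graph above.\n",
    "for stage in s.stages:\n",
    "    print(stage.op.name)\n",
    "    print(\"  all :\", [iv.var.name for iv in stage.all_iter_vars])\n",
    "    print(\"  leaf:\", [iv.var.name for iv in stage.leaf_iter_vars])\n",
    "    print(\"  rel :\", [type(rel).__name__ for rel in stage.relations])"
   ]
  },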
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "This tutorial demonstrated the usage of TEDD, using an example built with TOPI to show the schedules under the hood. You can also use TEDD before and after any schedule primitive to inspect its effect."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.10.4 ('mxnetx')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  },
  "vscode": {
   "interpreter": {
    "hash": "aa67ff675248b5ab29dcd2f00c1422844307085c8ca7c8ce7eddecd21b9c2975"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}