{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(sphx_glr_tutorial_auto_scheduler_matmul_x86)=\n", "# Optimizing Operators with Auto-scheduling\n", "\n", "**Authors**: [Lianmin Zheng](https://github.com/merrymercy), [Chengfan Jia](https://github.com/jcf94/)\n", "\n", "In this tutorial, we show how TVM's auto-scheduling feature can find high-performance schedules without the need to write a custom template.\n", "\n", "Unlike the template-based [AutoTVM](autotvm_matmul_x86), which relies on manual templates to define the search space, the auto-scheduler does not require any templates.\n", "\n", "Users only need to write the computation declaration, without any schedule commands or templates. The auto-scheduler automatically generates a large search space and finds a good schedule in that space.\n", "\n", "We use matrix multiplication as the example in this tutorial.\n", "\n", "```{hint}\n", ":class: alert alert-info\n", "\n", "Note that this tutorial will not run on Windows or recent versions of macOS. To get it to run, you will need to wrap the body of this tutorial in an `if __name__ == \"__main__\":` block.\n", "```" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import tvm\n", "from tvm import te, auto_scheduler" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Define the matrix multiplication\n", "\n", "To start, we define a matrix multiplication with a bias addition. Note that this uses standard operations available in TVM's Tensor Expression language. The major difference is the use of the {any}`register_workload` decorator at the top of the function definition. The function should return a list of input/output tensors. From these tensors, the auto-scheduler can derive the whole computational graph." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "@auto_scheduler.register_workload  # Note the auto_scheduler decorator\n", "def matmul_add(N, L, M, dtype):\n", "    A = te.placeholder((N, L), name=\"A\", dtype=dtype)\n", "    B = te.placeholder((L, M), name=\"B\", dtype=dtype)\n", "    C = te.placeholder((N, M), name=\"C\", dtype=dtype)\n", "\n", "    k = te.reduce_axis((0, L), name=\"k\")\n", "    matmul = te.compute(\n", "        (N, M),\n", "        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),\n", "        name=\"matmul\",\n", "        attrs={\"layout_free_placeholders\": [B]},  # Enable automatic layout transform for tensor B\n", "    )\n", "    out = te.compute((N, M), lambda i, j: matmul[i, j] + C[i, j], name=\"out\")\n", "    return [A, B, C, out]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Create the search task\n", "\n", "With the function defined, we can now create a task for the `auto_scheduler` to search against. We specify the particular parameters for this matrix multiplication, in this case a multiplication of two square matrices of size $1024 \\times 1024$. We then create a search task with `N=L=M=1024` and `dtype=\"float32\"`.\n", "\n", "```{admonition} Improve performance with custom targets\n", "For TVM to take full advantage of a specific hardware platform, you need to specify your CPU capabilities manually. For example:\n", "\n", "- Replace ``llvm`` below with ``llvm -mcpu=core-avx2`` to enable AVX2\n", "- Replace ``llvm`` below with ``llvm -mcpu=skylake-avx512`` to enable AVX-512\n", "```" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computational DAG:\n", "A = PLACEHOLDER [1024, 1024]\n", "B = PLACEHOLDER [1024, 1024]\n", "matmul(i, j) += (A[i, k]*B[k, j])\n", "C = PLACEHOLDER [1024, 1024]\n", "out(i, j) = (matmul[i, j] + C[i, j])\n", "\n" ] } ], "source": [ "target = tvm.target.Target(\"llvm\")\n", "N = L = M = 1024\n", "task = tvm.auto_scheduler.SearchTask(func=matmul_add, args=(N, L, M, \"float32\"), target=target)\n", "\n", "# Inspect the computational graph\n", "print(\"Computational DAG:\")\n", "print(task.compute_dag)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Set parameters for the auto-scheduler\n", "\n", "Next, we set parameters for the auto-scheduler.\n", "\n", "* `num_measure_trials` is the number of measurement trials we can use during the search. We only run 10 trials in this tutorial for a fast demonstration. In practice, 1000 is a good value for the search to converge. You can run more trials according to your time budget.\n", "* In addition, we use {any}`RecordToFile` to log measurement records into the file `matmul.json`. The measurement records can be used to query the best result so far, resume the search, and do further analysis later.\n", "* See {any}`TuningOptions` for more information about the parameters." ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "log_file = \"matmul.json\"\n", "tune_option = auto_scheduler.TuningOptions(\n", "    num_measure_trials=10,\n", "    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],\n", "    verbose=2,\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Run the search\n", "\n", "Now we have all the inputs ready. Pretty simple, isn't it? We can kick off the search and let the auto-scheduler do its magic. After some measurement trials, we can load the best schedule from the log file and apply it." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/media/pc/data/4tb/lxw/libs/anaconda3/envs/py38/lib/python3.8/site-packages/xgboost/compat.py:36: 
FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", " from pandas import MultiIndex, Int64Index\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------------------------------------\n", "------------------------------ [ Search ]\n", "----------------------------------------------------------------------\n", "Generate Sketches\t\t#s: 3\n", "Sample Initial Population\t#s: 2009\tfail_ct: 1\tTime elapsed: 0.74\n", "GA Iter: 0\tMax score: 0.9999\tMin score: 0.9383\t#Pop: 128\t#M+: 0\t#M-: 0\n", "GA Iter: 4\tMax score: 0.9999\tMin score: 0.9878\t#Pop: 128\t#M+: 1383\t#M-: 75\n", "EvolutionarySearch\t\t#s: 128\tTime elapsed: 2.38\n", "----------------------------------------------------------------------\n", "------------------------------ [ Measure ]\n", "----------------------------------------------------------------------\n", "Get 10 programs to measure:\n", "..........**********\n", "==================================================\n", "No: 1\tGFLOPS: 74.00 / 74.00\tresults: MeasureResult(cost:[0.0290], error_no:0, all_cost:0.80, Tstamp:1661835378.96)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@i.1@j.1@ (0,65536)\n", " matmul auto_unroll: 512\n", " for k.0 (0,1024)\n", " for i.2 (0,4)\n", " for i.3 (0,2)\n", " vectorize j.3 (0,2)\n", " matmul = ...\n", " for i.2 (0,8)\n", " vectorize j.2 (0,2)\n", " out = ...\n", "\n", "==================================================\n", "No: 2\tGFLOPS: 125.24 / 125.24\tresults: MeasureResult(cost:[0.0172], error_no:0, all_cost:0.94, Tstamp:1661835379.42)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@i.1@ (0,1024)\n", " for j.1 (0,4)\n", " for k.0 (0,32)\n", " for i.2 (0,4)\n", " for k.1 (0,32)\n", " vectorize j.3 (0,64)\n", " matmul = ...\n", " for i.2 
(0,4)\n", " for j.2 (0,64)\n", " out = ...\n", "\n", "==================================================\n", "No: 3\tGFLOPS: 72.50 / 125.24\tresults: MeasureResult(cost:[0.0296], error_no:0, all_cost:1.31, Tstamp:1661835379.98)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@ (0,256)\n", " matmul auto_unroll: 512\n", " for i.1 (0,4)\n", " for j.1 (0,16)\n", " for k.0 (0,512)\n", " for j.2 (0,16)\n", " for k.1 (0,2)\n", " for i.3 (0,2)\n", " vectorize j.3 (0,2)\n", " matmul = ...\n", " for i.1 (0,8)\n", " for j.1 (0,512)\n", " out = ...\n", "\n", "==================================================\n", "No: 4\tGFLOPS: 69.32 / 125.24\tresults: MeasureResult(cost:[0.0310], error_no:0, all_cost:2.62, Tstamp:1661835380.53)\n", "==================================================\n", "Placeholder: A, B, C\n", "matmul auto_unroll: 512\n", "parallel i.0@j.0@i.1@j.1@ (0,256)\n", " for k.0 (0,16)\n", " for i.2 (0,4)\n", " for j.2 (0,8)\n", " for k.1 (0,64)\n", " for i.3 (0,4)\n", " for j.3 (0,32)\n", " matmul = ...\n", "parallel i (0,1024)\n", " for j (0,1024)\n", " out = ...\n", "\n", "==================================================\n", "No: 5\tGFLOPS: 39.57 / 125.24\tresults: MeasureResult(cost:[0.0543], error_no:0, all_cost:1.19, Tstamp:1661835381.00)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@ (0,128)\n", " matmul auto_unroll: 16\n", " for i.1 (0,2)\n", " for j.1 (0,2)\n", " for k.0 (0,32)\n", " for i.2 (0,128)\n", " for j.2 (0,4)\n", " for k.1 (0,32)\n", " for i.3 (0,4)\n", " matmul = ...\n", " for i.1 (0,1024)\n", " for j.1 (0,8)\n", " out = ...\n", "\n", "==================================================\n", "No: 6\tGFLOPS: 31.00 / 125.24\tresults: MeasureResult(cost:[0.0693], error_no:0, all_cost:1.17, Tstamp:1661835381.47)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0 (0,512)\n", " for j.0 
(0,4)\n", " for j.1 (0,64)\n", " for k.0 (0,64)\n", " for i.2 (0,2)\n", " for j.2 (0,4)\n", " for k.1 (0,16)\n", " matmul = ...\n", "parallel i (0,1024)\n", " for j (0,1024)\n", " out = ...\n", "\n", "==================================================\n", "No: 7\tGFLOPS: 58.25 / 125.24\tresults: MeasureResult(cost:[0.0369], error_no:0, all_cost:0.78, Tstamp:1661835381.94)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@i.1@j.1@ (0,4096)\n", " for k.0 (0,32)\n", " for j.2 (0,128)\n", " for k.1 (0,32)\n", " vectorize j.3 (0,2)\n", " matmul = ...\n", "parallel i (0,1024)\n", " for j (0,1024)\n", " out = ...\n", "\n", "==================================================\n", "No: 8\tGFLOPS: 22.04 / 125.24\tresults: MeasureResult(cost:[0.0975], error_no:0, all_cost:0.87, Tstamp:1661835382.58)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@i.1@j.1@ (0,2048)\n", " for k.0 (0,256)\n", " for i.2 (0,256)\n", " for k.1 (0,4)\n", " vectorize j.3 (0,2)\n", " matmul = ...\n", "parallel i (0,1024)\n", " for j (0,1024)\n", " out = ...\n", "\n", "==================================================\n", "No: 9\tGFLOPS: 43.68 / 125.24\tresults: MeasureResult(cost:[0.0492], error_no:0, all_cost:7.61, Tstamp:1661835382.98)\n", "==================================================\n", "Placeholder: A, B, C\n", "parallel i.0@j.0@ (0,32)\n", " matmul auto_unroll: 512\n", " for i.1 (0,8)\n", " for j.1 (0,8)\n", " for k.0 (0,256)\n", " for i.2 (0,2)\n", " for j.2 (0,2)\n", " for k.1 (0,4)\n", " for i.3 (0,32)\n", " vectorize j.3 (0,4)\n", " matmul = ...\n", " for i.1 (0,512)\n", " for j.1 (0,64)\n", " out = ...\n", "\n", "==================================================\n", "No: 10\tGFLOPS: 24.96 / 125.24\tresults: MeasureResult(cost:[0.0861], error_no:0, all_cost:0.89, Tstamp:1661835383.52)\n", "==================================================\n", "Placeholder: A, B, C\n", 
"parallel i.0@j.0@ (0,8)\n", " matmul auto_unroll: 16\n", " for j.1 (0,4)\n", " for k.0 (0,16)\n", " for i.2 (0,128)\n", " for j.2 (0,64)\n", " for k.1 (0,64)\n", " for i.3 (0,2)\n", " vectorize j.3 (0,2)\n", " matmul = ...\n", " for i.1 (0,256)\n", " for j.1 (0,512)\n", " out = ...\n", "\n", "Time elapsed for measurement: 15.21 s\n", "----------------------------------------------------------------------\n", "------------------------------ [ Done ]\n", "----------------------------------------------------------------------\n" ] } ], "source": [ "# Run auto-tuning (search)\n", "task.tune(tune_option)\n", "# Apply the best schedule\n", "sch, args = task.apply_best(log_file)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Inspect the optimized schedule\n", "\n", "We can lower the schedule to see the IR after auto-scheduling. The auto-scheduler correctly performs optimizations including multi-level tiling, layout transformation, parallelization, vectorization, unrolling, and operator fusion." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;5;129m@tvm\u001b[39m\u001b[38;5;129;01m.\u001b[39;00mscript\u001b[38;5;129;01m.\u001b[39;00mir_module\n", "\u001b[38;5;28;01mclass\u001b[39;00m \u001b[38;5;21;01mModule\u001b[39;00m:\n", " \u001b[38;5;129m@T\u001b[39m\u001b[38;5;129;01m.\u001b[39;00mprim_func\n", " \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mmain\u001b[39m(A: T\u001b[38;5;129;01m.\u001b[39;00mBuffer[\u001b[38;5;28m1048576\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m], B: T\u001b[38;5;129;01m.\u001b[39;00mBuffer[\u001b[38;5;28m1048576\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m], C: T\u001b[38;5;129;01m.\u001b[39;00mBuffer[\u001b[38;5;28m1048576\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m], out: 
T\u001b[38;5;129;01m.\u001b[39;00mBuffer[\u001b[38;5;28m1048576\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m]) \u001b[38;5;129;01m-\u001b[39;00m\u001b[38;5;129;01m>\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n", " \u001b[38;5;30;03m# function attr dict\u001b[39;00m\n", " T\u001b[38;5;129;01m.\u001b[39;00mfunc_attr({\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfrom_legacy_te_schedule\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28;01mTrue\u001b[39;00m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mglobal_symbol\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmain\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtir.noalias\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28;01mTrue\u001b[39;00m})\n", " T\u001b[38;5;129;01m.\u001b[39;00mpreflattened_buffer(A, [\u001b[38;5;28m1024\u001b[39m, \u001b[38;5;28m1024\u001b[39m], dtype\u001b[38;5;129;01m=\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, data\u001b[38;5;129;01m=\u001b[39;00mA\u001b[38;5;129;01m.\u001b[39;00mdata)\n", " T\u001b[38;5;129;01m.\u001b[39;00mpreflattened_buffer(B, [\u001b[38;5;28m1024\u001b[39m, \u001b[38;5;28m1024\u001b[39m], dtype\u001b[38;5;129;01m=\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, data\u001b[38;5;129;01m=\u001b[39;00mB\u001b[38;5;129;01m.\u001b[39;00mdata)\n", " T\u001b[38;5;129;01m.\u001b[39;00mpreflattened_buffer(C, [\u001b[38;5;28m1024\u001b[39m, \u001b[38;5;28m1024\u001b[39m], dtype\u001b[38;5;129;01m=\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, data\u001b[38;5;129;01m=\u001b[39;00mC\u001b[38;5;129;01m.\u001b[39;00mdata)\n", " T\u001b[38;5;129;01m.\u001b[39;00mpreflattened_buffer(out, [\u001b[38;5;28m1024\u001b[39m, \u001b[38;5;28m1024\u001b[39m], 
dtype\u001b[38;5;129;01m=\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, data\u001b[38;5;129;01m=\u001b[39;00mout\u001b[38;5;129;01m.\u001b[39;00mdata)\n", " \u001b[38;5;30;03m# body\u001b[39;00m\n", " auto_scheduler_layout_transform \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mallocate([\u001b[38;5;28m1048576\u001b[39m], \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mglobal\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", " \u001b[38;5;28;01mfor\u001b[39;00m ax0_ax1_fused_ax2_fused \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mparallel(\u001b[38;5;28m16\u001b[39m):\n", " \u001b[38;5;28;01mfor\u001b[39;00m ax4, ax5, ax6, ax7 \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mgrid(\u001b[38;5;28m256\u001b[39m, \u001b[38;5;28m8\u001b[39m, \u001b[38;5;28m4\u001b[39m, \u001b[38;5;28m8\u001b[39m):\n", " auto_scheduler_layout_transform[ax0_ax1_fused_ax2_fused \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m65536\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax4 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m256\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax5 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m32\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax6 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax7] \u001b[38;5;129;01m=\u001b[39;00m B[ax4 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m4096\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax6 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m1024\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax0_ax1_fused_ax2_fused \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m64\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax5 \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m ax7]\n", " \u001b[38;5;28;01mfor\u001b[39;00m 
i_outer_outer_j_outer_outer_fused \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mparallel(\u001b[38;5;28m512\u001b[39m):\n", " matmul \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mallocate([\u001b[38;5;28m256\u001b[39m], \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfloat32\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mglobal\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", " \u001b[38;5;28;01mfor\u001b[39;00m i_outer_inner, j_outer_inner \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mgrid(\u001b[38;5;28m4\u001b[39m, \u001b[38;5;28m2\u001b[39m):\n", " matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(T\u001b[38;5;129;01m.\u001b[39;00mfloat32(\u001b[38;5;28m0\u001b[39m), \u001b[38;5;28m8\u001b[39m)\n", " \u001b[38;5;28;01mfor\u001b[39;00m k_outer \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mserial(\u001b[38;5;28m256\u001b[39m):\n", " cse_var_48: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m i_outer_outer_j_outer_outer_fused \u001b[38;5;129;01m%\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m131072\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m j_outer_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m65536\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m k_outer \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m256\u001b[39m\n", " cse_var_47: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m i_outer_outer_j_outer_outer_fused \u001b[38;5;129;01m/\u001b[39;00m\u001b[38;5;129;01m/\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m16384\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m i_outer_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m4096\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m k_outer \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m4\u001b[39m\n", " cse_var_46: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m96\u001b[39m\n", " cse_var_45: T\u001b[38;5;129;01m.\u001b[39;00mint32 
\u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m88\u001b[39m\n", " cse_var_44: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m80\u001b[39m\n", " cse_var_43: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m\n", " cse_var_42: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m72\u001b[39m\n", " cse_var_41: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m64\u001b[39m\n", " cse_var_40: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m56\u001b[39m\n", " cse_var_39: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m48\u001b[39m\n", " cse_var_38: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m40\u001b[39m\n", " cse_var_37: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m32\u001b[39m\n", " cse_var_36: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m248\u001b[39m\n", " cse_var_35: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m240\u001b[39m\n", " cse_var_34: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m24\u001b[39m\n", " cse_var_33: T\u001b[38;5;129;01m.\u001b[39;00mint32 
\u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m232\u001b[39m\n", " cse_var_32: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m224\u001b[39m\n", " cse_var_31: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m216\u001b[39m\n", " cse_var_30: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m208\u001b[39m\n", " cse_var_29: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m200\u001b[39m\n", " cse_var_28: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m192\u001b[39m\n", " cse_var_27: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m184\u001b[39m\n", " cse_var_26: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m176\u001b[39m\n", " cse_var_25: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m168\u001b[39m\n", " cse_var_24: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m160\u001b[39m\n", " cse_var_23: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m16\u001b[39m\n", " cse_var_22: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m152\u001b[39m\n", " cse_var_21: T\u001b[38;5;129;01m.\u001b[39;00mint32 
\u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m144\u001b[39m\n", " cse_var_20: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m136\u001b[39m\n", " cse_var_19: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m128\u001b[39m\n", " cse_var_18: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m120\u001b[39m\n", " cse_var_17: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m112\u001b[39m\n", " cse_var_16: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m104\u001b[39m\n", " cse_var_15: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m3075\u001b[39m\n", " cse_var_14: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m3074\u001b[39m\n", " cse_var_13: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m3073\u001b[39m\n", " cse_var_12: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m3072\u001b[39m\n", " cse_var_11: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m3\u001b[39m\n", " cse_var_10: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m2051\u001b[39m\n", " cse_var_9: T\u001b[38;5;129;01m.\u001b[39;00mint32 
\u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m2050\u001b[39m\n", " cse_var_8: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m2049\u001b[39m\n", " cse_var_7: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m2048\u001b[39m\n", " cse_var_6: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m2\u001b[39m\n", " cse_var_5: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m1027\u001b[39m\n", " cse_var_4: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m1026\u001b[39m\n", " cse_var_3: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m1025\u001b[39m\n", " cse_var_2: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m1024\u001b[39m\n", " cse_var_1: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m cse_var_47 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m1\u001b[39m\n", " matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_48:cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_48:cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_43:cse_var_43 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_43:cse_var_43 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_23:cse_var_23 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_23:cse_var_23 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m0\u001b[39m:\u001b[38;5;28m8\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_34:cse_var_34 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m64\u001b[39m:\u001b[38;5;28m72\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_34:cse_var_34 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_37:cse_var_37 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_37:cse_var_37 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_38:cse_var_38 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_38:cse_var_38 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_39:cse_var_39 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_39:cse_var_39 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m8\u001b[39m:\u001b[38;5;28m16\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_40:cse_var_40 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m72\u001b[39m:\u001b[38;5;28m80\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_40:cse_var_40 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_41:cse_var_41 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_41:cse_var_41 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_42:cse_var_42 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_42:cse_var_42 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_44:cse_var_44 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_44:cse_var_44 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m16\u001b[39m:\u001b[38;5;28m24\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_45:cse_var_45 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m80\u001b[39m:\u001b[38;5;28m88\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_45:cse_var_45 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_46:cse_var_46 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_46:cse_var_46 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_16:cse_var_16 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_16:cse_var_16 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_17:cse_var_17 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_17:cse_var_17 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m24\u001b[39m:\u001b[38;5;28m32\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_18:cse_var_18 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m88\u001b[39m:\u001b[38;5;28m96\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_18:cse_var_18 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_19:cse_var_19 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_19:cse_var_19 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_20:cse_var_20 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_20:cse_var_20 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_21:cse_var_21 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_21:cse_var_21 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m32\u001b[39m:\u001b[38;5;28m40\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_22:cse_var_22 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m96\u001b[39m:\u001b[38;5;28m104\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_22:cse_var_22 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_24:cse_var_24 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_24:cse_var_24 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_25:cse_var_25 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_25:cse_var_25 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_26:cse_var_26 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_26:cse_var_26 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m40\u001b[39m:\u001b[38;5;28m48\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_27:cse_var_27 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m104\u001b[39m:\u001b[38;5;28m112\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_27:cse_var_27 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_28:cse_var_28 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_28:cse_var_28 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_29:cse_var_29 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_29:cse_var_29 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_30:cse_var_30 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_30:cse_var_30 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m48\u001b[39m:\u001b[38;5;28m56\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_31:cse_var_31 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m112\u001b[39m:\u001b[38;5;28m120\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_31:cse_var_31 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_47], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_32:cse_var_32 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_2], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_32:cse_var_32 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_1], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_33:cse_var_33 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_3], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_33:cse_var_33 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_6], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_35:cse_var_35 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_4], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_35:cse_var_35 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m56\u001b[39m:\u001b[38;5;28m64\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_11], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_36:cse_var_36 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m120\u001b[39m:\u001b[38;5;28m128\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_5], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_36:cse_var_36 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_48:cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_48:cse_var_48 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_43:cse_var_43 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_43:cse_var_43 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_23:cse_var_23 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_23:cse_var_23 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m128\u001b[39m:\u001b[38;5;28m136\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_34:cse_var_34 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m192\u001b[39m:\u001b[38;5;28m200\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_34:cse_var_34 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_37:cse_var_37 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_37:cse_var_37 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_38:cse_var_38 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_38:cse_var_38 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_39:cse_var_39 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_39:cse_var_39 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m136\u001b[39m:\u001b[38;5;28m144\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_40:cse_var_40 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m200\u001b[39m:\u001b[38;5;28m208\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_40:cse_var_40 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_41:cse_var_41 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_41:cse_var_41 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_42:cse_var_42 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_42:cse_var_42 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_44:cse_var_44 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_44:cse_var_44 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m144\u001b[39m:\u001b[38;5;28m152\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_45:cse_var_45 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m208\u001b[39m:\u001b[38;5;28m216\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_45:cse_var_45 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_46:cse_var_46 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_46:cse_var_46 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_16:cse_var_16 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_16:cse_var_16 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_17:cse_var_17 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_17:cse_var_17 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m152\u001b[39m:\u001b[38;5;28m160\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_18:cse_var_18 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m216\u001b[39m:\u001b[38;5;28m224\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_18:cse_var_18 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_19:cse_var_19 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_19:cse_var_19 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_20:cse_var_20 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_20:cse_var_20 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_21:cse_var_21 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_21:cse_var_21 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m160\u001b[39m:\u001b[38;5;28m168\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_22:cse_var_22 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m224\u001b[39m:\u001b[38;5;28m232\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_22:cse_var_22 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_24:cse_var_24 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_24:cse_var_24 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_25:cse_var_25 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_25:cse_var_25 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_26:cse_var_26 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_26:cse_var_26 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m168\u001b[39m:\u001b[38;5;28m176\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_27:cse_var_27 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m232\u001b[39m:\u001b[38;5;28m240\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_27:cse_var_27 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_28:cse_var_28 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_28:cse_var_28 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_29:cse_var_29 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_29:cse_var_29 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_30:cse_var_30 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_30:cse_var_30 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m176\u001b[39m:\u001b[38;5;28m184\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_31:cse_var_31 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m240\u001b[39m:\u001b[38;5;28m248\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_31:cse_var_31 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_7], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_32:cse_var_32 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_12], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_32:cse_var_32 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_8], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m 
auto_scheduler_layout_transform[cse_var_33:cse_var_33 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_13], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_33:cse_var_33 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_9], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_35:cse_var_35 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_14], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_35:cse_var_35 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m matmul[\u001b[38;5;28m184\u001b[39m:\u001b[38;5;28m192\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_10], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_36:cse_var_36 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m=\u001b[39;00m 
matmul[\u001b[38;5;28m248\u001b[39m:\u001b[38;5;28m256\u001b[39m] \u001b[38;5;129;01m+\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mbroadcast(A[cse_var_15], \u001b[38;5;28m8\u001b[39m) \u001b[38;5;129;01m*\u001b[39;00m auto_scheduler_layout_transform[cse_var_36:cse_var_36 \u001b[38;5;129;01m+\u001b[39;00m \u001b[38;5;28m8\u001b[39m]\n", " \u001b[38;5;28;01mfor\u001b[39;00m i_inner, j_inner \u001b[38;5;28;01min\u001b[39;00m T\u001b[38;5;129;01m.\u001b[39;00mgrid(\u001b[38;5;28m4\u001b[39m, \u001b[38;5;28m64\u001b[39m):\n", " cse_var_49: T\u001b[38;5;129;01m.\u001b[39;00mint32 \u001b[38;5;129;01m=\u001b[39;00m i_outer_outer_j_outer_outer_fused \u001b[38;5;129;01m/\u001b[39;00m\u001b[38;5;129;01m/\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m16384\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m i_outer_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m4096\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m i_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m1024\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m i_outer_outer_j_outer_outer_fused \u001b[38;5;129;01m%\u001b[39;00m \u001b[38;5;28m8\u001b[39m \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m128\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m j_outer_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m64\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m j_inner\n", " out[cse_var_49] \u001b[38;5;129;01m=\u001b[39;00m matmul[i_inner \u001b[38;5;129;01m*\u001b[39;00m \u001b[38;5;28m64\u001b[39m \u001b[38;5;129;01m+\u001b[39;00m j_inner] \u001b[38;5;129;01m+\u001b[39;00m C[cse_var_49]\n", " \n", "\n" ] } ], "source": [ "mod = tvm.lower(sch, args, simple_mode=True)\n", "mod.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check correctness and evaluate performance\n", "\n", "We build the binary and check its correctness and performance." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Execution time 
of this operator: 11.307 ms\n" ] } ], "source": [ "func = tvm.build(sch, args, target)\n", "a_np = np.random.uniform(size=(N, L)).astype(np.float32)\n", "b_np = np.random.uniform(size=(L, M)).astype(np.float32)\n", "c_np = np.random.uniform(size=(N, M)).astype(np.float32)\n", "out_np = a_np.dot(b_np) + c_np\n", "\n", "dev = tvm.cpu()\n", "a_tvm = tvm.nd.array(a_np, device=dev)\n", "b_tvm = tvm.nd.array(b_np, device=dev)\n", "c_tvm = tvm.nd.array(c_np, device=dev)\n", "out_tvm = tvm.nd.empty(out_np.shape, device=dev)\n", "func(a_tvm, b_tvm, c_tvm, out_tvm)\n", "\n", "# Check results\n", "np.testing.assert_allclose(out_np, out_tvm.numpy(), rtol=1e-3)\n", "\n", "# Evaluate execution time.\n", "evaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)\n", "print(\n", " \"Execution time of this operator: %.3f ms\"\n", " % (np.median(evaluator(a_tvm, b_tvm, c_tvm, out_tvm).results) * 1000)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the record file\n", "\n", "During the search, all measurement records are logged to the record file `matmul.json`. These records can be used to re-apply search results, resume the search, and perform other analyses.\n", "\n", "Here is an example where we load the best schedule from a file and print the equivalent Python schedule API. This is useful for debugging and for learning how the auto-scheduler behaves." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Equivalent python schedule:\n", "matmul_i, matmul_j, matmul_k = tuple(matmul.op.axis) + tuple(matmul.op.reduce_axis)\n", "out_i, out_j = tuple(out.op.axis) + tuple(out.op.reduce_axis)\n", "matmul_i_o_i, matmul_i_i = s[matmul].split(matmul_i, factor=2)\n", "matmul_i_o_o_i, matmul_i_o_i = s[matmul].split(matmul_i_o_i, factor=2)\n", "matmul_i_o_o_o, matmul_i_o_o_i = s[matmul].split(matmul_i_o_o_i, factor=4)\n", "matmul_j_o_i, matmul_j_i = s[matmul].split(matmul_j, factor=8)\n", "matmul_j_o_o_i, matmul_j_o_i = s[matmul].split(matmul_j_o_i, factor=8)\n", "matmul_j_o_o_o, matmul_j_o_o_i = s[matmul].split(matmul_j_o_o_i, factor=2)\n", "matmul_k_o, matmul_k_i = s[matmul].split(matmul_k, 
factor=4)\n", "s[matmul].reorder(matmul_i_o_o_o, matmul_j_o_o_o, matmul_i_o_o_i, matmul_j_o_o_i, matmul_k_o, matmul_i_o_i, matmul_j_o_i, matmul_k_i, matmul_i_i, matmul_j_i)\n", "out_i_o_i, out_i_i = s[out].split(out_i, factor=4)\n", "out_i_o_o, out_i_o_i = s[out].split(out_i_o_i, factor=4)\n", "out_j_o_i, out_j_i = s[out].split(out_j, factor=64)\n", "out_j_o_o, out_j_o_i = s[out].split(out_j_o_i, factor=2)\n", "s[out].reorder(out_i_o_o, out_j_o_o, out_i_o_i, out_j_o_i, out_i_i, out_j_i)\n", "s[matmul].compute_at(s[out], out_j_o_i)\n", "out_i_o_o_j_o_o_fused = s[out].fuse(out_i_o_o, out_j_o_o)\n", "s[out].parallel(out_i_o_o_j_o_o_fused)\n", "s[matmul].pragma(matmul_i_o_o_o, \"auto_unroll_max_step\", 512)\n", "s[matmul].pragma(matmul_i_o_o_o, \"unroll_explicit\", True)\n", "s[matmul].vectorize(matmul_j_i)\n", "\n" ] } ], "source": [ "print(\"Equivalent python schedule:\")\n", "print(task.print_best(log_file))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more complicated example is resuming the search. In this case, we need to create the search policy and cost model ourselves, and restore their state from the log file. In the example below, we restore the state and run 5 more trials." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Resume search:\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/media/pc/data/4tb/lxw/libs/anaconda3/envs/py38/lib/python3.8/site-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated. See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html\n", " warnings.warn(f'Old style callback is deprecated. 
See: {link}', UserWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------------------------------------\n", "------------------------------ [ Call init-search callbacks ]\n", "----------------------------------------------------------------------\n", "SearchPolicy: Loaded 25 measurement records from matmul.json for [\"matmul_add\", 1024, 1024, 1024, \"float32\"]\n", "----------------------------------------------------------------------\n", "------------------------------ [ Search ]\n", "----------------------------------------------------------------------\n", "Generate Sketches\t\t#s: 3\n", "Sample Initial Population\t#s: 2023\tfail_ct: 0\tTime elapsed: 0.60\n", "GA Iter: 0\tMax score: 0.9999\tMin score: 0.9305\t#Pop: 128\t#M+: 0\t#M-: 0\n", "GA Iter: 4\tMax score: 1.0000\tMin score: 0.9863\t#Pop: 128\t#M+: 1372\t#M-: 74\n", "EvolutionarySearch\t\t#s: 128\tTime elapsed: 2.43\n", "----------------------------------------------------------------------\n", "------------------------------ [ Measure ]\n", "----------------------------------------------------------------------\n", "Get 5 programs to measure:\n", ".....*****\n", "Time elapsed for measurement: 8.20 s\n", "----------------------------------------------------------------------\n", "------------------------------ [ Done ]\n", "----------------------------------------------------------------------\n" ] } ], "source": [ "def resume_search(task, log_file):\n", " print(\"Resume search:\")\n", " cost_model = auto_scheduler.XGBModel()\n", " cost_model.update_from_file(log_file)\n", " search_policy = auto_scheduler.SketchPolicy(\n", " task, cost_model, init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)]\n", " )\n", " tune_option = auto_scheduler.TuningOptions(\n", " num_measure_trials=5, measure_callbacks=[auto_scheduler.RecordToFile(log_file)]\n", " )\n", " task.tune(tune_option, search_policy=search_policy)\n", "\n", 
"resume_search(task, log_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final notes and summary\n", "\n", "In this tutorial, we have shown how to use the TVM auto-scheduler to automatically optimize a matrix multiplication without specifying a search template. It concludes a series of examples that starts from the Tensor Expression (TE) language and demonstrates how TVM can optimize computational operations." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('py38': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "28558e8daad512806f5c536a1a04c119185f99f65b79002708a12162d02a79c7" } } }, "nbformat": 4, "nbformat_minor": 0 }