Convert Model Weights

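To run a model with MLC LLM, you need to convert its weights into MLC format (for example, RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC). This page walks through adding a model variant with mlc_llm convert_weight, which takes a Hugging Face model as input and converts/quantizes it into MLC-compatible weights.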

Specifically, we add RedPajama-INCITE-Instruct-3B-v1. MLC already provides a model library for RedPajama-INCITE-Chat-3B-v1, and that library can be reused.

The same approach extends to, for example:

Adding OpenHermes-Mistral when MLC already supports Mistral.
Adding Llama-2-uncensored when MLC already supports Llama-2.

Note

Before you proceed, make sure you have installed TVM Unity by following Install TVM Unity Compiler. It is the backend that MLC LLM requires to compile models.

Also follow the instructions in CLI / Python API to obtain the CLI app / Python API, which you can use to chat with the compiled model.

0. Verify Installation

Step 1. Verify mlc_llm

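We use the Python package mlc_llm to compile models. You can install it by following Install MLC LLM Python Package, either by building from source or by installing a prebuilt package. Verify the mlc_llm installation from the command line: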

$ mlc_llm --help
# You should see help information with this line
usage: MLC LLM Command Line Interface. [-h] {compile,convert_weight,gen_config}

Note

If you run into the error command not found: mlc_llm, try python -m mlc_llm --help instead.
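The fallback described in this note can also be automated in a script. The following is a minimal sketch, not part of MLC LLM: the helper name resolve_mlc_llm_command is hypothetical, and it relies only on the two invocation forms mentioned above (the mlc_llm executable and python -m mlc_llm).

```python
import shutil
import sys


def resolve_mlc_llm_command(executable="mlc_llm"):
    """Return the argv prefix for invoking the MLC LLM CLI (hypothetical helper)."""
    path = shutil.which(executable)
    if path is not None:
        # The console script is on PATH, so it can be called directly.
        return [path]
    # Otherwise fall back to running the package as a module
    # with the interpreter that is executing this script.
    return [sys.executable, "-m", "mlc_llm"]


if __name__ == "__main__":
    # Print the command that would show the help text; nothing is executed here.
    print(" ".join(resolve_mlc_llm_command() + ["--help"]))
```

Passing the resulting list to subprocess.run, followed by the subcommand and its arguments, would then work in either situation.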

Step 2. Verify TVM

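To compile models, you also need to install TVM Unity by following Install TVM Unity Compiler. Here we quickly verify tvm from the command line (for a full verification, see Validate TVM Installation):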

$ python -c "import tvm; print(tvm.__file__)"
/some-path/lib/python3.11/site-packages/tvm/__init__.py
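The same check can be done without actually importing the package, which is convenient inside a setup script. This is a minimal sketch using only the Python standard library; the helper name module_location is hypothetical, and it only reports where a top-level module would be loaded from, not whether the installation is functional.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    for name in ("tvm", "mlc_llm"):
        location = module_location(name)
        print(f"{name}: {location if location is not None else 'not found'}")
```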

1. Clone from Hugging Face and Convert Weights

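You can work under the mlc-llm repository or in your own working directory. Note that all platforms can share the same compiled/quantized weights. For details on convert_weight, see Compile Command Specification.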

# Create directory
mkdir -p dist/models && cd dist/models
# Clone HF weights
git lfs install
git clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1
cd ../..
# Convert weight
mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
    --quantization q4f16_1 \
    -o dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC
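The output directory above follows the pattern <model name>-<quantization>-MLC. As a rough sketch of how the command could be assembled programmatically, the hypothetical helpers below derive that directory name and build the argument list. They use only the flags shown on this page (--quantization and -o); treating the naming pattern as a general rule is an assumption based on this one example.

```python
from pathlib import Path


def mlc_output_dir(model_dir, quantization, dist_root="dist"):
    """Derive <dist_root>/<model name>-<quantization>-MLC (naming assumed from this page)."""
    model_name = Path(model_dir).name  # a trailing slash is ignored by Path
    return (Path(dist_root) / f"{model_name}-{quantization}-MLC").as_posix()


def convert_weight_command(model_dir, quantization, dist_root="dist"):
    """Build the argv list for the convert_weight invocation shown above."""
    return [
        "mlc_llm", "convert_weight", str(model_dir),
        "--quantization", quantization,
        "-o", mlc_output_dir(model_dir, quantization, dist_root),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(convert_weight_command(
        "./dist/models/RedPajama-INCITE-Instruct-3B-v1/", "q4f16_1")))
```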

2. Generate MLC Chat Config

Use mlc_llm gen_config to generate mlc-chat-config.json and process the tokenizers. For details on gen_config, see Compile Command Specification.

mlc_llm gen_config ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
    --quantization q4f16_1 --conv-template redpajama_chat \
    -o dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/

Note

The file mlc-chat-config.json is crucial both for model compilation and for runtime chat. Here we only care about the latter, runtime chat.

You can optionally customize dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/mlc-chat-config.json (see Customize MLC Chat Config for details). You can also simply use the default configuration.

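The conversation_template directory contains the full list of conversation templates that MLC provides. If the model you are adding requires a new conversation template, you need to add it yourself; this PR can serve as an example. Note that adding your own template requires building mlc_llm from source so that the runtime recognizes it.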
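For the optional customization of mlc-chat-config.json mentioned in this note, editing the file by hand is enough. If you prefer to script it, the sketch below shows one possible approach with the standard json module. The helper update_chat_config is hypothetical, and refusing keys that are not already in the generated file is a design choice made here to catch typos, not a requirement of MLC LLM.

```python
import json
from pathlib import Path


def update_chat_config(config_path, overrides):
    """Apply overrides to an existing JSON config file and return the updated dict."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    unknown = [key for key in overrides if key not in config]
    if unknown:
        # Only change keys the generated file already contains,
        # so a misspelled key does not silently create a new field.
        raise KeyError(f"keys not present in {path.name}: {unknown}")
    config.update(overrides)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call might look like update_chat_config("dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/mlc-chat-config.json", {"temperature": 0.5}), assuming the generated file has a temperature key. Check your own file for the keys it actually contains.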

By now, you should have the following files.

~/mlc-llm > ls dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC
    mlc-chat-config.json                             # ===> the chat config
    ndarray-cache.json                               # ===> the model weight info
    params_shard_0.bin                               # ===> the model weights
    params_shard_1.bin
    ...
    tokenizer.json                                   # ===> the tokenizer files
    tokenizer_config.json
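Before uploading or using the output, it can help to confirm that the directory looks like the listing above. The following sketch is an illustration only: the helper missing_artifacts is hypothetical, and the set of checks (chat config, weight index, at least one weight shard, at least one tokenizer file) is inferred from this listing rather than taken from an official specification. Tokenizer file names in particular may differ between models.

```python
from pathlib import Path


def missing_artifacts(output_dir):
    """Return the expected files or patterns that are absent from output_dir."""
    root = Path(output_dir)
    missing = [
        name
        for name in ("mlc-chat-config.json", "ndarray-cache.json")
        if not (root / name).is_file()
    ]
    # Weight shards are numbered, so look for at least one match.
    if not any(root.glob("params_shard_*.bin")):
        missing.append("params_shard_*.bin")
    # Tokenizer file names vary by model; require at least one tokenizer* file.
    if not any(root.glob("tokenizer*")):
        missing.append("tokenizer*")
    return missing
```

An empty list means every check passed. For example, missing_artifacts("dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC") should return [] after steps 1 and 2 have completed.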

(Optional) 3. Upload Weights to Hugging Face

Optionally, you can upload what you have so far to Hugging Face.

# First, please create a repository on Hugging Face.
# With the repository created, run
git lfs install
git clone https://huggingface.co/my-huggingface-account/my-redpajama3b-weight-huggingface-repo
cd my-redpajama3b-weight-huggingface-repo
cp path/to/mlc-llm/dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/* .
git add . && git commit -m "Add redpajama-3b instruct model weights"
git push origin main

This produces a repository similar to RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC, but for the Instruct model rather than the Chat model.

Good job! You have successfully distributed the model you compiled. Next, we discuss how to use these model weights in applications.

Download the Distributed Models

You can now use the existing MLC tools, such as chat/serve/package, with the converted weights.

mlc_llm chat HF://my-huggingface-account/my-redpajama3b-weight-huggingface-repo
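The HF:// argument above uses the same account and repository names as the git clone URL from the upload step. As a small illustration of that mapping, the hypothetical helper below turns a https://huggingface.co/<account>/<repo> URL into the HF://<account>/<repo> form. It is a sketch based only on the two examples on this page.

```python
from urllib.parse import urlsplit


def hf_uri_from_clone_url(clone_url):
    """Map https://huggingface.co/<account>/<repo> to HF://<account>/<repo>."""
    parts = urlsplit(clone_url)
    if parts.netloc != "huggingface.co":
        raise ValueError(f"not a huggingface.co URL: {clone_url}")
    # Drop empty segments so a trailing slash does not matter.
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) != 2:
        raise ValueError(f"expected <account>/<repo> in URL path: {clone_url}")
    account, repo = segments
    return f"HF://{account}/{repo}"


if __name__ == "__main__":
    print(hf_uri_from_clone_url(
        "https://huggingface.co/my-huggingface-account/my-redpajama3b-weight-huggingface-repo"))
```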
