CLI

MLC Chat CLI is a command-line tool for running MLC-compiled large language models interactively, out of the box.

Install the MLC-LLM Package

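The Chat CLI is part of the MLC-LLM package. To use it, first install MLC LLM by following the installation instructions. After installing the MLC-LLM package, run the following command to check whether the installation succeeded: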

mlc_llm chat --help

If the installation succeeded, you should see the help message for the chat command.
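The same check can be scripted, for example in a setup script. This is a minimal sketch: the function name `check_mlc_llm` is hypothetical, and it only assumes that `mlc_llm chat --help` exits with status 0 when the package is installed correctly.

```shell
# Hypothetical helper: report whether an mlc_llm executable is available
# and answers `chat --help`. The executable name or path defaults to mlc_llm.
check_mlc_llm() {
  local bin=${1:-mlc_llm}
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "not-found"    # nothing by that name on PATH
    return 1
  fi
  if "$bin" chat --help >/dev/null 2>&1; then
    echo "ok"           # help text printed and exit status was 0
    return 0
  fi
  echo "help-failed"    # executable exists but `chat --help` failed
  return 2
}
```

Running `check_mlc_llm` with no argument checks the `mlc_llm` on your PATH; passing a path checks a specific executable instead.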

Quick Start

This section is a quick-start guide to the MLC-LLM Chat CLI. To start a CLI session, run:

mlc_llm chat MODEL [--model-lib PATH-TO-MODEL-LIB]

where MODEL is a model folder compiled with the MLC-LLM build process. The other arguments are described in the `mlc_llm chat` command section below.
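On this page MODEL appears in several forms: a compiled model folder, a model name with a quantization scheme such as Llama-2-7b-chat-hf-q4f16_1, or an `HF://` reference as in the multi-GPU example. The hypothetical helper below sketches how a wrapper script might tell these apart before calling the CLI. It assumes that a compiled model folder contains an `mlc-chat-config.json` file; it is not the CLI's own lookup logic.

```shell
# Hypothetical helper: classify a MODEL argument for `mlc_llm chat`.
# Assumption: a compiled model folder contains mlc-chat-config.json.
classify_model_arg() {
  case $1 in
    HF://*)
      echo "huggingface"   # e.g. HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
      return 0 ;;
  esac
  if [ -d "$1" ]; then
    if [ -f "$1/mlc-chat-config.json" ]; then
      echo "local-folder"
    else
      echo "folder-without-config"   # a directory, but not a compiled model
    fi
  else
    echo "name"   # e.g. Llama-2-7b-chat-hf-q4f16_1, left for the CLI to search
  fi
}
```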

Once the Chat CLI is ready, type a prompt to interact with the model.

You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out stats of last request (token/sec)
  /metrics            print out full engine metrics
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;top_p=0.8;seed=23;max_tokens=100;stop=str1,str2`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.

>>> What's the meaning of life?
The meaning of life is a philosophical and metaphysical question related to the purpose or significance of life or existence in general...
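The `/set` argument is a semicolon-separated list of `key=value` pairs, and the stop words inside `stop` are separated by commas. As a minimal sketch of how such a string decomposes, the hypothetical function `parse_set_overrides` below prints one setting per line; it is an illustration of the format, not the Chat CLI's own parser.

```shell
# Hypothetical helper: split a /set override string into one key=value per line.
# The body runs in a subshell so the IFS and set -f changes do not leak.
parse_set_overrides() (
  set -f                 # disable globbing while splitting
  IFS=';'
  for pair in $1; do
    case $pair in
      '') continue ;;    # skip empty fields such as a trailing ';'
      *=*) ;;
      *) echo "malformed override: $pair" >&2; exit 1 ;;
    esac
    key=${pair%%=*}
    value=${pair#*=}
    if [ "$key" = "stop" ]; then
      IFS=','            # stop words are comma-separated
      for word in $value; do
        printf 'stop=%s\n' "$word"
      done
      IFS=';'
    else
      printf '%s=%s\n' "$key" "$value"
    fi
  done
)
```

For `temperature=0.5;top_p=0.8;stop=str1,str2` it prints `temperature=0.5`, `top_p=0.8`, `stop=str1`, and `stop=str2` on separate lines.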

Run the CLI on Multiple GPUs

To enable tensor parallelism and run the model across multiple GPUs, pass --overrides "tensor_parallel_shards=$NGPU", where $NGPU is the number of GPUs. For example:

mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC --overrides "tensor_parallel_shards=2"
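When the shard count should match the GPUs in the machine, a script can compute the override instead of hard-coding it. This sketch assumes NVIDIA GPUs and counts the `GPU ` lines printed by `nvidia-smi -L`; other backends would need a different way to count devices. Both function names are hypothetical.

```shell
# Hypothetical helper: count NVIDIA GPUs from `nvidia-smi -L`, which prints
# one "GPU <index>: ..." line per device. Prints 0 if nvidia-smi is missing.
count_nvidia_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU '
  else
    echo 0
  fi
}

# Hypothetical helper: build the tensor-parallel override string.
# Rejects empty, non-numeric, zero, or zero-padded counts.
tp_overrides() {
  case $1 in
    ''|*[!0-9]*|0*)
      echo "invalid GPU count: '$1'" >&2
      return 1 ;;
  esac
  printf 'tensor_parallel_shards=%s\n' "$1"
}

# Example usage (requires mlc_llm and GPUs; not executed here):
#   NGPU=$(count_nvidia_gpus)
#   mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC \
#     --overrides "$(tp_overrides "$NGPU")"
```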

`mlc_llm chat` Command

The Chat CLI interface is listed below for reference.

mlc_llm chat MODEL [--model-lib PATH-TO-MODEL-LIB] [--device DEVICE] [--overrides OVERRIDES]

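MODEL: a model folder compiled with the MLC-LLM build process. It can be either a model name with its quantization scheme (e.g. Llama-2-7b-chat-hf-q4f16_1) or the full path to the model folder. In the former case, the CLI searches a set of candidate paths for a model folder matching that name.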

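--model-lib: the full path to the model library file to use (e.g. a .so file).
--device: the device to run on, given as a string of the form device_name:device_id or device_name, where device_name is one of cuda, metal, vulkan, rocm, opencl, or auto (automatically detect the local device), and device_id is the ID of the device to use. The default is auto, and device_id defaults to 0.
--overrides: model configuration overrides. Supported keys are context_window_size, prefill_chunk_size, sliding_window_size, attention_sink_size, and tensor_parallel_shards. Pass them as semicolon-separated key=value pairs, quoted so the shell does not treat ; as a command separator, e.g. --overrides "context_window_size=1024;prefill_chunk_size=128".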
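The hypothetical helpers below sketch a format check for the --device value described above and a way to join several key=value pairs into the single quoted --overrides argument. Neither function is part of MLC LLM.

```shell
# Hypothetical helper: accept device_name or device_name:device_id, where
# device_name is one of the documented backends and device_id is digits.
valid_device() {
  case $1 in
    cuda|metal|vulkan|rocm|opencl|auto)
      return 0 ;;
    cuda:*|metal:*|vulkan:*|rocm:*|opencl:*|auto:*)
      # The part after the first ':' must be a non-empty run of digits.
      case ${1#*:} in
        ''|*[!0-9]*) return 1 ;;
      esac
      return 0 ;;
  esac
  return 1
}

# Hypothetical helper: join key=value pairs with ';' for one --overrides value.
# "$*" joins the arguments with the first character of IFS.
join_overrides() (
  IFS=';'
  printf '%s\n' "$*"
)

# Example usage (requires mlc_llm; not executed here):
#   mlc_llm chat MODEL --device cuda:0 \
#     --overrides "$(join_overrides context_window_size=1024 prefill_chunk_size=128)"
```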
