mirror of https://github.com/THUDM/ChatGLM-6B
Update README
parent 8826b947c3
commit d9c45f0286

README.md (16 changes)

@@ -1,11 +1,5 @@
# ChatGLM-6B

## Modification

Load the model onto multiple GPUs, with GPU memory usage automatically balanced according to the number of GPUs; this requires installing accelerate.
```shell
python -m pip install accelerate
```

Note that 24GB of CPU memory is still required; further optimization is a TODO.
## Introduction

ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture, with 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of GPU memory at the INT4 quantization level).
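As a rough back-of-the-envelope check on these figures (weights only; real usage adds activations, KV cache, and CUDA context, so these numbers are a lower bound, not a measurement):

```python
# Rough estimate of the GPU memory occupied by the weights alone of a
# 6.2B-parameter model at different precisions.
PARAMS = 6.2e9

def weight_gib(bits_per_param: int) -> float:
    """GiB occupied by the weights at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

print(round(weight_gib(16), 1))  # FP16: ~11.5 GiB -> needs a large card
print(round(weight_gib(4), 1))   # INT4: ~2.9 GiB  -> fits the 6GB budget
```

The INT4 weights (~2.9 GiB) leave headroom for activations and runtime overhead within the quoted 6GB budget.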
@@ -167,6 +161,16 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).hal
```

Then you can use the GPU to accelerate model inference on a Mac.

### Multi-GPU Deployment

```shell
pip install accelerate
```
```python
from utils import load_mode_and_tokenizer

model, tokenizer = load_mode_and_tokenizer("your local path", num_gpus=2)
```

The model can then be deployed on multiple GPUs for inference.
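A helper like this presumably splits the model's transformer layers into contiguous, roughly equal blocks across the available GPUs via a device map. A minimal sketch of that even-split idea (the 28-layer count matches ChatGLM-6B, but `even_device_map` is an illustrative stand-in, not the repository's actual implementation):

```python
import math

def even_device_map(num_layers: int, num_gpus: int) -> dict[int, int]:
    """Assign contiguous blocks of layers to GPUs so that each GPU
    holds roughly num_layers / num_gpus layers."""
    per_gpu = math.ceil(num_layers / num_gpus)
    return {layer: min(layer // per_gpu, num_gpus - 1)
            for layer in range(num_layers)}

# ChatGLM-6B has 28 transformer layers; split across 2 GPUs:
dmap = even_device_map(28, 2)
# layers 0-13 -> GPU 0, layers 14-27 -> GPU 1
```

Contiguous blocks (rather than round-robin) keep inter-GPU transfers down to one hand-off per block boundary during the forward pass.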
## ChatGLM-6B Examples
Below are some example screenshots obtained with `web_demo.py`. More possibilities of ChatGLM-6B are waiting for you to explore!
README_en.md (18 changes)

@@ -1,12 +1,5 @@
# ChatGLM-6B
## Modification
Load the model onto multiple GPUs, with GPU memory usage automatically balanced across them according to the number of GPUs; this requires installing accelerate.
```shell
python -m pip install accelerate
```
Please note that 24GB of CPU memory is still required. Further optimization is a TODO.
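One plausible reading of the 24GB figure (an assumption on our part, not an explanation from the authors): the FP16 weights occupy about 12.4GB, and they may transiently exist twice in CPU RAM while the checkpoint is read and the model is materialized before dispatch:

```python
# Back-of-the-envelope peak CPU memory during model loading, assuming
# the FP16 weights are briefly held twice (checkpoint copy + model copy).
PARAMS = 6.2e9
fp16_weight_gb = PARAMS * 2 / 1e9   # 2 bytes per parameter -> 12.4 GB
peak_gb = 2 * fp16_weight_gb        # two transient copies  -> 24.8 GB
print(round(fp16_weight_gb, 1), round(peak_gb, 1))
```

This would explain why the requirement is roughly twice the on-disk checkpoint size.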
## Introduction
ChatGLM-6B is an open bilingual (Chinese-English) language model based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) framework, with 6.2 billion parameters. With model quantization, users can deploy it locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level).
@@ -156,6 +149,17 @@ model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=Tru
**For Mac users**: if you encounter the error `RuntimeError: Unknown platform: darwin`, please refer to this [Issue](https://github.com/THUDM/ChatGLM-6B/issues/6#issuecomment-1470060041).
### Multi-GPU Deployment
```shell
pip install accelerate
```
```python
from utils import load_mode_and_tokenizer

model, tokenizer = load_mode_and_tokenizer("your local path", num_gpus=2)
```
## ChatGLM-6B Examples
The following are some Chinese-language examples produced with `web_demo.py`. Feel free to explore more possibilities with ChatGLM-6B.