diff --git a/README.md b/README.md
index c9cce80..49b5797 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,5 @@
 # ChatGLM-6B
 
-## 修改介绍
-将模型加载到多张gpu卡中,根据gpu的数量自动分配平均的显存占用,需要安装accelerate
-```shell
-python -m pip install accelerate
-```
-请注意,仍然需要24GB的内存, 后续优化 TODO
 ## 介绍
 
 ChatGLM-6B 是一个开源的、支持中英双语的对话语言模型,基于 [General Language Model (GLM)](https://github.com/THUDM/GLM) 架构,具有 62 亿参数。结合模型量化技术,用户可以在消费级的显卡上进行本地部署(INT4 量化级别下最低只需 6GB 显存)。
@@ -167,6 +161,16 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).hal
 ```
 即可使用在 Mac 上使用 GPU 加速模型推理。
+### 多卡部署
+```shell
+pip install accelerate
+```
+```python
+from utils import load_mode_and_tokenizer
+
+model, tokenizer = load_mode_and_tokenizer("your local path", num_gpus=2)
+```
+即可将模型部署到多卡上进行推理。
 
 ## ChatGLM-6B 示例
 
 以下是一些使用 `web_demo.py` 得到的示例截图。更多 ChatGLM-6B 的可能,等待你来探索发现!
diff --git a/README_en.md b/README_en.md
index c8e48a4..9a4b220 100644
--- a/README_en.md
+++ b/README_en.md
@@ -1,12 +1,5 @@
 # ChatGLM-6B
 
-## Modification
-Load the model into multiple GPUs and automatically allocate the average memory usage according to the number of GPUs.
-```shell
-python -m pip install accelerate
-```
-Please note that 24GB of cpu memory is still required. TODO optimization.”
-
 ## Introduction
 
 ChatGLM-6B is an open bilingual language model based on [General Language Model (GLM)](https://github.com/THUDM/GLM) framework, with 6.2 billion parameters. With the quantization technique, users can deploy locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level).
@@ -156,6 +149,17 @@ model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=Tru
 ```
 **For Mac users**: if your encounter the error `RuntimeError: Unknown platform: darwin`, please refer to this [Issue](https://github.com/THUDM/ChatGLM-6B/issues/6#issuecomment-1470060041).
 
+### Multi-GPU Deployment
+
+```shell
+pip install accelerate
+```
+```python
+from utils import load_mode_and_tokenizer
+
+model, tokenizer = load_mode_and_tokenizer("your local path", num_gpus=2)
+```
+
 ## ChatGLM-6B Examples
 
 The following are some Chinese examples with `web_demo.py`. Welcome to explore more possibility with ChatGLM-6B.
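
Note: the `load_mode_and_tokenizer` helper imported from `utils` in the README snippets above is not part of this patch, so its body is not shown. The sketch below is a minimal illustration of how such a helper could split ChatGLM-6B across GPUs with `accelerate`; the `"8GiB"` memory limit and the `"GLMBlock"` module-class name are assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch only -- the real utils.py is not included in this patch.
from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModel, AutoTokenizer


def load_mode_and_tokenizer(model_path: str, num_gpus: int = 2):
    """Load ChatGLM-6B in fp16 and spread its layers across `num_gpus` GPUs."""
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half()
    if num_gpus < 2:
        # Single-GPU fallback: keep the whole model on the default CUDA device.
        model = model.cuda()
    else:
        # Ask accelerate for a balanced device map (keeping each transformer
        # block on one GPU), then move every sub-module to its assigned device.
        device_map = infer_auto_device_map(
            model,
            max_memory={i: "8GiB" for i in range(num_gpus)},  # illustrative per-GPU limit
            no_split_module_classes=["GLMBlock"],             # assumed block class name
        )
        model = dispatch_model(model, device_map=device_map)
    return model.eval(), tokenizer
```

With a map like this, the `num_gpus=2` call shown in the README would distribute the transformer blocks roughly evenly between the two cards, which is consistent with the note in the removed section that the full fp16 weights must still fit in CPU memory before dispatch.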