Update README for NPU inference

pull/661/head
wangshuai09 2024-01-30 16:14:51 +08:00
parent 921d7e9adc
commit a2065a4f2e
2 changed files with 27 additions and 0 deletions

@@ -322,6 +322,20 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
Inference on a Mac can also be done with [ChatGLM.cpp](https://github.com/li-plus/chatglm.cpp).
### NPU Deployment
If you have Huawei Ascend hardware, you can run ChatGLM2-6B on the NPU backend. Install the dependencies as follows:
```shell
pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # CPU wheels; compute runs on the NPU
pip install torch_npu==2.1.0  # Ascend NPU plugin for PyTorch
```
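To verify the installation, a minimal check (a sketch; assumes the Ascend driver and CANN toolkit are already set up on the machine, and that `torch_npu` patches the usual `torch.npu` namespace):
```python
import torch
import torch_npu  # noqa: F401 -- importing torch_npu registers the "npu" device with PyTorch

print(torch.npu.is_available())  # True if an Ascend NPU is visible
print(torch.npu.device_count())  # number of visible NPUs
```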
Then change the backend when loading the model:
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
```
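A complete inference round on the NPU then follows the usual ChatGLM2-6B API (a sketch; `chat` is the same method used elsewhere in this README):
```python
import torch_npu  # noqa: F401 -- registers the Ascend NPU backend
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
# device='npu' places the weights on the Ascend NPU at load time
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu').eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```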
### Multi-GPU Deployment
If you have multiple GPUs, but no single GPU has enough memory for the full model, you can split the model across them. First install accelerate with `pip install accelerate`, then load the model as sketched below.
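A minimal sketch using accelerate's automatic device placement (an assumption; the repository may ship its own model-splitting helper, and `device_map="auto"` is the standard transformers/accelerate mechanism):
```python
import torch
from transformers import AutoModel

# device_map="auto" lets accelerate shard the layers across all visible GPUs
model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
).eval()
```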

@@ -241,6 +241,19 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
Loading an FP16 ChatGLM-6B model requires about 13GB of memory. Machines with less memory (such as a MacBook Pro with 16GB of memory) will fall back to virtual memory on disk when free memory is insufficient, which slows inference severely.
### NPU Deployment
If your device is a Huawei Ascend NPU, you can run ChatGLM2-6B on the NPU backend. First, install torch and torch_npu:
```shell
pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # CPU wheels; compute runs on the NPU
pip install torch_npu==2.1.0  # Ascend NPU plugin for PyTorch
```
Then change the model loading code to use the NPU backend:
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
```
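The rest of the inference code is unchanged. For example, streaming generation works the same way on the NPU (a sketch; assumes the `stream_chat` method used by the repository's demos):
```python
import torch_npu  # noqa: F401 -- registers the Ascend NPU backend
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu').eval()

for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    # each yielded `response` is the cumulative text generated so far
    latest = response
print(latest)
```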
## License
The code of this repository is licensed under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). The use of the ChatGLM2-6B model weights is subject to the [Model License](MODEL_LICENSE). ChatGLM2-6B weights are **completely open** for academic research, and **free commercial use** is also allowed after completing the [questionnaire](https://open.bigmodel.cn/mla/form).