mirror of https://github.com/THUDM/ChatGLM2-6B
Update README for NPU inference
parent 921d7e9adc
commit a2065a4f2e

README.md (14 changes)
@@ -322,6 +322,20 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
You can also use [ChatGLM.cpp](https://github.com/li-plus/chatglm.cpp) for inference on Mac.
### NPU Deployment
If you have Huawei Ascend hardware, you can run ChatGLM2-6B with the NPU backend. Install the dependencies as follows:
```shell
pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install torch_npu==2.1.0
```
Also, change the backend when loading the model:
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
```
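When the same script has to run on machines with and without Ascend hardware, it can help to pick the device string at runtime. A minimal sketch, assuming a hypothetical `pick_device` helper (not part of this repo) that falls back to CPU when `torch_npu` is unavailable:

```python
def pick_device(npu_available: bool) -> str:
    """Return the device string for AutoModel.from_pretrained's `device` kwarg.

    Falls back to 'cpu' when the Ascend torch_npu backend is not installed.
    Hypothetical helper for illustration only.
    """
    return "npu" if npu_available else "cpu"


# In practice you would probe availability before calling it, e.g.:
#   try:
#       import torch_npu  # noqa: F401
#       available = True
#   except ImportError:
#       available = False
device = pick_device(True)   # "npu" on an Ascend machine
```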
### Multi-GPU Deployment
If you have multiple GPUs but no single GPU has enough memory to hold the full model, you can split the model across several GPUs. First install accelerate with `pip install accelerate`, then load the model as follows:
```python
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
```
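The idea behind splitting the model is to assign transformer layers to GPUs as evenly as possible. A minimal illustration of such a layer-to-GPU mapping (a hypothetical `split_layers` sketch, not the repo's actual `load_model_on_gpus` implementation; ChatGLM2-6B has 28 transformer layers):

```python
def split_layers(num_layers: int, num_gpus: int) -> dict:
    """Map each transformer layer index to a GPU id, as evenly as possible."""
    per_gpu = num_layers // num_gpus
    extra = num_layers % num_gpus  # the first `extra` GPUs take one more layer
    mapping, layer = {}, 0
    for gpu in range(num_gpus):
        take = per_gpu + (1 if gpu < extra else 0)
        for _ in range(take):
            mapping[layer] = gpu
            layer += 1
    return mapping


# With 28 layers on 2 GPUs, layers 0-13 land on GPU 0 and 14-27 on GPU 1.
mapping = split_layers(28, 2)
```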
README_EN.md (13 changes)

@@ -241,6 +241,19 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
Loading a FP16 ChatGLM-6B model requires about 13GB of memory. Machines with less memory (such as a MacBook Pro with 16GB of memory) will use the virtual memory on the hard disk when there is insufficient free memory, resulting in a serious slowdown in inference speed.
### NPU Deployment
If you have a Huawei Ascend device, you can run ChatGLM2-6B with the NPU backend. First, install torch and torch_npu:
```shell
pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install torch_npu==2.1.0
```
Then change the model-loading code to use the NPU backend:
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
```
## License
The code of this repository is licensed under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). The use of the ChatGLM2-6B model weights is subject to the [Model License](MODEL_LICENSE). ChatGLM2-6B weights are **completely open** for academic research, and **free commercial use** is also allowed after completing the [questionnaire](https://open.bigmodel.cn/mla/form).