Merge branch 'main' of https://github.com/THUDM/ChatGLM-6B into main

pull/621/head
rainatam 2023-04-13 13:53:01 +08:00
commit 6fb0380847
5 changed files with 8 additions and 16 deletions

@ -10,6 +10,8 @@
* [JittorLLMs](https://github.com/Jittor/JittorLLMs): runs ChatGLM-6B FP16 with as little as 3 GB of GPU memory, or even with no GPU at all; supports deployment on Linux, Windows, and macOS
* [ChatGLM-Finetuning](https://github.com/liucongg/ChatGLM-Finetuning): fine-tunes ChatGLM-6B on specific downstream tasks with methods such as Freeze, LoRA, and P-tuning, and compares their experimental results.
* [InstructGLM](https://github.com/yanqiangmiffy/InstructGLM): instruction tuning for ChatGLM-6B; collects open-source Chinese and English instruction data, fine-tunes on it with LoRA, releases LoRA weights fine-tuned on Alpaca and Belle data, and fixes the repetition issue in web_demo
* [ChatGLM-web](https://github.com/NCZkevin/chatglm-web): a ChatGLM demo website built with FastAPI and Vue3 (supports streaming output, adjusting model parameters from the front end, context selection, saving images, knowledge-base Q&A, and more)
* [glm-bot](https://github.com/initialencounter/glm-bot): connects ChatGLM to Koishi so that ChatGLM can be called from major chat platforms
The following are some tutorials/documents for this project:
* [Windows deployment guide](https://github.com/ZhangErling/ChatGLM-6B/blob/main/deployment_windows.md)

@ -150,11 +150,6 @@ model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).qu
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
```
We further provide a model with its embeddings quantized as well, whose model parameters take up only 4.3 GB of GPU memory:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
```
### CPU Deployment
If you do not have a GPU, you can also run inference on the CPU, but it will be slower. Usage is as follows (about 32 GB of memory is required):
```python
# Load the full-precision (FP32) model for CPU inference
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
```

@ -140,11 +140,6 @@ Model quantization brings a certain performance decline. After testing, ChatGLM-
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
```
**[2023/03/24]** We further provide an embedding-quantized model whose model parameters take up only 4.3 GB of GPU memory:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
```
### CPU Deployment
If your computer is not equipped with a GPU, you can also run inference on the CPU, but the inference speed will be slower (about 32 GB of memory is required):
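A minimal sketch of what that CPU-only call looks like, assuming the same `transformers` loading pattern used in the GPU examples above (the `chat` helper comes from the remote ChatGLM-6B modeling code):
```python
from transformers import AutoTokenizer, AutoModel

# Full-precision (FP32) weights on CPU; requires roughly 32 GB of RAM
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
model = model.eval()

# Quick smoke test
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```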

@ -155,11 +155,11 @@ for k, v in prefix_state_dict.items():
new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
```
Note: you may need to change `pre_seq_len` to the actual value used during your training.
(2) If you need to load an old checkpoint (containing both the ChatGLM-6B and PrefixEncoder parameters), load the entire checkpoint directly:
```python
config = AutoConfig.from_pretrained(CHECKPOINT_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(CHECKPOINT_PATH, config=config, trust_remote_code=True)
```
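In either case, the loaded model still needs to be moved to the GPU and switched to eval mode before inference. A minimal sketch, assuming the `quantize` method exposed by the ChatGLM-6B modeling code and the `model`/`tokenizer` objects created above:
```python
# Optional: quantize to INT4 to reduce GPU memory usage (comment out to skip)
model = model.quantize(4)
model = model.half().cuda()
# Keep the prefix encoder in FP32 for numerical stability
model.transformer.prefix_encoder.float()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```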

@ -166,8 +166,8 @@ def main():
else:
prompt = ""
history = examples[history_column][i]
for i, (old_query, response) in enumerate(history):
prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
for turn_idx, (old_query, response) in enumerate(history):
prompt += "[Round {}]\n问:{}\n答:{}\n".format(turn_idx, old_query, response)
prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
inputs.append(prompt)
targets.append(examples[response_column][i])
@ -200,8 +200,8 @@ def main():
else:
prompt = ""
history = examples[history_column][i]
for i, (old_query, response) in enumerate(history):
prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
for turn_idx, (old_query, response) in enumerate(history):
prompt += "[Round {}]\n问:{}\n答:{}\n".format(turn_idx, old_query, response)
prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
prompt = prefix + prompt
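For context on the change above: the outer loop already uses `i` to index the current example, so reusing `i` for the history turns would clobber that index before `examples[response_column][i]` is read; renaming the inner variable to `turn_idx` removes the shadowing. A standalone sketch of the corrected prompt construction (the `build_prompt` helper is illustrative, not part of the repository):
```python
def build_prompt(query, history):
    """Rebuild the multi-turn prompt without shadowing any outer loop index."""
    prompt = ""
    for turn_idx, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n问:{}\n答:{}\n".format(turn_idx, old_query, response)
    # The new query becomes the final, unanswered round
    prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    return prompt


# Example: two earlier turns followed by a new question
print(build_prompt("And tomorrow?", [("Hi", "Hello!"), ("Weather today?", "Sunny.")]))
```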