mirror of https://github.com/THUDM/ChatGLM-6B
Add API deployment
parent b0c2b47f5e
commit 955d475079
README.md | 25
@@ -13,6 +13,8 @@ ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dialogue

*Read this in [English](README_en.md).*

## Updates

**[2023/03/23]** Add API deployment, thanks to [@LemonQu-GIT](https://github.com/LemonQu-GIT).

**[2023/03/19]** Add streaming output interface `stream_chat`, already applied in the web and CLI demos. Fix Chinese punctuation in output. Add quantized model [ChatGLM-6B-INT4](https://huggingface.co/THUDM/chatglm-6b-int4).

@@ -78,7 +80,7 @@ python web_demo.py

The program starts a web server and prints its address; open that address in a browser to use the demo. The latest demo adds a typewriter effect, which greatly improves the perceived speed. Note that because network access to Gradio is slow from mainland China, launching with `demo.queue().launch(share=True, inbrowser=True)` routes all traffic through Gradio's servers and badly degrades the typewriter experience. The default launch mode has therefore been changed to `share=False`; if you need public access, change it back to `share=True`.
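
To illustrate the toggle, a minimal runnable sketch of the launch call; the placeholder UI is invented for illustration, and only the `demo.queue().launch(...)` line mirrors what web_demo.py does:

```python
import gradio as gr

# Placeholder UI; the real interface is built in web_demo.py.
with gr.Blocks() as demo:
    gr.Markdown("ChatGLM-6B demo placeholder")

# share=False (default): serve locally only, fastest streaming experience.
# share=True: relay a public link through Gradio's servers (slower typewriter effect).
demo.queue().launch(share=False, inbrowser=True)
```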

Thanks to [@AdamBear](https://github.com/AdamBear) for implementing a Streamlit-based web demo; see [#117](https://github.com/THUDM/ChatGLM-6B/pull/117) for how to run it.

#### CLI Demo

@@ -92,6 +94,27 @@ python cli_demo.py

The program holds an interactive dialogue on the command line: type an instruction and press Enter to generate a reply, type `clear` to clear the dialogue history, and type `stop` to terminate the program.

## API Deployment

First install the additional dependencies with `pip install fastapi uvicorn`, then run [api.py](api.py) from the repo:

```shell
python api.py
```
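
For orientation, a minimal sketch of what a FastAPI wrapper like api.py could look like. This is an assumption for illustration, not the repo's actual file; in particular the `ChatRequest` schema is hypothetical, while `model.chat` is the model's documented interface:

```python
import datetime

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModel, AutoTokenizer

class ChatRequest(BaseModel):
    # Hypothetical request schema matching the curl example below.
    prompt: str
    history: list = []

app = FastAPI()

# Load ChatGLM-6B once at startup (FP16 on GPU, roughly 13GB of VRAM).
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

@app.post("/")
def chat(req: ChatRequest):
    # model.chat returns the reply plus the updated dialogue history.
    response, history = model.chat(tokenizer, req.prompt, history=req.history)
    return {
        "response": response,
        "history": history,
        "status": 200,
        "time": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    }

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```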

By default the API is served on local port 8000 and is called via POST:

```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is:

```json
{
  "response":"你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。",
  "history":[["你好","你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。"]],
  "status":200,
  "time":"2023-03-23 21:38:40"
}
```
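
To call the API from Python instead of curl, pass the returned `history` back in the next request so the model keeps its context. A small sketch using the requests library (the prompts are only examples):

```python
import requests

API_URL = "http://127.0.0.1:8000"  # default address used by api.py

history = []
for prompt in ["你好", "晚上睡不着应该怎么办"]:
    data = requests.post(API_URL, json={"prompt": prompt, "history": history}).json()
    history = data["history"]  # feed the returned history back for multi-turn dialogue
    print(data["response"])
```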

## Low-Cost Deployment

### Model Quantization

By default the model is loaded at FP16 precision, and running the code above requires roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form, as follows:
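
The usage code itself lies beyond the end of this hunk; as a sketch based on ChatGLM-6B's documented `quantize` API, it follows this pattern (memory figures are approximate):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# quantize(4) loads at INT4 (roughly 6GB of VRAM); quantize(8) loads at INT8.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(4).cuda()
model = model.eval()
```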

README_en.md | 23
@@ -9,6 +9,8 @@ ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dialogue

Try the [online demo](https://huggingface.co/spaces/ysharma/ChatGLM-6b_Gradio_Streaming) on Huggingface Spaces.

## Update

**[2023/03/23]** Add API deployment, thanks to [@LemonQu-GIT](https://github.com/LemonQu-GIT).

**[2023/03/19]** Add streaming output function `stream_chat`, already applied in web and CLI demo. Fix Chinese punctuation in output. Add quantized model [ChatGLM-6B-INT4](https://huggingface.co/THUDM/chatglm-6b-int4).

## Getting Started

@@ -86,6 +88,27 @@ python cli_demo.py

The command runs an interactive program in the shell. Type your instruction and press Enter to generate the response; type `clear` to clear the dialogue history and `stop` to terminate the program.

## API Deployment

First install the additional dependencies with `pip install fastapi uvicorn`, then run [api.py](api.py) in the repo:

```shell
python api.py
```

By default the API runs on port `8000` of the local machine, and you can call it via:

```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is:

```json
{
  "response":"你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。",
  "history":[["你好","你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。"]],
  "status":200,
  "time":"2023-03-23 21:38:40"
}
```
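
The service itself is stateless, so multi-turn context is carried by echoing the returned `history` field back in the next call. A hypothetical second turn, reusing the history from the response above (the follow-up prompt is only an example):

```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你可以做什么?", "history": [["你好","你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。"]]}'
```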

## Deployment

### Quantization