From 7131d29f2d49ed984dba8aeeb88b5a16a83b07ab Mon Sep 17 00:00:00 2001
From: duzx16
Date: Thu, 6 Apr 2023 17:51:20 +0800
Subject: [PATCH] Add English readme

---
 README_en.md      | 4 ++++
 ptuning/README.md | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/README_en.md b/README_en.md
index d5c05bb..da2b8dc 100644
--- a/README_en.md
+++ b/README_en.md
@@ -9,6 +9,8 @@ ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dial
 Try the [online demo](https://huggingface.co/spaces/ysharma/ChatGLM-6b_Gradio_Streaming) on Huggingface Spaces.
 
 ## Update
+**[2023/03/31]** Added a parameter-efficient tuning implementation based on [P-Tuning-v2](https://github.com/THUDM/P-tuning-v2). At the minimum INT4 quantization level, only 7GB of GPU memory is needed for model tuning. See [Parameter-efficient tuning method](ptuning/README.md) for details.
+
 **[2023/03/23]** Added API deployment (thanks to [@LemonQu-GIT](https://github.com/LemonQu-GIT)). Added the embedding-quantized model [ChatGLM-6B-INT4-QE](https://huggingface.co/THUDM/chatglm-6b-int4-qe). Added support for GPU inference on Macs with Apple Silicon.
 
 **[2023/03/19]** Added the streaming output function `stream_chat`, already applied in the web and CLI demos. Fixed Chinese punctuation in the output. Added the quantized model [ChatGLM-6B-INT4](https://huggingface.co/THUDM/chatglm-6b-int4).
@@ -168,6 +170,8 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).hal
 ```
 Then you can use GPU-accelerated model inference on Mac.
 
+## Parameter-efficient Tuning
+Parameter-efficient tuning is implemented based on [P-tuning v2](https://github.com/THUDM/P-tuning-v2). See [ptuning/README.md](ptuning/README.md) for details on how to use it.
 
 ## ChatGLM-6B Examples
 
diff --git a/ptuning/README.md b/ptuning/README.md
index ca1fc73..11ee326 100644
--- a/ptuning/README.md
+++ b/ptuning/README.md
@@ -3,6 +3,8 @@
 
 The following uses the [ADGEN](https://aclanthology.org/D19-1321.pdf) (advertisement generation) dataset as an example to introduce how to use the code.
 
+*Read this in [English](README_en.md).*
+
 ## Software Dependencies
 Running fine-tuning requires `transformers` version 4.27.1. In addition to the dependencies of ChatGLM-6B, the following dependencies also need to be installed:
 ```
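For orientation, the new 2023/03/31 entry refers to tuning at the INT4 quantization level, and the 2023/03/19 entry to the `stream_chat` API. A minimal inference-side sketch of both, assuming the published `THUDM/chatglm-6b` checkpoint and the chat methods its remote code provides (this sketch is illustrative and not part of the patch itself):

```python
# Minimal sketch: load ChatGLM-6B at the INT4 quantization level and
# stream a response. Assumes a CUDA GPU and the THUDM/chatglm-6b checkpoint.
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True pulls in the model's own chat/stream_chat methods.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = (
    AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    .quantize(4)  # INT4 quantization level mentioned in the 2023/03/31 entry
    .half()       # fp16 for the non-quantized parts
    .cuda()
)
model = model.eval()

# stream_chat yields progressively longer partial responses as tokens arrive.
for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    print(response)
```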