From 7410cacd911dd73c94b8e68932ffc50ea4f6acc3 Mon Sep 17 00:00:00 2001
From: duzx16
Date: Wed, 12 Apr 2023 23:54:55 +0800
Subject: [PATCH] Remove qe model

---
 README_en.md | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/README_en.md b/README_en.md
index 1a56c39..632a22a 100644
--- a/README_en.md
+++ b/README_en.md
@@ -140,11 +140,6 @@ Model quantization brings a certain performance decline. After testing, ChatGLM-
 model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
 ```
 
-**[2023/03/24]** We further provide an embedding-quantized model whose model parameters only cost 4.3GB GPU memory
-```python
-model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
-```
-
 ### CPU Deployment
 If your computer is not equipped with GPU, you can also conduct inference on CPU, but the inference speed is slow (and taking about 32GB of memory):