diff --git a/PROJECT.md b/PROJECT.md
index c529b1e..ce9a91c 100644
--- a/PROJECT.md
+++ b/PROJECT.md
@@ -10,6 +10,8 @@
 * [JittorLLMs](https://github.com/Jittor/JittorLLMs): runs ChatGLM-6B FP16 with as little as 3 GB of GPU memory, or even with no GPU at all; supports deployment on Linux, Windows, and Mac
 * [ChatGLM-Finetuning](https://github.com/liucongg/ChatGLM-Finetuning): fine-tunes ChatGLM-6B on specific downstream tasks, covering Freeze, LoRA, P-tuning, etc., and compares the experimental results
 * [InstructGLM](https://github.com/yanqiangmiffy/InstructGLM): instruction tuning of ChatGLM-6B; aggregates open-source Chinese and English instruction data, fine-tunes on it with LoRA, releases the LoRA weights fine-tuned on Alpaca and Belle data, and fixes the repetition issue in web_demo
+* [ChatGLM-web](https://github.com/NCZkevin/chatglm-web): a ChatGLM demo website built with FastAPI and Vue3 (supports streaming output, adjusting model parameters from the front end, context selection, saving images, knowledge-base Q&A, and more)
+* [glm-bot](https://github.com/initialencounter/glm-bot): connects ChatGLM to Koishi so that ChatGLM can be called from all major chat platforms
 
 The following are some tutorials/documents for this project:
-* [Windows deployment guide](https://github.com/ZhangErling/ChatGLM-6B/blob/main/deployment_windows.md)
\ No newline at end of file
+* [Windows deployment guide](https://github.com/ZhangErling/ChatGLM-6B/blob/main/deployment_windows.md)
diff --git a/README.md b/README.md
index 5c25575..baf7d97 100644
--- a/README.md
+++ b/README.md
@@ -150,11 +150,6 @@ model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).qu
 model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
 ```
 
-We further provide a model whose embeddings are quantized as well; its parameters take up only 4.3 GB of GPU memory:
-```python
-model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
-```
-
 ### CPU Deployment
 If you don't have GPU hardware, you can also run inference on the CPU, though it will be slower. Usage is as follows (requires roughly 32 GB of RAM)
 ```python
diff --git a/README_en.md b/README_en.md
index 1a56c39..632a22a 100644
--- a/README_en.md
+++ b/README_en.md
@@ -140,11 +140,6 @@ Model quantization brings a certain performance decline. After testing, ChatGLM-
 model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
 ```
 
-**[2023/03/24]** We further provide an embedding-quantized model whose model parameters only cost 4.3GB GPU memory
-```python
-model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
-```
-
 ### CPU Deployment
 
 If your computer is not equipped with GPU, you can also conduct inference on CPU, but the inference speed is slow (and taking about 32GB of memory):
diff --git a/ptuning/README.md b/ptuning/README.md
index a86db16..ab91468 100644
--- a/ptuning/README.md
+++ b/ptuning/README.md
@@ -155,11 +155,11 @@
 for k, v in prefix_state_dict.items():
     new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
 model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
 ```
+Note that you may need to change `pre_seq_len` to the actual value used in your training.
 
 (2) If you need to load an old checkpoint (containing both the ChatGLM-6B and PrefixEncoder parameters), load the whole checkpoint directly:
 
 ```python
-config = AutoConfig.from_pretrained(CHECKPOINT_PATH, trust_remote_code=True, pre_seq_len=128)
 model = AutoModel.from_pretrained(CHECKPOINT_PATH, config=config, trust_remote_code=True)
 ```
diff --git a/ptuning/main.py b/ptuning/main.py
index 2aa5ac3..193a60d 100644
--- a/ptuning/main.py
+++ b/ptuning/main.py
@@ -166,8 +166,8 @@ def main():
                 else:
                     prompt = ""
                     history = examples[history_column][i]
-                    for i, (old_query, response) in enumerate(history):
-                        prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
+                    for turn_idx, (old_query, response) in enumerate(history):
+                        prompt += "[Round {}]\n问:{}\n答:{}\n".format(turn_idx, old_query, response)
                     prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
                 inputs.append(prompt)
                 targets.append(examples[response_column][i])
@@ -200,8 +200,8 @@ def main():
                 else:
                     prompt = ""
                     history = examples[history_column][i]
-                    for i, (old_query, response) in enumerate(history):
-                        prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
+                    for turn_idx, (old_query, response) in enumerate(history):
+                        prompt += "[Round {}]\n问:{}\n答:{}\n".format(turn_idx, old_query, response)
                     prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
                     prompt = prefix + prompt
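
For reference, the "new checkpoint" loading flow documented in the `ptuning/README.md` hunk above can be sketched end to end as follows. This is a minimal sketch, not the repository's exact script: the model path, checkpoint directory, and `PRE_SEQ_LEN` value are placeholders, and `pre_seq_len` must match the value actually used when the prefix encoder was trained.

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/chatglm-6b"             # base model
CHECKPOINT_PATH = "output/checkpoint-3000"  # hypothetical P-tuning checkpoint directory
PRE_SEQ_LEN = 128                           # must equal the value used during training

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=PRE_SEQ_LEN)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)

# Load only the PrefixEncoder weights from the checkpoint and strip the
# "transformer.prefix_encoder." prefix from their parameter names.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.half().cuda()
model.transformer.prefix_encoder.float()  # keep the prefix encoder in fp32
model = model.eval()
```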