diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 6679939..288a6e2 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -33,26 +33,104 @@
- 👋 加入我们的推特、Discord 和 微信社区
+ 👋 加入我们的 Discord 和 微信社区
## 简介
-InternLM ,即书生·浦语大模型,包含面向实用场景的70亿参数基础模型与对话模型 (InternLM-7B)。模型具有以下特点:
+InternLM 是一个开源的轻量级训练框架,旨在支持大模型训练而无需大量的依赖。通过单一的代码库,它支持在拥有数千个 GPU 的大型集群上进行预训练,并在单个 GPU 上进行微调,同时实现了卓越的性能优化。在1024个 GPU 上训练时,InternLM 可以实现近90%的加速效率。
-- 使用上万亿高质量语料,建立模型超强知识体系;
-- 支持8k语境窗口长度,实现更长输入与更强推理体验;
-- 通用工具调用能力,支持用户灵活自助搭建流程;
+基于InternLM训练框架,我们已经发布了两个开源的预训练模型:InternLM-7B 和 InternLM-20B。
-提供了支持模型预训练的轻量级训练框架,无需安装大量依赖包,一套代码支持千卡预训练和单卡人类偏好对齐训练,同时实现了极致的性能优化,实现千卡训练下近90%加速效率。
+## 更新
-## 新闻
+[20230920] InternLM-20B 已发布,包括基础版和对话版。
+[20230822] InternLM-7B-Chat v1.1 已发布,增加了代码解释器和函数调用能力。您可以使用 [Lagent](https://github.com/InternLM/lagent) 进行尝试。
-我们开源了 InternLM-Chat-7B v1.1。该模型能够调用代码解释器和工具插件。你可以在 [Lagent](https://github.com/InternLM/lagent) 中体验这些新功能。
-## InternLM-7B
+## Model Zoo
-### 性能评测
+我们的模型在三个平台上发布:Transformers、ModelScope 和 OpenXLab。
+
+| Model | Transformers | ModelScope | OpenXLab | 发布日期 |
+|---------------------------|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
+| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [OpenLMLab/InternLM-chat-20b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-09-20 |
+| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [OpenLMLab/InternLM-20b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | 2023-09-20 |
+| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [OpenLMLab/InternLM-chat-7b-v1.1](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | 2023-08-22 |
+| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [OpenLMLab/InternLM-7b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | 2023-07-06 |
+| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [OpenLMLab/InternLM-chat-7b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-07-06 |
+| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [OpenLMLab/InternLM-chat-7b-8k](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | 2023-07-06 |
+
+
+
+### InternLM-20B
+
+#### 简介
+InternLM-20B 在超过 **2.3T** Tokens 包含高质量英文、中文和代码的数据上进行预训练,其中 Chat 版本还经过了 SFT 和 RLHF 训练,使其能够更好、更安全地满足用户的需求。
+
+InternLM 20B 在模型结构上选择了深结构,InternLM-20B 的层数设定为60层,超过常规7B和13B模型所使用的32层或者40层。在参数受限的情况下,提高层数有利于提高模型的综合能力。此外,相较于InternLM-7B,InternLM-20B使用的预训练数据经过了更高质量的清洗,并补充了高知识密度和用于强化理解和推理能力的训练数据。因此,它在理解能力、推理能力、数学能力、编程能力等考验语言模型技术水平的方面都得到了显著提升。总体而言,InternLM-20B具有以下的特点:
+- 优异的综合性能
+- 很强的工具调用功能
+- 支持16k语境长度(通过推理时外推)
+- 更好的价值对齐
+
+#### 性能对比
+
+在OpenCompass提出的5个能力维度上,InternLM-20B都取得很好的效果(粗体为13B-33B这个量级范围内,各项最佳成绩)
+
+| 能力维度 | Llama-13B | Llama2-13B | Baichuan2-13B | InternLM-20B | Llama-33B | Llama-65B | Llama2-70B |
+|----------|-----------|------------|---------------|--------------|-----------|-----------|------------|
+| 语言 | 42.5 | 47 | 47.5 | **55** | 44.6 | 47.1 | 51.6 |
+| 知识 | 58.2 | 58.3 | 48.9 | 60.1 | **64** | 66 | 67.7 |
+| 理解 | 45.5 | 50.9 | 58.1 | **67.3** | 50.6 | 54.2 | 60.8 |
+| 推理 | 42.7 | 43.6 | 44.2 | **54.9** | 46.4 | 49.8 | 55 |
+| 学科 | 37.3 | 45.2 | 51.8 | **62.5** | 47.4 | 49.7 | 57.3 |
+| 总平均 | 43.8 | 47.3 | 49.4 | **59.2** | 48.9 | 51.9 | 57.4 |
+
+下表在一些有重要影响力的典型数据集上比较了主流开源模型的表现
+
+| | 评测集 | Llama-13B | Llama2-13B | Baichuan2-13B | InternLM-20B | Llama-33B | Llama-65B | Llama2-70B |
+|------|------------------|-----------|------------|---------------|--------------|-----------|-----------|------------|
+| 学科 | MMLU | 47.73 | 54.99 | 59.55 | **62.05** | 58.73 | 63.71 | 69.75 |
+| | C-Eval (val) | 31.83 | 41.4 | **59.01** | 58.8 | 37.47 | 40.36 | 50.13 |
+| | AGI-Eval | 22.03 | 30.93 | 37.37 | **44.58** | 33.53 | 33.92 | 40.02 |
+| 知识 | BoolQ | 78.75 | 82.42 | 67 | **87.46** | 84.43 | 86.61 | 87.74 |
+| | TriviaQA | 52.47 | 59.36 | 46.61 | 57.26 | **66.24** | 69.79 | 70.71 |
+| | NaturalQuestions | 20.17 | 24.85 | 16.32 | 25.15 | **30.89** | 33.41 | 34.16 |
+| 理解 | CMRC | 9.26 | 31.59 | 29.85 | **68.78** | 14.17 | 34.73 | 43.74 |
+| | CSL | 55 | 58.75 | 63.12 | **65.62** | 57.5 | 59.38 | 60 |
+| | RACE (middle) | 53.41 | 63.02 | 68.94 | **86.35** | 64.55 | 72.35 | 81.55 |
+| | RACE (high) | 47.63 | 58.86 | 67.18 | **83.28** | 62.61 | 68.01 | 79.93 |
+| | XSum | 20.37 | 23.37 | 25.23 | **35.54** | 20.55 | 19.91 | 25.38 |
+| 推理 | WinoGrande | 64.64 | 64.01 | 67.32 | **69.38** | 66.85 | 69.38 | 69.77 |
+| | BBH | 37.93 | 45.62 | 48.98 | **52.51** | 49.98 | 58.38 | 64.91 |
+| | GSM8K | 20.32 | 29.57 | **52.62** | **52.62** | 42.3 | 54.44 | 63.31 |
+| | PIQA | 79.71 | 79.76 | 78.07 | 80.25 | **81.34** | 82.15 | 82.54 |
+| 编程 | HumanEval | 14.02 | 18.9 | 17.07 | **25.61** | 17.68 | 18.9 | 26.22 |
+| | MBPP | 20.6 | 26.8 | 30.8 | **35.6** | 28.4 | 33.6 | 39.6 |
+
+总体而言,InternLM-20B 在综合能力上全面领先于13B量级的开源模型,同时在推理评测集上接近甚至超越Llama-65B的性能。
+
+- 评估结果来自 [OpenCompass 20230920](https://github.com/internLM/OpenCompass/)。
+- 由于 [OpenCompass](https://github.com/internLM/OpenCompass/) 的版本迭代,评估数据可能存在数值上的差异,所以请参考 [OpenCompass](https://github.com/internLM/OpenCompass/) 的最新评估结果。
+
+
+
+
+
+### InternLM-7B
+
+#### 模型更新
+[20230822] 通过使用更丰富的SFT类型数据,InternLM-7B-Chat v1.1模型支持代码解释和函数调用。模型结构与代码没有任何变化,因此可以使用与InternLM-7B-Chat完全一样的方式使用更强大的InternLM-7B-Chat v1.1。
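+
+由于接口保持不变,下文 Transformers 用法示例中的加载方式同样适用于 v1.1。下面给出一个最小示意(假设使用 Model Zoo 中列出的 v1.1 仓库名 `internlm/internlm-chat-7b-v1_1`,仅作演示):
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# 与 InternLM-7B-Chat 的用法完全一致,仅替换模型名称
+tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True).cuda()
+model = model.eval()
+response, _ = model.chat(tokenizer, "你好", history=[])
+print(response)
+```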
+
+#### 简介
+InternLM-7B 包含了一个拥有70亿参数的基础模型和一个为实际场景量身定制的对话模型。该模型具有以下特点:
+
+- 它使用数万亿高质量 Token 进行训练,建立了强大的知识体系。
+- 它支持8k的上下文窗口长度,使得输入序列更长并增强了推理能力。
+- 它为用户提供了一个多功能的工具集,使用户能够灵活地构建自己的工作流程。
+
+#### 性能对比
我们使用开源评测工具 [OpenCompass](https://github.com/internLM/OpenCompass/) 从学科综合能力、语言能力、知识能力、推理能力、理解能力五大能力维度对InternLM开展全面评测,部分评测结果如下表所示,欢迎访问[OpenCompass 榜单](https://opencompass.org.cn/rank)获取更多的评测结果。
@@ -72,27 +150,22 @@ InternLM ,即书生·浦语大模型,包含面向实用场景的70亿参数
- 以上评测结果基于 [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) 获得(部分数据标注`*`代表数据来自原始论文),具体测试细节可参见 [OpenCompass](https://github.com/internLM/OpenCompass/) 中提供的配置文件。
- 评测数据会因 [OpenCompass](https://github.com/internLM/OpenCompass/) 的版本迭代而存在数值差异,请以 [OpenCompass](https://github.com/internLM/OpenCompass/) 最新版的评测结果为主。
-### Model Zoo
-当前通过 InternLM 训练的 InternLM 7B 和 InternLM 7B Chat 已经开源,我们提供两种格式的模型权重以供使用。除了使用 Transformers 格式加载模型之外,还可以通过 InternLM 加载以下格式的权重直接进行继续预训练或人类偏好对齐训练
-
-| 模型 | InternLM 格式权重下载地址 | Transformers 格式权重下载地址 |
-| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ |
-| **InternLM 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [🤗internlm/intern-7b](https://huggingface.co/internlm/internlm-7b) |
-| **InternLM Chat 7B v1.1** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | [🤗internlm/intern-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) |
-| **InternLM Chat 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [🤗internlm/intern-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
-| **InternLM Chat 7B 8k** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [🤗internlm/intern-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k)
**局限性:** 尽管在训练过程中我们非常注重模型的安全性,尽力促使模型输出符合伦理和法律要求的文本,但受限于模型大小以及概率生成范式,模型可能会产生各种不符合预期的输出,例如回复内容包含偏见、歧视等有害内容,请勿传播这些内容。由于传播不良信息导致的任何后果,本项目不承担责任。
+
+
+## 使用案例
+
### 通过 Transformers 加载
-通过以下的代码加载 InternLM 7B Chat 模型
+通过以下的代码从 Transformers 加载 InternLM 模型 (可修改模型名称替换不同的模型)
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
->>> tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True)
->>> model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True).cuda()
+>>> tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
+>>> model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True).cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
@@ -105,6 +178,24 @@ InternLM ,即书生·浦语大模型,包含面向实用场景的70亿参数
3. 集中注意力:避免分心,集中注意力完成任务。关闭社交媒体和电子邮件通知,专注于任务,这将帮助您更快地完成任务,并减少错误的可能性。
```
+### 通过 ModelScope 加载
+
+通过以下的代码从 ModelScope 加载 InternLM 模型 (可修改模型名称替换不同的模型)
+
+```python
+from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
+import torch
+model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b-v1_1', revision='v1.0.0')
+tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
+model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
+model = model.eval()
+response, history = model.chat(tokenizer, "hello", history=[])
+print(response)
+response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
+print(response)
+```
+
+
### 通过前端网页对话
可以通过以下代码启动一个前端的界面来与 InternLM Chat 7B 模型进行交互
@@ -123,44 +214,25 @@ streamlit run web_demo.py
我们使用 [LMDeploy](https://github.com/InternLM/LMDeploy) 完成 InternLM 的一键部署。
-```bash
-python3 -m pip install lmdeploy
-```
+1. 首先安装 LMDeploy:
-执行以下命令,可以在终端与 `internlm-chat-7b` 模型进行交互式对话,或者通过 WebUI 与它聊天。
+   ```bash
+   python3 -m pip install lmdeploy
+   ```
-```bash
-# 转换权重格式
-python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b
+2. 快速的部署命令如下:
-# 在终端进行交互式对话
-python3 -m lmdeploy.turbomind.chat ./workspace
+   ```bash
+   python3 -m lmdeploy.serve.turbomind.deploy InternLM-7B /path/to/internlm-7b/model hf
+   ```
-# 启动 gradio 服务
-python3 -m lmdeploy.serve.gradio.app ./workspace
-```
-以上过程中,LMDeploy 使用的是 FP16 的计算精度。
+3. 在导出模型后,你可以直接通过如下命令启动一个服务,并与部署后的模型对话:
-除了 FP16 精度,LMDeploy 还支持 `internlm-chat-7b` 4bit 权重模型推理。它不仅把模型的显存减少到 6G,大约只有 FP16 的 40%,更重要的是,经过 kernel 层面的极致优化,其推理性能在 A100-80G 上可达到 FP16 的 2.4 倍以上。
-
-以下是`internlm-chat-7b` 4bit 权重模型的部署方法。推理速度的 bechmark 请参考[这里](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/w4a16.md#%E6%8E%A8%E7%90%86%E9%80%9F%E5%BA%A6)
-
-```bash
-# download prequnantized internlm-chat-7b model from huggingface
-git-lfs install
-git clone https://huggingface.co/lmdeploy/llama2-chat-7b-w4
-
-# Convert the model's layout and store it in the default path, ./workspace.
-python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b ./llama2-chat-7b-w4 awq --group-size 128
-
-# inference lmdeploy's turbomind engine
-python3 -m lmdeploy.turbomind.chat ./workspace
-
-# serving with gradio
-python3 -m lmdeploy.serve.gradio.app ./workspace
-```
-LMDeploy 是涵盖了 LLM 任务的全套轻量化、部署和服务的工具箱。请参考 [部署教程](https://github.com/InternLM/LMDeploy) 了解 InternLM 的更多部署细节。
+   ```bash
+   python3 -m lmdeploy.serve.client {server_ip_address}:33337
+   ```
+[LMDeploy](https://github.com/InternLM/LMDeploy) 支持了 InternLM 部署的完整流程,请参考 [部署教程](https://github.com/InternLM/LMDeploy) 了解 InternLM 的更多部署细节。
## 微调&训练
diff --git a/README.md b/README.md
index 0097aa8..59b1f63 100644
--- a/README.md
+++ b/README.md
@@ -33,26 +33,100 @@
- 👋 join us on Twitter, Discord and WeChat
+ 👋 join us on Discord and WeChat
## Introduction
+InternLM is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies. With a single codebase, it supports pre-training on large-scale clusters with thousands of GPUs, and fine-tuning on a single GPU, while achieving remarkable performance optimizations. InternLM achieves nearly 90% acceleration efficiency during training on 1024 GPUs.
-InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
+Based on the InternLM training framework, we have released two open-source pre-trained models: InternLM-7B and InternLM-20B.
+
+
+## News
+
+[20230920] InternLM-20B is released with base and chat versions.
+[20230822] InternLM-7B-Chat v1.1 is released with code interpreter and function calling capability. You can try it with [Lagent](https://github.com/InternLM/lagent).
+
+
+## Model Zoo
+
+Our models are released on three platforms: Transformers, ModelScope, and OpenXLab.
+
+| Model | Transformers | ModelScope | OpenXLab | Release Date |
+|---------------------------|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
+| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [OpenLMLab/InternLM-chat-20b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-09-20 |
+| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [OpenLMLab/InternLM-20b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | 2023-09-20 |
+| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [OpenLMLab/InternLM-chat-7b-v1.1](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | 2023-08-22 |
+| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [OpenLMLab/InternLM-7b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | 2023-07-06 |
+| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [OpenLMLab/InternLM-chat-7b](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-07-06 |
+| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [OpenLMLab/InternLM-chat-7b-8k](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | 2023-07-06 |
+
+### InternLM-20B
+#### Introduction
+InternLM-20B was pre-trained on over **2.3T** tokens containing high-quality English, Chinese, and code data. Additionally, the Chat version has undergone SFT and RLHF training, enabling it to better and more safely meet users' needs.
+
+In terms of model structure, InternLM-20B opted for a deeper architecture, with a depth set at 60 layers. This surpasses the conventional 7B and 13B models that utilize 32 or 40 layers. When parameters are limited, increasing the number of layers can enhance the model's overall capability. Furthermore, compared to InternLM-7B, the pre-training data used for InternLM-20B underwent higher quality cleansing and was supplemented with data rich in knowledge and designed for reinforcing understanding and reasoning capabilities. As a result, it exhibits significant improvements in understanding, reasoning, mathematical, and programming abilities—all of which test the technical proficiency of language models. Overall, InternLM-20B features the following characteristics:
+- Outstanding overall performance
+- Strong tool invocation capability
+- Supports a 16k context length (through inference extrapolation)
+- Better value alignment
+
+#### Performance Evaluation
+
+On the 5 capability dimensions proposed by OpenCompass, InternLM-20B has achieved excellent results (the bolded scores represent the best performances within the 13B-33B parameter range).
+
+| Capability | Llama-13B | Llama2-13B | Baichuan2-13B | InternLM-20B | Llama-33B | Llama-65B | Llama2-70B |
+|----------|-----------|------------|---------------|--------------|-----------|-----------|------------|
+| Language | 42.5 | 47 | 47.5 | **55** | 44.6 | 47.1 | 51.6 |
+| Knowledge | 58.2 | 58.3 | 48.9 | 60.1 | **64** | 66 | 67.7 |
+| Understanding | 45.5 | 50.9 | 58.1 | **67.3** | 50.6 | 54.2 | 60.8 |
+| Reasoning | 42.7 | 43.6 | 44.2 | **54.9** | 46.4 | 49.8 | 55 |
+| Examination | 37.3 | 45.2 | 51.8 | **62.5** | 47.4 | 49.7 | 57.3 |
+| Overall | 43.8 | 47.3 | 49.4 | **59.2** | 48.9 | 51.9 | 57.4 |
+
+The table below compares the performance of mainstream open-source models on some influential and typical datasets.
+
+| | Benchmarks | Llama-13B | Llama2-13B | Baichuan2-13B | InternLM-20B | Llama-33B | Llama-65B | Llama2-70B |
+|------|------------------|-----------|------------|---------------|--------------|-----------|-----------|------------|
+| Examination | MMLU | 47.73 | 54.99 | 59.55 | **62.05** | 58.73 | 63.71 | 69.75 |
+| | C-Eval (val) | 31.83 | 41.4 | **59.01** | 58.8 | 37.47 | 40.36 | 50.13 |
+| | AGI-Eval | 22.03 | 30.93 | 37.37 | **44.58** | 33.53 | 33.92 | 40.02 |
+| Knowledge | BoolQ | 78.75 | 82.42 | 67 | **87.46** | 84.43 | 86.61 | 87.74 |
+| | TriviaQA | 52.47 | 59.36 | 46.61 | 57.26 | **66.24** | 69.79 | 70.71 |
+| | NaturalQuestions | 20.17 | 24.85 | 16.32 | 25.15 | **30.89** | 33.41 | 34.16 |
+| Understanding | CMRC | 9.26 | 31.59 | 29.85 | **68.78** | 14.17 | 34.73 | 43.74 |
+| | CSL | 55 | 58.75 | 63.12 | **65.62** | 57.5 | 59.38 | 60 |
+| | RACE (middle) | 53.41 | 63.02 | 68.94 | **86.35** | 64.55 | 72.35 | 81.55 |
+| | RACE (high) | 47.63 | 58.86 | 67.18 | **83.28** | 62.61 | 68.01 | 79.93 |
+| | XSum | 20.37 | 23.37 | 25.23 | **35.54** | 20.55 | 19.91 | 25.38 |
+| Reasoning | WinoGrande | 64.64 | 64.01 | 67.32 | **69.38** | 66.85 | 69.38 | 69.77 |
+| | BBH | 37.93 | 45.62 | 48.98 | **52.51** | 49.98 | 58.38 | 64.91 |
+| | GSM8K | 20.32 | 29.57 | **52.62** | **52.62** | 42.3 | 54.44 | 63.31 |
+| | PIQA | 79.71 | 79.76 | 78.07 | 80.25 | **81.34** | 82.15 | 82.54 |
+| Programming | HumanEval | 14.02 | 18.9 | 17.07 | **25.61** | 17.68 | 18.9 | 26.22 |
+| | MBPP | 20.6 | 26.8 | 30.8 | **35.6** | 28.4 | 33.6 | 39.6 |
+
+Overall, InternLM-20B comprehensively outperforms open-source models in the 13B parameter range in terms of overall capabilities, and on inference evaluation sets, it approaches or even surpasses the performance of Llama-65B.
+
+- The evaluation results were obtained from [OpenCompass 20230920](https://github.com/internLM/OpenCompass/).
+- The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
+
+
+
+
+
+### InternLM-7B
+
+#### News
+[20230822] By utilizing richer SFT-type data, the InternLM-7B-Chat v1.1 model supports code interpretation and function invocation. The model structure and code remain unchanged, so the more powerful InternLM-7B-Chat v1.1 can be used in exactly the same way as InternLM-7B-Chat.
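+
+Since the interface is unchanged, the Transformers loading pattern shown in the usage examples below applies as-is to v1.1. Here is a minimal sketch, assuming the v1.1 repo id (`internlm/internlm-chat-7b-v1_1`) listed in the Model Zoo:
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Same API as InternLM-7B-Chat; only the model name changes.
+tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True).cuda()
+model = model.eval()
+response, _ = model.chat(tokenizer, "hello", history=[])
+print(response)
+```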
+
+#### Introduction
+InternLM-7B contains a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.
-Additionally, a lightweight training framework is offered to support model pre-training without the need for extensive dependencies. With a single codebase, it supports pre-training on large-scale clusters with thousands of GPUs, and fine-tuning on a single GPU while achieving remarkable performance optimizations. InternLM achieves nearly 90% acceleration efficiency during training on 1024 GPUs.
-
-## News
-
-InternLM-7B-Chat v1.1 is released with code interpreter and function calling capability. You can try it with [Lagent](https://github.com/InternLM/lagent).
-
-## InternLM-7B
-
-### Performance Evaluation
+#### Performance Evaluation
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
@@ -72,19 +146,12 @@ We conducted a comprehensive evaluation of InternLM using the open-source evalua
- The evaluation results were obtained from [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
-### Model Zoo
-
-InternLM 7B and InternLM 7B Chat, trained using InternLM, have been open-sourced. We provide two formats of model weights for use. In addition to loading the models using the Transformers format, you can also load the weights directly using InternLM for further pre-training or human preference alignment training.
-
-| Model | InternLM Format Weight Download Link | Transformers Format Weight Download Link |
-| ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
-| **InternLM 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [🤗internlm/intern-7b](https://huggingface.co/internlm/internlm-7b) |
-| **InternLM Chat 7B v1.1** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | [🤗internlm/intern-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) |
-| **InternLM Chat 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [🤗internlm/intern-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) |
-| **InternLM Chat 7B 8k** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [🤗internlm/intern-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) |
+
**Limitations:** Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
+## Usage Examples
+
### Import from Transformers
To load the InternLM 7B Chat model using Transformers, use the following code:
@@ -108,6 +175,23 @@ Sure, here are three tips for effective time management:
Remember, good time management skills take practice and patience. Start with small steps and gradually incorporate these habits into your daily routine.
```
+### Import from ModelScope
+
+To load the InternLM model using ModelScope, use the following code:
+
+```python
+from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
+import torch
+model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b-v1_1', revision='v1.0.0')
+tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
+model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
+model = model.eval()
+response, history = model.chat(tokenizer, "hello", history=[])
+print(response)
+response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
+print(response)
+```
+
### Dialogue
You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:
@@ -124,45 +208,27 @@ The effect is as follows
### Deployment
-We use [LMDeploy](https://github.com/InternLM/LMDeploy) to complete the workflow of InternLM deployment.
+We use [LMDeploy](https://github.com/InternLM/LMDeploy) to complete the one-click deployment of InternLM.
-```bash
-python3 -m pip install lmdeploy
+1. First, install LMDeploy:
+
+```bash
+python3 -m pip install lmdeploy
```
-You can utilize the following commands to conduct `internlm-chat-7b` FP16 inference, serve it and interact with AI assistant via WebUI:
+2. Use the following command for quick deployment:
-```bash
-# convert weight layout
-python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b
-
-# inference lmdeploy's turbomind engine
-python3 -m lmdeploy.turbomind.chat ./workspace
-
-# serving with gradio
-python3 -m lmdeploy.serve.gradio.app ./workspace
+```bash
+python3 -m lmdeploy.serve.turbomind.deploy InternLM-7B /path/to/internlm-7b/model hf
```
-You can also deploy 4-bit quantized `internlm-chat-7b` model via LMDeploy. It greatly trims down the model's memory overhead to 6G, just 40% of what FP16 inference would take. More importantly, with extreme optimized kernel, the inference performance achieves 2.4x faster than FP16 inference on A100-80G.
+3. After exporting the model, you can start a server and have a conversation with the deployed model using the following command:
-Try the followings to enjoy 4-bit `internlm-chat-7b` on a Geforce RTX 30x GPU card. You can find the inference benchmark from [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/w4a16.md#inference-performance).
-
-```bash
-# download prequnantized internlm-chat-7b model from huggingface
-git-lfs install
-git clone https://huggingface.co/lmdeploy/llama2-chat-7b-w4
-
-# Convert the model's layout and store it in the default path, ./workspace.
-python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b ./llama2-chat-7b-w4 awq --group-size 128
-
-# inference lmdeploy's turbomind engine
-python3 -m lmdeploy.turbomind.chat ./workspace
-
-# serving with gradio
-python3 -m lmdeploy.serve.gradio.app ./workspace
+```bash
+python3 -m lmdeploy.serve.client {server_ip_address}:33337
```
-LMDeploy is an efficient toolkit for compressing, deploying, and serving LLM models. Please refer to the [deployment tutorial](https://github.com/InternLM/LMDeploy) for more details on deploying InternLM.
+[LMDeploy](https://github.com/InternLM/LMDeploy) provides a complete workflow for deploying InternLM. Please refer to the [deployment tutorial](https://github.com/InternLM/LMDeploy) for more details on deploying InternLM.
## Fine-tuning & Training
diff --git a/doc/code-docs/locales/en/LC_MESSAGES/checkpoint.po b/doc/code-docs/locales/en/LC_MESSAGES/checkpoint.po
index bd81fa5..e82a9b1 100644
--- a/doc/code-docs/locales/en/LC_MESSAGES/checkpoint.po
+++ b/doc/code-docs/locales/en/LC_MESSAGES/checkpoint.po
@@ -3,12 +3,11 @@
# This file is distributed under the same license as the InternLM package.
# FIRST AUTHOR , 2023.
#
-#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: InternLM \n"
"Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-09-13 17:07+0800\n"
+"POT-Creation-Date: 2023-09-15 19:06+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: en\n"
@@ -20,7 +19,7 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"
#: ../../source/checkpoint.rst:2
-msgid "模型保存"
+msgid "模型加载与保存"
msgstr "Model Checkpointing"
#: ../../source/checkpoint.rst:4
@@ -36,12 +35,86 @@ msgstr ""
#: ../../source/checkpoint.rst:6
msgid "InternLM支持启动时自动加载最新的模型备份,并在接收信号退出训练时自动进行模型备份。"
-msgstr "InternLM supports automatic loading of latest ckpt at startup and automatic model checkpointing at signal quit. "
+msgstr "InternLM supports automatic loading of latest ckpt at startup and automatic model checkpointing at signal quit."
#: ../../source/checkpoint.rst:9
-msgid "Checkpointing"
+msgid "CheckpointManager"
msgstr ""
+#: ../../source/checkpoint.rst:11
+msgid ""
+"``CheckpointManager`` "
+"是InternLM负责进行模型加载和保存的工具类,其会使用config文件中的ckpt字段的初始化参数字典初始化自身的参数,目前相关的参数有:"
+msgstr ""
+"CheckpointManager is the utility class within InternLM responsible for "
+"model loading and saving. It initializes its own parameters using the "
+"initialization parameter dictionary from the 'ckpt' field in the config "
+"file. Currently, the relevant parameters are as follows"
+
+#: ../../source/checkpoint.rst:13
+msgid "``enable_save_ckpt``: 是否开启检查点存储功能(不影响检查点加载)。参数类型 ``bool``,必选参数。"
+msgstr ""
+"``enable_save_ckpt``: Whether to enable checkpoint storage functionality "
+"(does not affect checkpoint loading). Parameter type: `bool`, it is a "
+"required parameter."
+
+#: ../../source/checkpoint.rst:15
+msgid "``save_ckpt_folder``: 检查点存储路径,参数类型 ``str``,默认为: ``None``,在开启检查点存储功能时为必选参数。"
+msgstr ""
+"``save_ckpt_folder``: Checkpoint storage path. Parameter type: ``str``. "
+"This is a required parameter when enabling checkpoint storage "
+"functionality."
+
+#: ../../source/checkpoint.rst:17
+msgid "``checkpoint_every``: 检查点存储频率,参数类型 ``int``,默认为: ``50``。"
+msgstr ""
+"``checkpoint_every``: Checkpoint storage frequency. Parameter type: "
+"``int``."
+
+#: ../../source/checkpoint.rst:19
+msgid ""
+"``load_ckpt_folder``: 初始化检查点/权重加载路径。参数类型 ``str``,默认为: ``None``,详见 :ref"
+":`load-ckpt-folder`。"
+msgstr ""
+"``load_ckpt_folder``: Initialization checkpoint/weight loading path. "
+"Parameter type: ``str``. Default is ``None``. :ref:`load-ckpt-folder`"
+
+#: ../../source/checkpoint.rst:21
+msgid "``async_upload``: 是否开启异步上传,默认值为:``False``,详见 :ref:`asyncupload`。"
+msgstr ""
+"``async_upload``: Whether to enable asynchronous uploading. See "
+"documentation for more details :ref:`asyncupload`"
+
+#: ../../source/checkpoint.rst:23
+msgid "``async_upload_tmp_folder``: 异步上传临时存储路径。"
+msgstr ""
+"``async_upload_tmp_folder``: Temporary storage path for asynchronous "
+"uploading."
+
+#: ../../source/checkpoint.rst:25
+msgid ""
+"``oss_snapshot_freq``: 快照存储频率,默认值为:``checkpoint_every``的一半。详见 "
+":ref:`snapshot`。"
+msgstr ""
+"``oss_snapshot_freq``: Snapshot storage frequency. See documentation for "
+"more details :ref:`snapshot`."
+
+#: ../../source/checkpoint.rst:27
+msgid "``auto_resume``: 是否开启检查点自动恢复,默认值为:``True``,详见 :ref:`autoresume`。"
+msgstr ""
+"``auto_resume``: Whether to enable automatic checkpoint resume. See "
+"documentation for more details :ref:`autoresume`."
+
+#: ../../source/checkpoint.rst:29
+msgid "``stop_file_path`` : 检查点存储控制文件的路径,默认值为:``None``,详见 :ref:`stopfile`。"
+msgstr ""
+"``stop_file_path``: Path to the checkpoint storage control file. See "
+"documentation for more details :ref:`stopfile`."
+
+#: ../../source/checkpoint.rst:32
+msgid "下面给出config文件的参数设置例子:"
+msgstr "Here is an example of parameter settings in the config file."
+
#: internlm.utils.model_checkpoint.CheckpointManager:1 of
msgid "StorageManagerContext"
msgstr ""
@@ -86,21 +159,253 @@ msgstr ""
msgid "Save checkpoint to the given folder path."
msgstr ""
-#~ msgid "Attempt to restore the training state of the last ckpt."
-#~ msgstr ""
+#: ../../source/checkpoint.rst:53
+msgid "加载与存储格式约定"
+msgstr "Model loading and saving path format conventions."
-#~ msgid "lr_scheduler object."
-#~ msgstr ""
+#: ../../source/checkpoint.rst:58
+msgid "(1) 路径格式约定"
+msgstr "(1) Path format conventions."
-#~ msgid "optimizer object."
-#~ msgstr ""
+#: ../../source/checkpoint.rst:60
+msgid "InternLM对config中出现的所有存储路径都遵循以下的路径格式约定:"
+msgstr ""
+"InternLM follows the following path format conventions for all storage "
+"paths specified in the config:"
-#~ msgid "learning rate."
-#~ msgstr ""
+#: ../../source/checkpoint.rst:66
+msgid "对于不同backend的路径,有以下的规则需要注意:"
+msgstr "For paths of different backends, the following rules should be noted:"
-#~ msgid "traing states."
-#~ msgstr ""
+#: ../../source/checkpoint.rst:68
+msgid ""
+"如果需要使用boto3的路径,需要在运行前提前导入 ``S3_ACCESS_KEY_ID`` 和 "
+"``S3_SECRET_ACCESS_KEY_ID`` 这两个环境变量。"
+msgstr ""
+"If you need to use paths with Boto3, make sure to import the "
+"``S3_ACCESS_KEY_ID`` and ``S3_SECRET_ACCESS_KEY_ID`` environment "
+"variables before running."
-#~ msgid "traning dataloader object"
-#~ msgstr ""
+#: ../../source/checkpoint.rst:70
+msgid "bucket的endpoint一般分为Inside IP和Outside IP,如果可以尽量使用inside IP,会获得更佳的存储速度。"
+msgstr ""
+"The bucket's endpoint is typically divided into Inside IP and Outside IP."
+" Whenever possible, it's advisable to use the Inside IP to achieve better"
+" storage speed."
+#: ../../source/checkpoint.rst:75
+msgid "(2) 模型加载(load_ckpt_folder)格式约定"
+msgstr "(2) Model loading format conventions (load_ckpt_folder)."
+
+#: ../../source/checkpoint.rst:77
+msgid "load_ckpt_folder 由三个字段组成, ``path`` 、 ``content`` 和 ``ckpt_type`` 。"
+msgstr ""
+"``load_ckpt_folder`` consists of three fields: ``path``, ``content``, and"
+" ``ckpt_type``."
+
+#: ../../source/checkpoint.rst:79
+msgid "``path``:给出了检查点/初始化模型权重的加载路径(path的格式见下小节)"
+msgstr ""
+"``path``: Specifies the loading path for the checkpoint/initial model "
+"weights (the format of the path is described in the following "
+"subsection)."
+
+#: ../../source/checkpoint.rst:81
+msgid "``content``: 表示需要加载的内容,目前支持的字段包括:"
+msgstr ""
+"``content``: Indicates the content to be loaded, currently supported "
+"fields include:"
+
+#: ../../source/checkpoint.rst:83
+msgid "``model``:加载模型权重。"
+msgstr "``model``: Load model weights."
+
+#: ../../source/checkpoint.rst:84
+msgid "``sampler``:加载sampler状态。"
+msgstr "``sampler``: Load sampler state."
+
+#: ../../source/checkpoint.rst:85
+msgid "``scheduler``:加载lr_scheduler状态。"
+msgstr "``scheduler``: Load lr_scheduler state."
+
+#: ../../source/checkpoint.rst:86
+msgid "``optimzier``:加载optimizer状态。"
+msgstr "``optimizer``: Load optimizer state."
+
+#: ../../source/checkpoint.rst:87
+msgid "``all``:表示所有状态均加载,一般在resume训练使用。"
+msgstr ""
+"``all``: Indicates that all states should be loaded, typically used for "
+"resuming training."
+
+#: ../../source/checkpoint.rst:89
+msgid "``ckpt_type``:表示加载的模型权重类型,目前支持的字段包括:"
+msgstr ""
+"``ckpt_type``: Represents the type of model weight to be loaded, "
+"currently supported fields include:"
+
+#: ../../source/checkpoint.rst:91
+msgid "``internlm``:internlm约定的checkpoint存储格式。"
+msgstr "``internlm``: Checkpoint storage format as per InternLM conventions."
+
+#: ../../source/checkpoint.rst:93
+msgid "下面给出两个例子:"
+msgstr "Here are two examples:"
+
+#: ../../source/checkpoint.rst:107
+msgid "异步上传"
+msgstr "Asynchronous upload."
+
+#: ../../source/checkpoint.rst:109
+msgid ""
+"异步上传会先同步的将模型存储到 ``async_upload_tmp_folder`` "
+"中,再异步的写入远端存储(OSS/NFS)中。从而避免存储ckpt阻塞训练过长时间。"
+msgstr ""
+"Asynchronous upload first synchronously stores the model in the "
+"``async_upload_tmp_folder`` and then asynchronously writes it to remote "
+"storage (OSS/NFS). This helps prevent blocking training for extended "
+"periods while storing checkpoints."
+
+#: ../../source/checkpoint.rst:111 ../../source/checkpoint.rst:129
+#: ../../source/checkpoint.rst:145 ../../source/checkpoint.rst:160
+msgid "config.ckpt 中相关的参数:"
+msgstr "The parameters related to ``config.ckpt`` are:"
+
+#: ../../source/checkpoint.rst:113
+msgid "``async_upload``: 是否开启异步上传。参数类型 ``bool/None``,默认为 ``False``。"
+msgstr ""
+"``async_upload``: Whether to enable asynchronous upload. Parameter type: "
+"``bool/None``. Default is ``False``."
+
+#: ../../source/checkpoint.rst:115
+msgid ""
+"``async_upload_tmp_folder``: 异步上传临时存储路径。参数类型 ``str/None``, 默认值为 "
+"``/dev/shm/{JOB_NAME}_tmp_ckpt/``。"
+msgstr ""
+"`async_upload_tmp_folder`: Temporary storage path for asynchronous "
+"upload. Parameter type: `str/None`. Default value is "
+"``/dev/shm/{JOB_NAME}_tmp_ckpt/``."
+
+#: ../../source/checkpoint.rst:117
+msgid "需要注意的是,异步上传功能仅在backend为boto3时才会有效果,bcakend为local时只支持同步存储。"
+msgstr ""
+"It's important to note that asynchronous upload functionality is only "
+"effective when the backend is set to \"boto3.\" When the backend is set "
+"to \"local,\" only synchronous storage is supported."
+
+#: ../../source/checkpoint.rst:119
+msgid ""
+"``async_upload_tmp_folder`` "
+"设置的的原则为尽量设置为计算节点的local目录,这样才可以获得最佳的异步上传速度,一般来说建议为 ``/dev/shm`` 或 "
+"``/nvme`` 下的路径,如果使用同步上传,则该路径可不给。"
+msgstr ""
+"The setting principle is to try to set it to the local directory of the "
+"computing node, so as to obtain the best asynchronous upload speed. "
+"Generally speaking, it is recommended to use the path under ``/dev/shm`` "
+"or ``/nvme``. If If you use synchronous upload, this path does not need "
+"to be given."
+
+#: ../../source/checkpoint.rst:125
+msgid "快照检查点"
+msgstr "Snapshot Checkpoint"
+
+#: ../../source/checkpoint.rst:127
+msgid ""
+"快照检查点是一种特殊的检查点,其是为了减少模型因为训练崩溃(ECC error, NCCL error, "
+".etc)等问题导致训练任务崩溃而损失的训练进度。其采用交替覆盖写的策略,所占用的存储大小为两个step的检查点所需的空间。配合上异步的检查点写入,在不影响训练速度和存储容量的条件下极大的增大了检查点的存储频率。"
+msgstr ""
+"Snapshot checkpoint is a special checkpoint that is used to reduce the "
+"loss of training progress due to training task crashes caused by problems"
+" such as training crashes (ECC error, NCCL error.etc). It adopts an "
+"alternating overwriting strategy, and the storage size occupied is the "
+"space required for the checkpoints of two steps. Coupled with "
+"asynchronous checkpoint writing, it greatly increases the storage "
+"frequency of checkpoints without affecting training speed and storage "
+"capacity."
+
+#: ../../source/checkpoint.rst:131
+msgid "``oss_snapshot_freq``: 快照存储频率。参数类型 ``int/None``,默认为 ``50``。"
+msgstr ""
+"``oss_snapshot_freq``: Snapshot storage frequency. Parameter type "
+"``int/None``, default is ``50``"
+
+#: ../../source/checkpoint.rst:133
+msgid ""
+"``oss_snapshot_freq`` 可以根据模型每step时间酌情设置,一般快照频率在1小时以下,半小时以上为怡/不给(默认值是 "
+"``checkpoint_every`` 的二分之一)。"
+msgstr ""
+"``oss_snapshot_freq`` can be set according to the time of each step of "
+"the model. Generally, the snapshot frequency is less than 1 hour, and it "
+"is Yi/Non for more than half an hour (the default value is one-half of "
+"``checkpoint_every``)"
+
+#: ../../source/checkpoint.rst:139
+msgid "检查点自动恢复"
+msgstr "Checkpoint automatic recovery"
+
+#: ../../source/checkpoint.rst:141
+msgid ""
+"检查点自动加载功能的目的是在resume训练时,自动加载 ``save_ckpt_folder`` "
+"路径下最新的检查点(包括snapshot检查点)。配合上自动重启机制,可以实现无人干预的任务自动恢复。"
+msgstr ""
+"The purpose of Checkpoint automatic recovery is to automatically load the"
+" latest checkpoint (including snapshot checkpoint) under the "
+"``save_ckpt_folder`` path during resume training. Coupled with the "
+"automatic restart mechanism, tasks can be automatically restored without "
+"human intervention."
+
+#: ../../source/checkpoint.rst:143
+msgid ""
+"该功能默认开启,所以要注意如果需要加载 ``load_ckpt_folder`` 路径下的模型权重,要将 ``auto_resume`` 设置为 "
+"False,否则可能会产生预期外的行为。"
+msgstr ""
+"This function is enabled by default, so please note that if you need to "
+"load the model weights under the ``load_ckpt_folder`` path, you must set "
+"``auto_resume`` to ``False``, otherwise unexpected behavior may occur."
+
+#: ../../source/checkpoint.rst:147
+msgid "``auto_resume``: 是否开启检查点自动恢复。参数类型 ``bool``,默认为 ``True``。"
+msgstr ""
+"``auto_resume``: Whether to enable automatic checkpoint recovery. "
+"Parameter type ``bool``, default is ``True``"
+
+#: ../../source/checkpoint.rst:149
+msgid ""
+"``auto_resume`` 如果为True,则尝试从 ``save_ckpt_folder`` "
+"路径中自动加载最新的ckpt,如果找不到,则从step 0开始训练。如果为False,则尝试从 ``load_ckpt_folder`` "
+"中加载模型参数。"
+msgstr ""
+"``auto_resume`` If True, attempts to save_ckpt_folder`Automatically load "
+"the latest ckpt in the path. If not found, training will start from step "
+"0. If False, try to load model parameters from ``load_ckpt_folder``"
+
+#: ../../source/checkpoint.rst:155
+msgid "手动控制检查点存储"
+msgstr "Manual control of checkpoint storage"
+
+#: ../../source/checkpoint.rst:157
+msgid ""
+"在模型距离下一次检查点存储还有很长时间,这时如果希望立刻停止一个任务,又不希望丢失目前训练进度时可以使用手动控制检查点存储功能。通过向一个位于NFS上的"
+" ``stop_file_path`` 文件中写入希望任务停止的step步数,Global Rank "
+"0的进程会在每个step轮询该文件的值,如果发现有我们给出的停止step,则会进行一次广播通知所有的训练进程,约定各进程在训练到该step时存储一个检查点,并选择是否退出。"
+msgstr ""
+"When the model is still a long time away from the next checkpoint "
+"storage, if you want to stop a task immediately and do not want to lose "
+"the current training progress, you can use the manual control checkpoint "
+"storage function. By writing the number of steps you want the task to "
+"stop to a ``stop_file_path`` file located on NFS, the Global Rank 0 "
+"process will poll the value of the file at each step. If it finds that "
+"there is a stop step we gave , a broadcast will be performed to notify "
+"all training processes, and it is agreed that each process will store a "
+"checkpoint when training reaches this step, and choose whether to exit."
+
+#: ../../source/checkpoint.rst:162
+msgid "``stop_file_path``:检查点存储控制文件的路径,参数类型 ``str/None``,默认为 ``None``,表示关闭该功能。"
+msgstr ""
+"``stop_file_path``: The path of the checkpoint storage control file, "
+"parameter type ``str/None``, the default is ``None``, indicating to turn "
+"off this function"
+
+#: ../../source/checkpoint.rst:164
+msgid "下面给出一个写入 ``stop_file_path`` 的例子:"
+msgstr "An example of writing to ``stop_file_path`` is given below:"
diff --git a/doc/code-docs/locales/en/LC_MESSAGES/index.po b/doc/code-docs/locales/en/LC_MESSAGES/index.po
index 25645c6..7d0c4ec 100644
--- a/doc/code-docs/locales/en/LC_MESSAGES/index.po
+++ b/doc/code-docs/locales/en/LC_MESSAGES/index.po
@@ -43,39 +43,42 @@ msgstr "Training API"
msgid "并行训练"
msgstr "Parallel Training"
-#: ../../source/index.rst:51 9234725f3c464731993d73607608c874
+#: ../../source/index.rst:51
+msgid "混合精度"
+msgstr "Mixed Precision"
+
+#: ../../source/index.rst:59 9234725f3c464731993d73607608c874
msgid "模型备份"
msgstr "Model Checkpointing"
-#: ../../source/index.rst:59 8e4ce037017f4510b2892a66003877fa
+#: ../../source/index.rst:67 8e4ce037017f4510b2892a66003877fa
msgid "性能分析"
msgstr "Profiler"
-#: ../../source/index.rst:67 a36e02819ecd4b448a8cb4ebbecb6600
+#: ../../source/index.rst:75 a36e02819ecd4b448a8cb4ebbecb6600
msgid "训练监控"
msgstr "Monitor"
-#: ../../source/index.rst:75 b912e292486f455c8b5cdd75962e8ac2
+#: ../../source/index.rst:83 b912e292486f455c8b5cdd75962e8ac2
msgid "训练样例"
msgstr "Example"
-#: ../../source/index.rst:83 ea9e9281720941a1830e5df7a2badf7a
+#: ../../source/index.rst:91 ea9e9281720941a1830e5df7a2badf7a
msgid "常见问题"
msgstr "Q&A"
-#: ../../source/index.rst:91 e08edc5aa1c74965b10084b393b88fae
+#: ../../source/index.rst:99 e08edc5aa1c74965b10084b393b88fae
msgid "索引和表格"
msgstr "Indices and tables"
-#: ../../source/index.rst:93 f3fdca059caa49dcad09aa44be7f02d6
+#: ../../source/index.rst:101 f3fdca059caa49dcad09aa44be7f02d6
msgid ":ref:`genindex`"
msgstr ""
-#: ../../source/index.rst:94 b3791e811315435097bb507edc3f4b9b
+#: ../../source/index.rst:102 b3791e811315435097bb507edc3f4b9b
msgid ":ref:`modindex`"
msgstr ""
-#: ../../source/index.rst:95 a164b772960f4ab8b18c7e8820f69f55
+#: ../../source/index.rst:103 a164b772960f4ab8b18c7e8820f69f55
msgid ":ref:`search`"
msgstr ""
-
diff --git a/doc/code-docs/locales/en/LC_MESSAGES/mixed_precision.po b/doc/code-docs/locales/en/LC_MESSAGES/mixed_precision.po
new file mode 100644
index 0000000..2520d1c
--- /dev/null
+++ b/doc/code-docs/locales/en/LC_MESSAGES/mixed_precision.po
@@ -0,0 +1,85 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2023, InternLM Team
+# This file is distributed under the same license as the InternLM package.
+# FIRST AUTHOR , 2023.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: InternLM \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2023-09-26 17:04+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME \n"
+"Language: en\n"
+"Language-Team: en \n"
+"Plural-Forms: nplurals=2; plural=(n != 1);\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.12.1\n"
+
+#: ../../source/mixed_precision.rst:2
+msgid "混合精度"
+msgstr "Mixed Precision"
+
+#: ../../source/mixed_precision.rst:3
+msgid ""
+"混合精度是指在模型训练的过程中同时使用16位和32位浮点数类型,是一种在最小化精度损失的前提下加速模型训练的方法。 "
+"混合精度通过让模型的某些部分使用32位浮点数以保持数值稳定性,并在其余部分利用半精度浮点数加速训练并可以减少内存使用,在评估指标(如准确率)方面仍可以获得同等的训练效果。"
+msgstr ""
+"Mixed precision refers to using both 16-bit and 32-bit floating-point "
+"types to train model, which can accelerate the model training while "
+"minimizing the accuracy loss. Mixed precision training uses 32-bit "
+"floating-point types in certain parts of the model to maintain numerical "
+"stability, and accelerate training and reduce memory usage by using "
+"16-bit floating-point types in other parts. Mixed precision can achieve "
+"the same training effect in evaluating indicators such as accuracy."
+
+#: internlm.core.naive_amp.NaiveAMPModel:1 of
+msgid ""
+"This is a wrapper class for a model that automatically casts the model, "
+"its inputs, and outputs into fp16. It also provides options to cast the "
+"output back to fp32 and to synchronize buffers."
+msgstr ""
+
+#: internlm.core.naive_amp.NaiveAMPModel of
+msgid "参数"
+msgstr ""
+
+#: internlm.core.naive_amp.NaiveAMPModel:4 of
+msgid "The model to be wrapped and cast into fp16."
+msgstr ""
+
+#: internlm.core.naive_amp.NaiveAMPModel:6 of
+msgid "If True, the output of this module is cast into fp32. Defaults to True."
+msgstr ""
+
+#: internlm.core.naive_amp.NaiveAMPModel:8 of
+msgid ""
+"The parallel group mode used in this module. Defaults to "
+"``ParallelMode.DATA``."
+msgstr ""
+
+#: internlm.core.naive_amp.NaiveAMPModel:11 of
+msgid "If True, the buffers are synchronized. Defaults to True."
+msgstr ""
+
+#: ../../source/mixed_precision.rst:8
+msgid "InternLM默认将模型转换为16位浮点数类型进行训练(在配置文件中可以设置默认类型为其他数据类型)。在使用混合精度时,需要在构建模型时使用"
+msgstr ""
+"InternLM converts the model to 16-bit floating-point types for model "
+"training by default (the default type can be set to other data types in "
+"the configuration file). When using mixed precision, it is necessary to "
+"use "
+
+#: ../../source/mixed_precision.rst:14
+msgid "将模型的某个子模块设置为32位浮点数类型进行训练,InternLM会在模型训练时自动将数据类型转换成需要的精度。"
+msgstr ""
+"to set a sub-module of the model to 16-bit floating-point types for "
+"training, and InternLM will automatically convert the data type to the "
+"required precision during model training."
+
+#: ../../source/mixed_precision.rst:16
+msgid "例如:"
+msgstr "For example:"
diff --git a/doc/code-docs/locales/en/LC_MESSAGES/monitor.po b/doc/code-docs/locales/en/LC_MESSAGES/monitor.po
index 0108368..c9d3045 100644
--- a/doc/code-docs/locales/en/LC_MESSAGES/monitor.po
+++ b/doc/code-docs/locales/en/LC_MESSAGES/monitor.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: InternLM \n"
"Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-09-07 10:56+0800\n"
+"POT-Creation-Date: 2023-09-25 13:44+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: en\n"
@@ -19,180 +19,280 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"
-#: ../../source/monitor.rst:2 f95ef3bff8574c77a28ca2f6212cc4b8
+#: ../../source/monitor.rst:2
msgid "监控和告警"
msgstr "Monitor and Alert"
-#: ../../source/monitor.rst:5 959bd4a6061f4483875c7950ab4546cf
+#: ../../source/monitor.rst:5
msgid "监控"
msgstr "Monitoring"
-#: ../../source/monitor.rst:7 6071bc878d894865b73380cb887847c1
+#: ../../source/monitor.rst:7
msgid ""
"InternLM 使用 ``internlm.monitor.monitor.initialize_monitor_manager()`` "
"来初始化上下文监控管理。其中,一个实例化的单例对象 ``internlm.monitor.monitor.MonitorManager`` "
"将管理监控线程并使用 ``internlm.monitor.monitor.MonitorTracker`` 来跟踪模型训练生命周期和训练状态。"
msgstr ""
-"InternLM uses ``internlm.monitor.monitor.initialize_monitor_manager()`` to initialize context monitor. During this time, "
-"a singleton ``internlm.monitor.monitor.MonitorManager`` will manage monitoring thread and track training status "
-"with ``internlm.monitor.monitor.MonitorTracker``."
+"InternLM uses ``internlm.monitor.monitor.initialize_monitor_manager()`` "
+"to initialize context monitor. During this time, a singleton "
+"``internlm.monitor.monitor.MonitorManager`` will manage monitoring thread"
+" and track training status with "
+"``internlm.monitor.monitor.MonitorTracker``."
-#: 9256a063b6dd449786f29e03ce085176
#: internlm.monitor.monitor.initialize_monitor_manager:1 of
msgid ""
"Initialize monitor manager for monitoring training lifetime and alerting "
"exception info to Feishu."
msgstr ""
-#: 138340fca72a4226be901f7f16c8a590 904b7938fdea46bf81c1ef738aa7bfae
-#: 9ed2a7b4af2243b289e72b2751aec902 aa0dd0dc6bee4a5bb15cc9705f7c13ee
+#: internlm.monitor.alert.initialize_light_monitor
#: internlm.monitor.alert.send_feishu_msg_with_webhook
+#: internlm.monitor.alert.send_heartbeat
#: internlm.monitor.monitor.MonitorManager.start_monitor
#: internlm.monitor.monitor.MonitorTracker
#: internlm.monitor.monitor.initialize_monitor_manager of
msgid "参数"
msgstr ""
-#: 3b302339e1d143b6b1d782ff59c9396d 6a06f053828b4c80aef56970750e2085
#: internlm.monitor.monitor.MonitorManager.start_monitor:3
#: internlm.monitor.monitor.initialize_monitor_manager:3 of
msgid "The training job name."
msgstr ""
-#: 3330d06145ee4d35b0b3632e799a35b3 c105473f2f6a4f838a9f0d098762d698
#: internlm.monitor.monitor.MonitorManager.start_monitor:5
#: internlm.monitor.monitor.initialize_monitor_manager:5 of
msgid "The Feishu webhook address for sending alert messages."
msgstr ""
-#: 774c6ff82a2e452295a1a7dcabaded3d internlm.monitor.monitor.MonitorManager:1
-#: of
+#: internlm.monitor.monitor.MonitorManager:1 of
msgid ""
"Monitor Manager for managing monitor thread and monitoring training "
"status."
msgstr ""
-#: 72e696c0ce8f41ea8c7947d35cf322f0
#: internlm.monitor.monitor.MonitorManager.monitor_loss_spike:1 of
msgid "Check loss value, if loss spike occurs, send alert message to Feishu."
msgstr ""
-#: 2b668b057fa84e8b92c65bfd49bfb3e9
#: internlm.monitor.monitor.MonitorManager.monitor_exception:1 of
msgid "Catch and format exception information, send alert message to Feishu."
msgstr ""
-#: 9852b7143026476d89e1a175223e6d79
#: internlm.monitor.monitor.MonitorManager.handle_sigterm:1 of
msgid "Catch SIGTERM signal, and send alert message to Feishu."
msgstr ""
-#: 2e3827bad7b1445fb0d9a7c5a28def5d
#: internlm.monitor.monitor.MonitorManager.start_monitor:1 of
msgid ""
"Initialize and start monitor thread for checking training job status, "
"loss spike and so on."
msgstr ""
-#: 271cc3e1b0834a7ba6a1ba4d5cce0ef1
#: internlm.monitor.monitor.MonitorManager.start_monitor:7 of
msgid "The time of monitor interval in seconds, defaults to 300."
msgstr ""
-#: e4a06091fce8401b83e31ce26c8075a0
#: internlm.monitor.monitor.MonitorManager.start_monitor:9 of
msgid ""
"The limit multiple of current loss to previous loss value, which means "
"loss spike may be occurs, defaults to 1.5."
msgstr ""
-#: 28bde748477e41f39fa6ca3e1855923d
#: internlm.monitor.monitor.MonitorManager.stop_monitor:1 of
msgid "Stop the monitor and alert thread."
msgstr ""
-#: ffb3dda227664748bdb326b6630bc827 internlm.monitor.monitor.MonitorTracker:1
-#: of
+#: internlm.monitor.monitor.MonitorTracker:1 of
msgid "Track job status and alert to Feishu during job training."
msgstr ""
-#: a1e93683cbb04d8ab825e2776e76efa7 internlm.monitor.monitor.MonitorTracker:3
-#: of
+#: internlm.monitor.monitor.MonitorTracker:3 of
msgid "The Feishu webhook address for sending alerting messages."
msgstr ""
-#: 7913eeecc0904c128046e80cec1553f2 internlm.monitor.monitor.MonitorTracker:5
-#: of
+#: internlm.monitor.monitor.MonitorTracker:5 of
msgid "The interval in seconds for monitoring checks. Defaults to 300."
msgstr ""
-#: 8d1abc3067584866983139dd3d85c59c internlm.monitor.monitor.MonitorTracker:7
-#: of
+#: internlm.monitor.monitor.MonitorTracker:7 of
msgid "The threshold for detecting loss value spikes. Defaults to 1.5."
msgstr ""
-#: a0416fd68700450793daa2167f776618
#: internlm.monitor.monitor.MonitorTracker.run:1 of
msgid "start the monitor tracker."
msgstr ""
-#: f55eb990c07b4e8f9388236dd60f0017
#: internlm.monitor.monitor.MonitorTracker.stop:1 of
msgid "Stop the monitor tracker."
msgstr ""
-#: ../../source/monitor.rst:18 2202bc091aab417097a1b0268dfe6785
+#: ../../source/monitor.rst:18
msgid "告警"
msgstr "Alerting"
-#: ../../source/monitor.rst:20 69334f83e644455aa619dde70b8ed1f2
+#: ../../source/monitor.rst:20
msgid ""
"InternLM 监控线程会周期性地检查模型训练过程中是否出现 loss spike、潜在的 training stuck、运行时异常等,并捕获 "
"SIGTERM 异常信号。当出现上述情况时,将触发警报,并通过调用 "
"``internlm.monitor.alert.send_feishu_msg_with_webhook()`` 向飞书的 Webhook "
"地址发送报警消息。"
msgstr ""
-"InternLM monitor thread periodically tracks loss spike, potential stuck condition, runtime exception, and SIGTERM signal. "
-"When above situation occurs, an alert will be triggered and a message will be sent to the Feishu webhook address by calling "
+"InternLM monitor thread periodically tracks loss spike, potential stuck "
+"condition, runtime exception, and SIGTERM signal. When above situation "
+"occurs, an alert will be triggered and a message will be sent to the "
+"Feishu webhook address by calling "
"``internlm.monitor.alert.send_feishu_msg_with_webhook()``."
-#: 15980526c2fa4ed8befa1604f271a3f1
#: internlm.monitor.alert.send_feishu_msg_with_webhook:1 of
msgid "Use Feishu robot to send messages with the given webhook."
msgstr ""
-#: 38e5738c2b914c8096e1a0f345e6c0b4
#: internlm.monitor.alert.send_feishu_msg_with_webhook:3 of
msgid "The webhook to be used to send message."
msgstr ""
-#: 4984f1a3bb0d46b48b2aad4fba8b43d9
#: internlm.monitor.alert.send_feishu_msg_with_webhook:5 of
msgid "The message title."
msgstr ""
-#: a9822a4cf30d4947b12f70a0efe62a5e
#: internlm.monitor.alert.send_feishu_msg_with_webhook:7 of
msgid "The message body."
msgstr ""
-#: 57d9ab65fe9f45c28351839fecf2f31e
#: internlm.monitor.alert.send_feishu_msg_with_webhook of
msgid "返回"
msgstr ""
-#: 2b6ac97fd152498183a8624a9087812b
#: internlm.monitor.alert.send_feishu_msg_with_webhook:10 of
msgid "The response from the request. Or catch the exception and return None."
msgstr ""
-#: ec45dedf976046eb909f5b7f79a7d44c
+#: internlm.monitor.alert.initialize_light_monitor
#: internlm.monitor.alert.send_feishu_msg_with_webhook of
msgid "抛出"
msgstr ""
-#: 4c6aeec19a6041cfbfa577b1c5a85ac1
#: internlm.monitor.alert.send_feishu_msg_with_webhook:12 of
msgid "An exception rasied by the HTTP post request."
msgstr ""
+#: ../../source/monitor.rst:25
+msgid "轻量监控"
+msgstr "Light Monitoring"
+
+#: ../../source/monitor.rst:27
+msgid ""
+"InternLM轻量级监控工具采用心跳机制实时监测训练过程中的各项指标,如loss、grad_norm、训练阶段的耗时等。同时,InternLM还可以通过"
+" `grafana dashboard `_ "
+"直观地呈现这些指标信息,以便用户进行更加全面和深入的训练分析。"
+msgstr ""
+"The InternLM light monitoring tool employs a heartbeat mechanism to real-"
+"time monitor various metrics during the training process, such as loss, "
+"grad_norm, and training phase duration. Additionally, InternLM can "
+"present these metric details through a `grafana dashboard "
+"`_, allowing users to conduct "
+"more comprehensive and in-depth training analysis in an intuitive manner."
+
+#: ../../source/monitor.rst:29
+msgid ""
+"轻量监控的配置由配置文件中的 ``monitor`` 字段指定, 用户可以通过修改配置文件 `config file "
+"`_ "
+"来更改监控配置。以下是一个监控配置的示例:"
+msgstr ""
+"The configuration for light monitoring is specified by the ``monitor`` "
+"field in the configuration file. Users can modify monitoring settings by "
+"editing the configuration file `config file "
+"`_. "
+"Here is an example of a monitoring configuration:"
+
+#: ../../source/monitor.rst:41
+msgid "enable_feishu_alert (bool):是否启用飞书告警。默认值:False。"
+msgstr "enable_feishu_alert: Whether to enable Feishu alerts. Defaults: False."
+
+#: ../../source/monitor.rst:42
+msgid "feishu_alert_address (str):飞书告警的 Webhook 地址。默认值:None。"
+msgstr "feishu_alert_address: The webhook address for Feishu alerts. Defaults: None."
+
+#: ../../source/monitor.rst:43
+msgid "light_monitor_address (str):轻量监控的地址。默认值:None。"
+msgstr "light_monitor_address: The address for lightweight monitoring. Defaults: None."
+
+#: ../../source/monitor.rst:45
+msgid ""
+"InternLM 使用 ``internlm.monitor.alert.initialize_light_monitor`` "
+"来初始化轻量监控客户端。一旦初始化完成,它会建立与监控服务器的连接。在训练过程中,使用 "
+"``internlm.monitor.alert.send_heartbeat`` "
+"来发送不同类型的心跳信息至监控服务器。监控服务器会根据这些心跳信息来检测训练是否出现异常,并在需要时发送警报消息。"
+msgstr ""
+"InternLM uses ``internlm.monitor.alert.initialize_light_monitor`` to "
+"initialize the lightweight monitoring client. Once initialization is "
+"complete, it establishes a connection with the monitoring server. During "
+"the training process, it uses ``internlm.monitor.alert.send_heartbeat`` "
+"to send various types of heartbeat messages to the monitoring server. The"
+" monitoring server uses these heartbeat messages to detect if the "
+"training encounters any abnormalities and sends alert messages as needed."
+
+#: internlm.monitor.alert.initialize_light_monitor:1 of
+msgid "Initialize the lightweight monitoring module."
+msgstr ""
+
+#: internlm.monitor.alert.initialize_light_monitor:3 of
+msgid "The address of the monitor. Defaults to 'MONITOR_SERVER' environment."
+msgstr ""
+
+#: internlm.monitor.alert.initialize_light_monitor:6 of
+msgid ""
+"If any exceptions occur during initialization, they will be caught and "
+"logged as warnings."
+msgstr ""
+
+#: internlm.monitor.alert.initialize_light_monitor:9
+#: internlm.monitor.alert.send_heartbeat:9 of
+msgid "示例"
+msgstr "Example"
+
+#: internlm.monitor.alert.initialize_light_monitor:10 of
+msgid ""
+"Initialize the monitoring module with the default address "
+"``initialize_light_monitor()``"
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:1 of
+msgid "Send a heartbeat message to a monitoring server."
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:3 of
+msgid ""
+"The type of heartbeat message, e.g., \"train_metrics\", \"init_time\", "
+"\"stage_time\"."
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:5 of
+msgid "A dictionary containing message data to be included in the heartbeat."
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:10 of
+#, fuzzy
+msgid ""
+"Sending a heartbeat message for training metrics "
+"``send_heartbeat(\"train_metrics\", {\"loss\": 0.1, \"accuracy\": "
+"0.95})``"
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:13 of
+msgid ""
+"Sending a heartbeat message for initialization time "
+"``send_heartbeat(\"init_time\", {\"import_time\": 0.25})``"
+msgstr ""
+
+#: internlm.monitor.alert.send_heartbeat:16 of
+msgid ""
+"Sending a heartbeat message for stage time "
+"``send_heartbeat(\"stage_time\", {\"fwd_time\": 2.3, \"bwd_time\": "
+"6.2})``"
+msgstr ""
+
+#~ msgid ""
+#~ "InternLM轻量监控基于心跳机制来监控训练过程中是否出现 "
+#~ "loss、grad_norm异常、训练各阶段时间超时等异常,并通过dashboard展示训练指标信息等。"
+#~ msgstr ""
diff --git a/doc/code-docs/source/checkpoint.rst b/doc/code-docs/source/checkpoint.rst
index ee4f037..cd9b755 100644
--- a/doc/code-docs/source/checkpoint.rst
+++ b/doc/code-docs/source/checkpoint.rst
@@ -1,12 +1,172 @@
-模型保存
+模型加载与保存
===================
InternLM 使用 ``internlm.utils.model_checkpoint.CheckpointManager`` 来管理模型保存。其中,可以使用 ``CheckpointManager.try_save_checkpoint(train_state)`` 来保存指定 step 的模型状态。
InternLM支持启动时自动加载最新的模型备份,并在接收信号退出训练时自动进行模型备份。
-Checkpointing
--------------
+CheckpointManager
+--------------------------
+
+``CheckpointManager`` 是InternLM负责进行模型加载和保存的工具类,其会使用config文件中的ckpt字段的初始化参数字典初始化自身的参数,目前相关的参数有:
+
+- ``enable_save_ckpt``: 是否开启检查点存储功能(不影响检查点加载)。参数类型 ``bool``,必选参数。
+
+- ``save_ckpt_folder``: 检查点存储路径,参数类型 ``str``,默认为: ``None``,在开启检查点存储功能时为必选参数。
+
+- ``checkpoint_every``: 检查点存储频率,参数类型 ``int``,默认为: ``50``。
+
+- ``load_ckpt_folder``: 初始化检查点/权重加载路径。参数类型 ``str``,默认为: ``None``,详见 :ref:`load-ckpt-folder`。
+
+- ``async_upload``: 是否开启异步上传,默认值为:``False``,详见 :ref:`asyncupload`。
+
+- ``async_upload_tmp_folder``: 异步上传临时存储路径。
+
+- ``oss_snapshot_freq``: 快照存储频率,默认值为:``checkpoint_every``的一半。详见 :ref:`snapshot`。
+
+- ``auto_resume``: 是否开启检查点自动恢复,默认值为:``True``,详见 :ref:`autoresume`。
+
+- ``stop_file_path`` : 检查点存储控制文件的路径,默认值为:``None``,详见 :ref:`stopfile`。
+
+
+下面给出config文件的参数设置例子:
+
+.. code-block:: python
+
+ ckpt = dict(
+ enable_save_ckpt=False, # enable ckpt save.
+ save_ckpt_folder=SAVE_CKPT_FOLDER, # Path to save training ckpt.
+ load_ckpt_folder=dict(path="local:/mnt/mfs/ckpt", content=["all",], ckpt_type="internlm"),
+ auto_resume=False, # disable auto-resume, internlm will load model checkpoint from the path of 'load_ckpt_folder'.
+ checkpoint_every=CHECKPOINT_EVERY,
+ async_upload=True, # async ckpt upload. (only work for boto3 ckpt)
+ async_upload_tmp_folder="/dev/shm/internlm_tmp_ckpt/", # path for temporarily files during asynchronous upload.
+ oss_snapshot_freq=int(CHECKPOINT_EVERY / 2), # snapshot ckpt save frequency.
+ )
+
.. autoclass:: internlm.utils.model_checkpoint.CheckpointManager
:members:
+
+
+加载与存储格式约定
+--------------------------
+
+.. _load-ckpt-folder:
+
+(1) 路径格式约定
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+InternLM对config中出现的所有存储路径都遵循以下的路径格式约定:
+
+.. figure:: ../../imgs/ckpt_path_format_CN.png
+ :scale: 30%
+ :class: with-border
+
+对于不同backend的路径,有以下的规则需要注意:
+
+1. 如果需要使用 boto3 的路径,需要在运行前提前设置(export)``S3_ACCESS_KEY_ID`` 和 ``S3_SECRET_ACCESS_KEY_ID`` 这两个环境变量。
+
+2. bucket 的 endpoint 一般分为 Inside IP 和 Outside IP,在条件允许的情况下尽量使用 Inside IP,可以获得更好的存储速度。
+
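+以下给出一个符合上述路径格式约定的示意性写法(其中的 bucket 名称与目录均为用于说明格式的假设值):
+
+.. code-block:: python
+
+    # 本地文件系统路径,格式为 "local:{path}"
+    SAVE_CKPT_FOLDER = "local:/mnt/mfs/ckpt"
+
+    # boto3 (OSS) 路径,格式形如 "boto3:s3://{bucket}/{path}"
+    # 使用 boto3 路径前需先设置 S3_ACCESS_KEY_ID 与 S3_SECRET_ACCESS_KEY_ID 环境变量
+    SAVE_CKPT_FOLDER = "boto3:s3://oss_bucket/internlm_ckpt"
+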
+
+
+(2) 模型加载(load_ckpt_folder)格式约定
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+load_ckpt_folder 由三个字段组成, ``path`` 、 ``content`` 和 ``ckpt_type`` 。
+
+- ``path``:给出了检查点/初始化模型权重的加载路径(path 的格式约定见上一小节)
+
+- ``content``: 表示需要加载的内容,目前支持的字段包括:
+
+ - ``model``:加载模型权重。
+ - ``sampler``:加载sampler状态。
+ - ``scheduler``:加载lr_scheduler状态。
+  - ``optimizer``:加载 optimizer 状态。
+ - ``all``:表示所有状态均加载,一般在resume训练使用。
+
+- ``ckpt_type``:表示加载的模型权重类型,目前支持的字段包括:
+
+ - ``internlm``:internlm约定的checkpoint存储格式。
+
+下面给出两个例子:
+
+.. code-block:: python
+
+ # 从文件存储相对路径 ckpt_model 中加载已有模型权重初始化模型,适合 sft 等训练初始化
+ load_ckpt_folder= dict(path="local:ckpt_model", content=["model",], ckpt_type="internlm")
+
+ # 从文件存储相对路径 ckpt_model 中加载所有的状态,适合断点续训的场景
+ load_ckpt_folder= dict(path="local:ckpt_model", content=["all",], ckpt_type="internlm")
+
+
+.. _asyncupload:
+
+异步上传
+--------------------------
+
+异步上传会先同步地将模型检查点存储到 ``async_upload_tmp_folder`` 中,再异步地写入远端存储(OSS/NFS),从而避免存储 ckpt 阻塞训练过长时间。
+
+config.ckpt 中相关的参数:
+
+- ``async_upload``: 是否开启异步上传。参数类型 ``bool/None``,默认为 ``False``。
+
+- ``async_upload_tmp_folder``: 异步上传临时存储路径。参数类型 ``str/None``, 默认值为 ``/dev/shm/{JOB_NAME}_tmp_ckpt/``。
+
+需要注意的是,异步上传功能仅在 backend 为 boto3 时才会生效,backend 为 local 时只支持同步存储。
+
+``async_upload_tmp_folder`` 的设置原则是尽量使用计算节点的本地目录,这样才能获得最佳的异步上传速度,一般建议使用 ``/dev/shm`` 或 ``/nvme`` 下的路径;如果使用同步上传,则该路径可以不设置。
+
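+下面给出一个开启异步上传的 ckpt 配置示意(其中路径为假设值;如前所述,异步上传仅在 boto3 后端下生效):
+
+.. code-block:: python
+
+    ckpt = dict(
+        enable_save_ckpt=True,
+        save_ckpt_folder="boto3:s3://oss_bucket/internlm_ckpt",  # 远端存储路径
+        checkpoint_every=50,
+        async_upload=True,  # 开启异步上传
+        async_upload_tmp_folder="/dev/shm/internlm_tmp_ckpt/",  # 计算节点上的本地临时目录
+    )
+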
+
+.. _snapshot:
+
+快照检查点
+--------------------------
+
+快照检查点是一种特殊的检查点,用于减少因训练任务崩溃(ECC error、NCCL error 等)而损失的训练进度。其采用交替覆盖写的策略,所占用的存储空间为两个 step 的检查点所需的大小。配合异步的检查点写入,可以在不影响训练速度和存储容量的前提下大幅提高检查点的存储频率。
+
+config.ckpt 中相关的参数:
+
+- ``oss_snapshot_freq``: 快照存储频率。参数类型 ``int/None``,默认为 ``50``。
+
+``oss_snapshot_freq`` 可以根据模型每个 step 的耗时酌情设置,一般快照频率以半小时到 1 小时一次为宜,也可以不设置(默认值是 ``checkpoint_every`` 的二分之一)。
+
+
+.. _autoresume:
+
+检查点自动恢复
+--------------------------
+
+检查点自动加载功能的目的是在resume训练时,自动加载 ``save_ckpt_folder`` 路径下最新的检查点(包括snapshot检查点)。配合上自动重启机制,可以实现无人干预的任务自动恢复。
+
+该功能默认开启,所以要注意如果需要加载 ``load_ckpt_folder`` 路径下的模型权重,要将 ``auto_resume`` 设置为 False,否则可能会产生预期外的行为。
+
+config.ckpt 中相关的参数:
+
+- ``auto_resume``: 是否开启检查点自动恢复。参数类型 ``bool``,默认为 ``True``。
+
+``auto_resume`` 如果为True,则尝试从 ``save_ckpt_folder`` 路径中自动加载最新的ckpt,如果找不到,则从step 0开始训练。如果为False,则尝试从 ``load_ckpt_folder`` 中加载模型参数。
+
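+下面给出两种常见场景下的示意性配置(其中路径均为举例用的假设值):
+
+.. code-block:: python
+
+    # 断点续训:自动加载 save_ckpt_folder 下最新的检查点(含快照检查点)
+    ckpt = dict(
+        enable_save_ckpt=True,
+        save_ckpt_folder="local:/mnt/mfs/ckpt",
+        checkpoint_every=50,
+        auto_resume=True,
+    )
+
+    # 从指定权重初始化(如 SFT):关闭 auto_resume,改为从 load_ckpt_folder 加载
+    ckpt = dict(
+        enable_save_ckpt=True,
+        save_ckpt_folder="local:/mnt/mfs/ckpt",
+        load_ckpt_folder=dict(path="local:ckpt_model", content=["model",], ckpt_type="internlm"),
+        checkpoint_every=50,
+        auto_resume=False,
+    )
+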
+
+.. _stopfile:
+
+手动控制检查点存储
+--------------------------
+
+在距离下一次检查点存储还有很长时间、又希望立刻停止任务且不丢失当前训练进度时,可以使用手动控制检查点存储功能。用户向位于 NFS 上的 ``stop_file_path`` 文件中写入希望任务停止的 step 步数,Global Rank 0 的进程会在每个 step 轮询该文件的值;如果发现文件中给出了停止 step,则会进行一次广播通知所有训练进程,约定各进程在训练到该 step 时存储一个检查点,并选择是否退出。
+
+
+config.ckpt 中相关的参数:
+
+- ``stop_file_path``:检查点存储控制文件的路径,参数类型 ``str/None``,默认为 ``None``,表示关闭该功能。
+
+下面给出一个写入 ``stop_file_path`` 的例子:
+
+.. code-block:: bash
+
+ # 我们希望停止的step步数
+ # 如果存入的step>0,则任务会在存储ckpt后自动退出
+    # 如果存入的step<0,则任务会在存储ckpt后继续训练
+ echo "999" > ./llm_alter/1006_pr.log
+
diff --git a/doc/code-docs/source/index.rst b/doc/code-docs/source/index.rst
index c01ac54..8811af2 100644
--- a/doc/code-docs/source/index.rst
+++ b/doc/code-docs/source/index.rst
@@ -47,6 +47,14 @@ InternLM
parallel
+混合精度
+-------------------
+
+.. toctree::
+ :maxdepth: 2
+
+ mixed_precision
+
模型备份
--------------------
diff --git a/doc/code-docs/source/mixed_precision.rst b/doc/code-docs/source/mixed_precision.rst
new file mode 100644
index 0000000..59955e0
--- /dev/null
+++ b/doc/code-docs/source/mixed_precision.rst
@@ -0,0 +1,36 @@
+混合精度
+-----------------
+混合精度是指在模型训练的过程中同时使用16位和32位浮点数类型,是一种在最小化精度损失的前提下加速模型训练的方法。
+混合精度通过让模型的某些部分使用32位浮点数以保持数值稳定性,并在其余部分利用半精度浮点数加速训练并可以减少内存使用,在评估指标(如准确率)方面仍可以获得同等的训练效果。
+
+.. autoclass:: internlm.core.naive_amp.NaiveAMPModel
+
+InternLM默认将模型转换为16位浮点数类型进行训练(在配置文件中可以设置默认类型为其他数据类型)。在使用混合精度时,需要在构建模型时使用
+
+.. code-block:: python
+
+    set_fp32_attr_to_module(module)  # module 为需要以 fp32 精度训练的子模块
+
+将模型的某个子模块设置为32位浮点数类型进行训练,InternLM会在模型训练时自动将数据类型转换成需要的精度。
+
+例如:
+
+.. code-block:: python
+
+    import torch
+    from torch import nn
+
+    from internlm.core.naive_amp import NaiveAMPModel, set_fp32_attr_to_module
+
+    class MlpModel(nn.Module):
+ def __init__(self):
+ super().__init__()
+ self.linear1 = nn.Linear(4, 1, bias=False)
+ self.linear2 = nn.Linear(1, 4, bias=False)
+
+ model = MlpModel()
+ # set model.linear2 as fp32 module
+ set_fp32_attr_to_module(model.linear2)
+
+ # apply mixed precision
+ model = NaiveAMPModel(
+ model=model,
+ output_to_fp32=True,
+        dtype=torch.bfloat16,
+ sync_buffer=False,
+ )
diff --git a/doc/code-docs/source/monitor.rst b/doc/code-docs/source/monitor.rst
index de150fd..b3c684c 100644
--- a/doc/code-docs/source/monitor.rst
+++ b/doc/code-docs/source/monitor.rst
@@ -20,3 +20,30 @@ InternLM 使用 ``internlm.monitor.monitor.initialize_monitor_manager()`` 来初
InternLM 监控线程会周期性地检查模型训练过程中是否出现 loss spike、潜在的 training stuck、运行时异常等,并捕获 SIGTERM 异常信号。当出现上述情况时,将触发警报,并通过调用 ``internlm.monitor.alert.send_feishu_msg_with_webhook()`` 向飞书的 Webhook 地址发送报警消息。
.. autofunction:: internlm.monitor.alert.send_feishu_msg_with_webhook
+
+轻量监控
+-----------------
+
+InternLM轻量级监控工具采用心跳机制实时监测训练过程中的各项指标,如loss、grad_norm、训练阶段的耗时等。同时,InternLM还可以通过 `grafana dashboard `_ 直观地呈现这些指标信息,以便用户进行更加全面和深入的训练分析。
+
+轻量监控的配置由配置文件中的 ``monitor`` 字段指定, 用户可以通过修改配置文件 `config file `_ 来更改监控配置。以下是一个监控配置的示例:
+
+.. code-block:: python
+
+ monitor = dict(
+ alert=dict(
+ enable_feishu_alert=False,
+ feishu_alert_address=None,
+ light_monitor_address=None,
+ ),
+ )
+
+- enable_feishu_alert (bool):是否启用飞书告警。默认值:False。
+- feishu_alert_address (str):飞书告警的 Webhook 地址。默认值:None。
+- light_monitor_address (str):轻量监控的地址。默认值:None。
+
+InternLM 使用 ``internlm.monitor.alert.initialize_light_monitor`` 来初始化轻量监控客户端。一旦初始化完成,它会建立与监控服务器的连接。在训练过程中,使用 ``internlm.monitor.alert.send_heartbeat`` 来发送不同类型的心跳信息至监控服务器。监控服务器会根据这些心跳信息来检测训练是否出现异常,并在需要时发送警报消息。
+
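+下面给出一个在训练代码中使用轻量监控接口的简单示意(其中的指标数值仅为示例):
+
+.. code-block:: python
+
+    from internlm.monitor.alert import initialize_light_monitor, send_heartbeat
+
+    # 使用默认地址(即 MONITOR_SERVER 环境变量)初始化轻量监控客户端
+    initialize_light_monitor()
+
+    # 训练过程中发送不同类型的心跳信息
+    send_heartbeat("train_metrics", {"loss": 0.1, "accuracy": 0.95})
+    send_heartbeat("init_time", {"import_time": 0.25})
+    send_heartbeat("stage_time", {"fwd_time": 2.3, "bwd_time": 6.2})
+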
+.. autofunction:: internlm.monitor.alert.initialize_light_monitor
+
+.. autofunction:: internlm.monitor.alert.send_heartbeat
diff --git a/doc/code-docs/source/parallel.rst b/doc/code-docs/source/parallel.rst
index 5f593c0..6de9545 100644
--- a/doc/code-docs/source/parallel.rst
+++ b/doc/code-docs/source/parallel.rst
@@ -133,7 +133,7 @@ ZeRO1.5 的实现使用了分层分片的概念,通过配置值 ``parallel.zer
hybrid_zero_optimizer = dict(
# Enable low_level_optimzer overlap_communication
- overlap_sync_grad=True,
+ overlap_sync_grad=True,
overlap_sync_param=True,
# bucket size for nccl communication params
reduce_bucket_size=512 * 1024 * 1024,
diff --git a/doc/en/usage.md b/doc/en/usage.md
index 864ead6..cab08ca 100644
--- a/doc/en/usage.md
+++ b/doc/en/usage.md
@@ -385,3 +385,36 @@ Taking the configuration of the demo training on a single machine with 8 GPUs on
2023-07-07 12:29:13,147 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.65918563194305,step=4,loss=10.149517059326172,tgs (tokens/gpu/second)=4270.52,lr=1.2000000000000002e-06,loss_scale=65536.0,grad_norm=51.582841631508145,micro_num=4,num_consumed_tokens=655360,inf_nan_skip_batches=0,num_samples_in_batch=19,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.68
2023-07-07 12:29:16,994 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.3109313713174,step=5,loss=9.822169303894043,tgs (tokens/gpu/second)=4262.67,lr=1.4000000000000001e-06,loss_scale=65536.0,grad_norm=47.10386835560855,micro_num=4,num_consumed_tokens=786432,inf_nan_skip_batches=0,num_samples_in_batch=17,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.69
```
+
+### Long Text Generation
+
+During the inference phase, you can turn on the Dynamic NTK option of RoPE by setting `use_dynamic_ntk_rope=True` in the model configuration, so that the model can adapt to long text input and output and achieve an extrapolation effect of 16K:
+```python #21
+model_type = "INTERNLM" # 模型类型,默认值为 "INTERNLM",对应模型结构初始化接口函数
+NUM_ATTENTION_HEAD = 32
+VOCAB_SIZE = 103168
+HIDDEN_SIZE = 4096
+NUM_LAYER = 32
+MLP_RATIO = 8 / 3
+model = dict(
+    checkpoint=False, # proportion of layers using activation recomputation, options: True/False/[0-1]
+ num_attention_heads=NUM_ATTENTION_HEAD,
+ embed_split_hidden=True,
+ vocab_size=VOCAB_SIZE,
+ embed_grad_scale=1,
+ parallel_output=True,
+ hidden_size=HIDDEN_SIZE,
+ num_layers=NUM_LAYER,
+ mlp_ratio=MLP_RATIO,
+ apply_post_layer_norm=False,
+ dtype="torch.bfloat16",
+ norm_type="rmsnorm",
+ layer_norm_epsilon=1e-5,
+ use_dynamic_ntk_rope=True
+)
+```
+
+Regarding the principle of Dynamic NTK, please refer to
+
+1. https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases
+2. https://kexue.fm/archives/9675
diff --git a/doc/imgs/ckpt_path_format_CN.png b/doc/imgs/ckpt_path_format_CN.png
new file mode 100644
index 0000000..0307d22
Binary files /dev/null and b/doc/imgs/ckpt_path_format_CN.png differ
diff --git a/doc/imgs/modelscope_logo.png b/doc/imgs/modelscope_logo.png
new file mode 100644
index 0000000..0286d28
Binary files /dev/null and b/doc/imgs/modelscope_logo.png differ
diff --git a/doc/usage.md b/doc/usage.md
index 82c20e0..347ca35 100644
--- a/doc/usage.md
+++ b/doc/usage.md
@@ -368,3 +368,36 @@ $ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py -
2023-07-07 12:29:13,147 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.65918563194305,step=4,loss=10.149517059326172,tgs (tokens/gpu/second)=4270.52,lr=1.2000000000000002e-06,loss_scale=65536.0,grad_norm=51.582841631508145,micro_num=4,num_consumed_tokens=655360,inf_nan_skip_batches=0,num_samples_in_batch=19,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.68
2023-07-07 12:29:16,994 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.3109313713174,step=5,loss=9.822169303894043,tgs (tokens/gpu/second)=4262.67,lr=1.4000000000000001e-06,loss_scale=65536.0,grad_norm=47.10386835560855,micro_num=4,num_consumed_tokens=786432,inf_nan_skip_batches=0,num_samples_in_batch=17,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.69
```
+
+### 长文本生成
+
+在推理阶段,您可以在模型配置中通过设置 `use_dynamic_ntk_rope=True` 开启 RoPE 的 Dynamic NTK 选项,从而使得模型适应长文本输入输出,达到 16K 的外推效果:
+```python #21
+model_type = "INTERNLM" # 模型类型,默认值为 "INTERNLM",对应模型结构初始化接口函数
+NUM_ATTENTION_HEAD = 32
+VOCAB_SIZE = 103168
+HIDDEN_SIZE = 4096
+NUM_LAYER = 32
+MLP_RATIO = 8 / 3
+model = dict(
+ checkpoint=False, # 进行重计算的模型层数比例,可选值为 True/False/[0-1]
+ num_attention_heads=NUM_ATTENTION_HEAD,
+ embed_split_hidden=True,
+ vocab_size=VOCAB_SIZE,
+ embed_grad_scale=1,
+ parallel_output=True,
+ hidden_size=HIDDEN_SIZE,
+ num_layers=NUM_LAYER,
+ mlp_ratio=MLP_RATIO,
+ apply_post_layer_norm=False,
+ dtype="torch.bfloat16",
+ norm_type="rmsnorm",
+ layer_norm_epsilon=1e-5,
+ use_dynamic_ntk_rope=True
+)
+```
+
+关于 Dynamic NTK 的原理,详细请参考
+
+1. https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases
+2. https://kexue.fm/archives/9675
diff --git a/experiment/Dockerfile-centos b/experiment/Dockerfile-centos
index 31ffc19..71c63e4 100644
--- a/experiment/Dockerfile-centos
+++ b/experiment/Dockerfile-centos
@@ -133,6 +133,7 @@ RUN /opt/conda/bin/pip --no-cache-dir install \
botocore \
torch-scatter \
pyecharts \
+ py-libnuma \
-f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}+cu117.html \
&& /opt/conda/bin/pip --no-cache-dir install \
--extra-index-url https://download.pytorch.org/whl/cu117 \
diff --git a/experiment/Dockerfile-ubuntu b/experiment/Dockerfile-ubuntu
index 230a3b5..0675c2c 100644
--- a/experiment/Dockerfile-ubuntu
+++ b/experiment/Dockerfile-ubuntu
@@ -114,6 +114,7 @@ RUN /opt/conda/bin/pip --no-cache-dir install \
botocore \
torch-scatter \
pyecharts \
+ py-libnuma \
-f https://data.pyg.org/whl/torch-${PYTORCH_VERSION}+cu117.html \
&& /opt/conda/bin/pip --no-cache-dir install \
--extra-index-url https://download.pytorch.org/whl/cu117 \
diff --git a/internlm/core/naive_amp.py b/internlm/core/naive_amp.py
index 7470659..b0741e4 100644
--- a/internlm/core/naive_amp.py
+++ b/internlm/core/naive_amp.py
@@ -3,7 +3,8 @@
# adopted from https://github.com/hpcaitech/ColossalAI/tree/main/colossalai/amp
-from typing import Any
+from functools import partial
+from typing import Any, Union
import torch
import torch.distributed as dist
@@ -15,6 +16,14 @@ from internlm.core.context import ParallelMode
from internlm.core.context.parallel_context import global_context as gpc
+def set_fp32_attr_to_module(module: nn.Module):
+ setattr(module, "is_fp32_module", True)
+
+
+def module_has_fp32_attr(module: nn.Module):
+ return hasattr(module, "is_fp32_module") and getattr(module, "is_fp32_module")
+
+
class NaiveAMPModel(nn.Module):
"""
This is a wrapper class for a model that automatically casts the model, its inputs, and outputs into fp16.
@@ -51,6 +60,9 @@ class NaiveAMPModel(nn.Module):
self._sync_buf = False
self._first_eval_run = False
+ # register hook for fp32 module
+ self._register_fp32_parameters_hook()
+
@property
def sync_buffer(self):
"""Returns the current state of the buffer synchronization."""
@@ -134,3 +146,46 @@ class NaiveAMPModel(nn.Module):
if self._output_to_fp32:
out = self.convert_to_fp32(out)
return out
+
+ def _register_fp32_parameters_hook(self) -> None:
+ """
+ Set module to fp32 and register automatic conversion hook in the forward pass.
+ The fp32 modules are marked by set_fp32_attr_to_module(.)
+ """
+ dtype = torch.float32
+
+ def to_dtype(x, dtype=dtype):
+ if isinstance(x, Tensor) and x.dtype != dtype:
+ return x.to(dtype)
+ return x
+
+ def _pre_forward_hook_for_fp32(model: nn.Module, inputs: tuple): # pylint: disable=W0613
+ assert isinstance(inputs, tuple)
+ return tuple(map(to_dtype, inputs))
+
+ def _post_forward_hook_for_fp32(
+ model: nn.Module, inputs: tuple, outputs: Union[tuple, Tensor]
+ ): # pylint: disable=W0613
+            assert isinstance(inputs, (tuple, Tensor))
+ if isinstance(outputs, tuple):
+ return tuple(map(to_dtype, outputs, [self.dtype] * len(outputs)))
+ else:
+ return to_dtype(outputs, self.dtype)
+
+ # just want to share same for loop for ModuleList and Module
+ if isinstance(self.model, nn.ModuleList):
+ model = self.model
+ else:
+ model = [self.model]
+
+ modules = []
+        # record the modules of the transformer/embedding/head/norm blocks
+ for _chunk in model:
+ modules.extend([sub_module for _, sub_module in _chunk.named_modules()])
+
+        # register_forward_pre_hook for transformer/embedding/norm/xxx block
+ for sub_module in modules:
+ if module_has_fp32_attr(sub_module):
+ sub_module.to(dtype)
+ sub_module.register_forward_pre_hook(partial(_pre_forward_hook_for_fp32))
+ sub_module.register_forward_hook(partial(_post_forward_hook_for_fp32))
diff --git a/internlm/initialize/__init__.py b/internlm/initialize/__init__.py
index ae94e0a..14fe06b 100644
--- a/internlm/initialize/__init__.py
+++ b/internlm/initialize/__init__.py
@@ -4,6 +4,7 @@ from .launch import (
initialize_distributed_env,
launch_from_slurm,
launch_from_torch,
+ try_bind_numa,
)
__all__ = [
@@ -12,4 +13,5 @@ __all__ = [
"launch_from_slurm",
"launch_from_torch",
"initialize_distributed_env",
+ "try_bind_numa",
]
diff --git a/internlm/initialize/launch.py b/internlm/initialize/launch.py
index 3896ede..2da6afd 100644
--- a/internlm/initialize/launch.py
+++ b/internlm/initialize/launch.py
@@ -16,6 +16,16 @@ from internlm.utils.common import get_master_node
from internlm.utils.logger import get_logger
from internlm.utils.timeout import llm_timeout
+# check whether the numa package is available
+try:
+ import numa
+ from numa import memory, schedule
+ from pynvml.smi import nvidia_smi
+except (AttributeError, ImportError):
+ get_numa = False
+else:
+ get_numa = True
+
logger = get_logger(__file__)
@@ -402,6 +412,8 @@ def launch_from_slurm(
except KeyError as e:
raise RuntimeError(f"Could not find {e} in the SLURM environment")
+ try_bind_numa(global_rank=rank, world_size=world_size)
+
launch(
config=config,
rank=rank,
@@ -435,6 +447,8 @@ def launch_from_torch(
except KeyError as e:
raise RuntimeError(f"Could not find {e} in the torch environment")
+ try_bind_numa(global_rank=rank, world_size=world_size, local_rank=local_rank)
+
launch(
config=config,
local_rank=local_rank,
@@ -464,6 +478,7 @@ def initialize_distributed_env(
master_port (str): The master port for distributed training. 8888 by default.
seed (int, optional): Specified random seed for every process. 1024 by default.
"""
+
# close automatic garbage collection
gc.disable()
@@ -485,13 +500,14 @@ def initialize_distributed_env(
args_sanity_check()
# init light monitor client
- alert_config = gpc.config.monitor.alert
- if alert_config.enable_feishu_alert and gpc.is_rank_for_log():
- light_monitor_address = alert_config.light_monitor_address
- if light_monitor_address:
- initialize_light_monitor(light_monitor_address)
- else:
- logger.warning("monitor address is none, monitor could not be used!")
+ if gpc.config.get("monitor") and gpc.config.monitor.get("alert"):
+ alert_config = gpc.config.monitor.alert
+ if alert_config.enable_feishu_alert and gpc.is_rank_for_log():
+ light_monitor_address = alert_config.light_monitor_address
+ if light_monitor_address:
+ initialize_light_monitor(light_monitor_address)
+ else:
+ logger.warning("monitor address is none, monitor could not be used!")
def get_config_value(config, key, defalut):
@@ -500,3 +516,45 @@ def get_config_value(config, key, defalut):
except KeyError:
value = defalut
return value
+
+
+def try_bind_numa(global_rank, world_size, local_rank=None):
+ # Early return if numa module not available
+ if not get_numa:
+ if global_rank == 0:
+ logger.info(
+ "Try bind numa failed! Package import error, if numa is not installed, "
+ "please implement: pip install --upgrade py-libnuma, Ref: https://pypi.org/project/py-libnuma/"
+ )
+
+ # get numa node number
+ try:
+ numa_node_num = numa.info.get_max_node() + 1
+ # get total gpu number of current node
+ nvsmi = nvidia_smi.getInstance()
+ total_GPU_per_node = len(nvsmi.DeviceQuery("memory.total")["gpu"])
+
+        # return if total_GPU_per_node is not larger than numa_node_num, or is not divisible by it
+ if total_GPU_per_node <= numa_node_num:
+ return
+ if total_GPU_per_node % numa_node_num != 0:
+ return
+        # return if the number of processes is smaller than the number of GPUs on one node
+ if world_size < total_GPU_per_node:
+ return
+
+ if local_rank is None:
+ devices_per_node = torch.cuda.device_count()
+ local_rank = global_rank % devices_per_node
+
+        # compute the numa id for each local rank
+ per_numa = total_GPU_per_node // numa_node_num
+ numa_id = local_rank // per_numa
+
+ # bind numa node
+ schedule.run_on_nodes(numa_id)
+ memory.set_membind_nodes(numa_id)
+ except Exception:
+ return # try_bind_numa should not raise exception
+ else:
+ logger.info(f"Rank: {global_rank} success bind process to numa node: {numa_id}")
diff --git a/internlm/model/modeling_moe.py b/internlm/model/modeling_moe.py
index d4539a0..43489bc 100644
--- a/internlm/model/modeling_moe.py
+++ b/internlm/model/modeling_moe.py
@@ -11,6 +11,7 @@ from torch import nn
from internlm.core.context import IS_TENSOR_PARALLEL, ParallelMode
from internlm.core.context.parallel_context import global_context as gpc
+from internlm.core.naive_amp import set_fp32_attr_to_module
from internlm.initialize.initialize_tensor import normal_, scaled_init_method_normal
from internlm.model.embedding import Embedding1D
from internlm.model.linear import (
@@ -131,6 +132,8 @@ class PackedFlashBaseLayer1D(nn.Module):
param.is_norm = True
for param in self.norm2.parameters():
param.is_norm = True
+ set_fp32_attr_to_module(self.norm1)
+ set_fp32_attr_to_module(self.norm2)
self.num_experts = num_experts
self.moe_gate_k = moe_gate_k
@@ -191,6 +194,7 @@ class PackedFlashBaseLayer1D(nn.Module):
for _, param in self.mlp.moe_layer.experts.named_parameters():
if gpc.get_world_size(ParallelMode.TENSOR) > 1:
setattr(param, IS_TENSOR_PARALLEL, True)
+ set_fp32_attr_to_module(self.mlp.moe_layer.gate)
self.dropout2 = nn.Dropout(drop_rate)
self.use_swiglu = use_swiglu
@@ -433,6 +437,7 @@ class PackedFlashInternLm1D(nn.Module):
self.norm = RMSNorm(hidden_size, eps=layer_norm_epsilon)
else:
self.norm = nn.LayerNorm(hidden_size, eps=layer_norm_epsilon)
+ set_fp32_attr_to_module(self.norm)
self.head = head_cls(
in_features=hidden_size,
out_features=gpc.get_world_size(ParallelMode.TENSOR) if is_reward else vocab_size,
diff --git a/internlm/monitor/alert.py b/internlm/monitor/alert.py
index 1772e7f..e04aa0c 100644
--- a/internlm/monitor/alert.py
+++ b/internlm/monitor/alert.py
@@ -13,6 +13,19 @@ logger = get_logger(__file__)
def initialize_light_monitor(monitor_address: str = None):
+ """
+ Initialize the lightweight monitoring module.
+
+ Args:
+ monitor_address (str, optional): The address of the monitor. Defaults to 'MONITOR_SERVER' environment.
+
+ Raises:
+ Exception: If any exceptions occur during initialization, they will be caught and logged as warnings.
+
+ Example:
+ Initialize the monitoring module with the default address
+ ``initialize_light_monitor()``
+ """
try:
from uniscale_monitoring import init_monitor
@@ -22,6 +35,24 @@ def initialize_light_monitor(monitor_address: str = None):
def send_heartbeat(msg_type: str, msg: Dict):
+ """
+ Send a heartbeat message to a monitoring server.
+
+ Args:
+ msg_type (str): The type of heartbeat message, e.g., "train_metrics", "init_time", "stage_time".
+ msg (Dict): A dictionary containing message data to be included in the heartbeat.
+
+ Example:
+ Sending a heartbeat message for training metrics
+ ``send_heartbeat("train_metrics", {"loss": 0.1, "accuracy": 0.95})``
+
+ Sending a heartbeat message for initialization time
+ ``send_heartbeat("init_time", {"import_time": 0.25})``
+
+ Sending a heartbeat message for stage time
+ ``send_heartbeat("stage_time", {"fwd_time": 2.3, "bwd_time": 6.2})``
+ """
+
def nan2none(v):
if isinstance(v, float) and math.isnan(v):
return None
diff --git a/internlm/monitor/utils.py b/internlm/monitor/utils.py
index f64c7dc..34360b5 100644
--- a/internlm/monitor/utils.py
+++ b/internlm/monitor/utils.py
@@ -14,8 +14,10 @@ def get_job_id():
job_id = "none"
if os.getenv("SLURM_JOB_ID") is not None:
job_id = os.getenv("SLURM_JOB_ID")
- elif os.getenv("K8S_WORKSPACE_ID") is not None:
- job_id = os.getenv("K8S_WORKSPACE_ID")
+ elif os.getenv("KUBERNETES_POD_NAME") is not None:
+ job_id = os.getenv("KUBERNETES_POD_NAME").split("-")[0]
+ elif os.getenv("MLP_TASK_INSTANCE_ID") is not None:
+ job_id = os.getenv("MLP_TASK_ID")
return job_id
diff --git a/internlm/train/training_internlm.py b/internlm/train/training_internlm.py
index 1f77ef2..46121e4 100644
--- a/internlm/train/training_internlm.py
+++ b/internlm/train/training_internlm.py
@@ -110,11 +110,7 @@ def initialize_optimizer(model: Union[nn.Module, nn.ModuleList]):
param_bcast_sync_handler = None
adam_cfg = gpc.config.adam
- # split the moe parameters into different groups
- if hasattr(gpc.config.model, "num_experts") and gpc.config.model.num_experts > 1:
- params = create_param_groups(model, adam_cfg.weight_decay)
- else:
- params = [{"params": model.parameters(), "weight_decay": adam_cfg.weight_decay}]
+ params = create_param_groups(model, adam_cfg.weight_decay)
naive_optimizer = torch.optim.AdamW(
params=params,
lr=adam_cfg.lr,
diff --git a/internlm/train/utils.py b/internlm/train/utils.py
index be69880..0e19398 100644
--- a/internlm/train/utils.py
+++ b/internlm/train/utils.py
@@ -7,14 +7,13 @@ from internlm.model.utils import is_gate_param, is_moe_param, is_norm_param
def split_params_into_different_groups_for_optimizer(param_groups: Tuple[Dict]) -> Tuple[Dict]:
- """Split parameters into different MoE groups for optimizer
+ """Split parameters into different groups for optimizer
Args:
param_groups (Tuple[Dict]): The list of parameter groups to split
Input Example:
>>> (
>>> {'name': 'default', 'params': [tensor], 'weight_decay' :xxx},
- >>> ...,
>>> )
Returns:
@@ -22,10 +21,10 @@ def split_params_into_different_groups_for_optimizer(param_groups: Tuple[Dict])
Output Example:
>>> (
>>> {'name': 'default','params': [tensor],'weight_decay' :xxx},
+ >>> {'name': 'fp32', 'params': [tensor],'weight_decay' :xxx},
>>> {'name': 'norm', 'norm': True, 'params': [tensor],'weight_decay' :xxx},
>>> {'name': 'gate', 'gate': True, 'params': [tensor],'weight_decay' :xxx},
>>> {'name': 'moe_ep_size_4', 'moe': True, 'params': [tensor],'weight_decay' :xxx},
- >>> ...,
>>> )
"""
@@ -39,31 +38,38 @@ def split_params_into_different_groups_for_optimizer(param_groups: Tuple[Dict])
# create new groups for fp32, norm, moe gate and moe expert
new_groups = {}
new_groups["fp32"] = {"name": "fp32", "params": []}
- for key in ["gate", "norm"]:
- new_groups[key] = {"name": key, key: True, "params": []}
- for key in gpc.expert_parallel_group_names:
- new_groups[key] = {"name": key, "moe": True, "params": []}
+ if gpc.config.model.num_experts > 1:
+        # norm and gate are special groups used to force sync (when MoE is enabled).
+ for key in ["gate", "norm"]:
+ new_groups[key] = {"name": key, key: True, "params": []}
+ for key in gpc.expert_parallel_group_names:
+ new_groups[key] = {"name": key, "moe": True, "params": []}
for pgroup in param_groups:
- # copy attribute from origin group
+        # copy attribute from origin group; we assume the input param_groups only
+        # has one group, so the attribute will not be copied multiple times.
for ori_key in pgroup.keys():
if ori_key not in ("name", "params"):
for _, group in new_groups.items():
group[ori_key] = pgroup[ori_key]
# assign param
origin_params = []
- # first split the norm and gate groups, then the fp32 group, finally moe group
+        # first split out the norm and gate groups, which are special cases used to
+        # force sync (when MoE is enabled), then the fp32 group, and finally the moe groups.
for param in pgroup["params"]:
- if is_norm_param(param):
+ if gpc.config.model.num_experts > 1 and is_norm_param(param):
new_groups["norm"]["params"].append(param)
+ # gate param means MoE is enabled
elif is_gate_param(param):
new_groups["gate"]["params"].append(param)
elif param.dtype == torch.float32:
new_groups["fp32"]["params"].append(param)
+ # moe param means MoE is enabled
elif is_moe_param(param):
new_groups[param.group_name]["params"].append(param)
else:
origin_params.append(param)
+
# bf16 param group, which is the first group in the param groups
pgroup["params"] = origin_params
diff --git a/internlm/utils/model_checkpoint.py b/internlm/utils/model_checkpoint.py
index 0377d58..3dd57eb 100644
--- a/internlm/utils/model_checkpoint.py
+++ b/internlm/utils/model_checkpoint.py
@@ -81,8 +81,8 @@ class CheckpointLoadMethod:
@staticmethod
def register_ckpt_load_type(load_type: Union[str, CheckpointLoadType], load_func: Callable):
- if load_type in CheckpointLoadMethod.LOAD_TYPE_FUNC:
- logger.warning(f"{load_type} has aleady been registed!")
+ if load_type in CheckpointLoadMethod.LOAD_TYPE_FUNC and gpc.is_rank_for_log():
+ logger.warning(f"{load_type} has already been registered!")
return
CheckpointLoadMethod.LOAD_TYPE_FUNC.update({load_type: load_func})
@@ -90,9 +90,10 @@ class CheckpointLoadMethod:
if load_type == CheckpointLoadType.INTERNLM:
CheckpointLoadMethod.LOAD_FUNC_SIG = inspect.signature(load_func)
else:
- if inspect.signature(load_func) != CheckpointLoadMethod.LOAD_FUNC_SIG:
+ if inspect.signature(load_func) != CheckpointLoadMethod.LOAD_FUNC_SIG and gpc.is_rank_for_log():
logger.warning(
- f"registe load model ckpt signature is not same with: {CheckpointLoadMethod.LOAD_FUNC_SIG}"
+                    f"The registered signature {inspect.signature(load_func)} of the loaded model is not the same as: "
+ f"{CheckpointLoadMethod.LOAD_FUNC_SIG}"
)
@staticmethod
@@ -466,10 +467,11 @@ def load_optimizer_checkpoint(folder, optim):
zero_devide_optim_plan = llm_load(fp_meta)
states.update({"zero_devide_optim_plan": zero_devide_optim_plan})
except Exception as e:
- logger.warning(
- f"Read zero optimzer split file '{fp_meta}', for '{e}'"
- f"Please check whether loading ckpts are saved with the HybridZeroOptimizer."
- )
+ if gpc.is_rank_for_log():
+ logger.warning(
+                    f"Failed to read zero optimizer split file '{fp_meta}' due to '{e}'. "
+                    f"Please check whether the checkpoints being loaded were saved with the HybridZeroOptimizer."
+ )
optim.load_state_dict(states)
del states
@@ -481,8 +483,8 @@ def load_sampler(ckpt_path: str, sampler):
sampler.load_state_dict(sampler_states)
if gpc.is_rank_for_log():
pstate = copy.deepcopy(sampler_states)
- pstate.pop("indices")
- pstate.pop("rng_state")
+ pstate.pop("indices", None)
+ pstate.pop("rng_state", None)
logger.info(f"reload sampler_states:{pstate}")
torch.cuda.empty_cache()
@@ -731,9 +733,12 @@ now step_count is {train_state.step_count}",
# Here we only try to find the ckpt folder named after step, ignoring snapshot and other folders.
ckpt_list = [int(fn.strip("/")) for fn in ckpt_list if fn.strip("/").isdigit()]
if len(ckpt_list) == 0:
- logger.warning("Not found avaliable normal checkpoint!")
+ if gpc.is_rank_for_log():
+ logger.warning("No available normal checkpoint found. Check your checkpoint path.")
else:
- logger.info(f"Found avaliable normal checkpoint: {ckpt_list}!")
+ if gpc.is_rank_for_log():
+ logger.info(f"Found available normal checkpoint: {ckpt_list}")
+
ckpt_list.sort(reverse=True)
for ckpt in ckpt_list:
fns_list = self.storage_manager.get_fns(os.path.join(self.save_ckpt_folder, str(ckpt)))
diff --git a/internlm/utils/storage_manager.py b/internlm/utils/storage_manager.py
index 36bd105..a3f9122 100644
--- a/internlm/utils/storage_manager.py
+++ b/internlm/utils/storage_manager.py
@@ -166,19 +166,18 @@ def compute_file_md5_by_chunk(file_name: str):
def try_get_storage_backend(path: str):
- sre = path.split(":", maxsplit=1)
- if len(sre) == 1:
- if path.startswith("s3:"):
- backend = "boto3"
- if gpc.is_rank_for_log():
- logger.warning(f"path: '{path}' not start with backend prefix, guess it is the backend of boto3.")
- else:
- backend = "local"
+ if path.startswith("s3:"):
+ if gpc.is_rank_for_log():
+ logger.warning(f"path: '{path}' not start with backend prefix, guess it is the backend of boto3.")
+ return "boto3", path
+ else:
+ sre = path.split(":", maxsplit=1)
+ if len(sre) == 1:
if gpc.is_rank_for_log():
logger.warning(f"path: '{path}' not start with backend prefix, guess it is the backend of local.")
- return backend, sre
- else:
- return sre[0], sre[1] # (backend_prefix, splited_path)
+ return "local", sre[0]
+ else:
+ return sre[0], sre[1] # (backend_prefix, splited_path)
class Boto3Client(StorageClient):
@@ -502,7 +501,7 @@ class StorageManager(metaclass=SingletonMeta):
or "HTTP_PROXY" in os.environ
or "HTTPS_PROXY" in os.environ
):
- if not self.has_warning:
+ if not self.has_warning and gpc.is_rank_for_log():
logger.warning(
"HTTP/HTTPS proxy is detected when using boto3, incorrectly setting \
the proxy may make boto3 unavailable or affect performance."
diff --git a/requirements/runtime.txt b/requirements/runtime.txt
index f46d7ad..2fbef4a 100644
--- a/requirements/runtime.txt
+++ b/requirements/runtime.txt
@@ -13,4 +13,5 @@ boto3
botocore
torch-scatter
pyecharts
+py-libnuma
-f https://data.pyg.org/whl/torch-1.13.1+cu117.html
\ No newline at end of file
diff --git a/tests/test_model/test_fused_precision/test_fused_precision.py b/tests/test_model/test_fused_precision/test_fused_precision.py
new file mode 100644
index 0000000..e368813
--- /dev/null
+++ b/tests/test_model/test_fused_precision/test_fused_precision.py
@@ -0,0 +1,128 @@
+import multiprocessing as mp
+from functools import partial
+
+import pytest
+import torch
+from torch import nn
+
+from internlm.core.naive_amp import NaiveAMPModel, set_fp32_attr_to_module
+from internlm.model.modeling_internlm import PackedFlashBaseLayer1D
+from internlm.train.utils import create_param_groups
+from tests.test_model.test_model_internlm import build_environment, seed_all
+
+
+def _pre_forward_hook_for_check(model, inputs): # pylint: disable=W0613
+ assert all(_.dtype == torch.float32 for _ in inputs)
+
+
+def _post_forward_hook_for_check(model, inputs, outputs): # pylint: disable=W0613
+ if isinstance(outputs, tuple):
+ assert all(_.dtype == torch.half for _ in outputs)
+ else:
+ assert outputs.dtype == torch.half
+
+
+def check_fused_precision(args):
+ # init
+ rank, world_size = args
+ device = torch.device("cuda")
+ build_environment(rank, world_size)
+
+ # fix seed
+ seed_all(1024)
+ # define model
+ model = PackedFlashBaseLayer1D(
+ hidden_size=16, # 768
+ num_attention_heads=2, # 12
+ mlp_ratio=2,
+ attn_drop_rate=0.0,
+ drop_rate=0.0,
+ dtype=torch.bfloat16,
+ layer_norm_epsilon=1e-5,
+ checkpoint=False,
+ layer_idx=0,
+ residual_in_fp32=False,
+ device=device,
+ norm_type="rmsnorm",
+ dropout_selective_checkpoint=True,
+ use_scaled_init=True,
+ use_swiglu=True,
+ )
+ model = model.to(device)
+ set_fp32_attr_to_module(model.norm1)
+ model = NaiveAMPModel(
+ model=model,
+ output_to_fp32=True,
+ dtype=torch.half,
+ sync_buffer=False,
+ )
+ model.model.norm1.register_forward_pre_hook(partial(_pre_forward_hook_for_check))
+ model.model.norm1.register_forward_hook(partial(_post_forward_hook_for_check))
+
+ hidden_states = torch.rand(1, 1, 16).to(device).requires_grad_()
+
+ # forward
+ model(hidden_states)
+
+
+class MlpModel(nn.Module):
+ def __init__(self):
+ super().__init__()
+ self.linear1 = nn.Linear(4, 1, bias=False).half()
+ self.linear2 = nn.Linear(1, 4, bias=False).float()
+
+
+def check_split_fused_group(args):
+ # init
+ rank, world_size = args
+ device = torch.device("cuda")
+ build_environment(rank, world_size)
+ rtol, atol = (1e-3, 5e-3)
+
+ # fix seed
+ seed_all(1024)
+ # define model
+ model = MlpModel().to(device)
+ groups = create_param_groups(model, weight_decay=0.05)
+
+ standard_group = (
+ {
+ "name": "default",
+ "params": [torch.Tensor([[0.3088, 0.2935, -0.2900, 0.4280]]).to(torch.float16).to(device).requires_grad_()],
+ "weight_decay": 0.05,
+ },
+ {
+ "name": "fp32",
+ "params": [torch.Tensor([[0.6273], [0.4844], [-0.0463], [-0.0090]]).to(device).requires_grad_()],
+ "weight_decay": 0.05,
+ },
+ )
+
+ # check groups params
+ for t1, t2 in zip(groups, standard_group):
+ # assert t1["name"] == t2["name"]
+ assert all(
+ torch.allclose(p1, p2, rtol=rtol, atol=atol, equal_nan=True) for p1, p2 in zip(t1["params"], t2["params"])
+ )
+
+
+@pytest.mark.fused_precision
+def test_fused_precision():
+ ctx = mp.get_context("spawn")
+ with ctx.Pool(processes=8) as pool:
+ pool.map(check_fused_precision, [[rank, 8] for rank in range(8)])
+ pool.close()
+ pool.join()
+
+
+@pytest.mark.split_groups
+def test_split_fused_groups():
+ ctx = mp.get_context("spawn")
+ with ctx.Pool(processes=8) as pool:
+ pool.map(check_split_fused_group, [[rank, 8] for rank in range(8)])
+ pool.close()
+ pool.join()
+
+
+if __name__ == "__main__":
+    pytest.main(["-s", "-q", "test_fused_precision.py"])
diff --git a/tests/test_utils/test_storage_manager.py b/tests/test_utils/test_storage_manager.py
index 32f905b..e5f60c4 100644
--- a/tests/test_utils/test_storage_manager.py
+++ b/tests/test_utils/test_storage_manager.py
@@ -87,3 +87,24 @@ def test_storage_mm_save_load(ckpt_config, init_dist_and_model): # noqa # pylin
assert get_fns(ckpt_config.save_folder)[0] == "test.pt"
load_obj = llm_load(save_fn, map_location="cpu")
assert 0 == ((load_obj != tobj).sum())
+
+
+internlm_ckpt_path = [
+ ("local:/mnt/ckpt/", "local", "/mnt/ckpt/"),
+ ("local:./ckpt/", "local", "./ckpt/"),
+ ("boto3:s3://oss_bucket/", "boto3", "s3://oss_bucket/"),
+ ("boto3:oss_bucket/", "boto3", "oss_bucket/"),
+ ("/mnt/ckpt/", "local", "/mnt/ckpt/"),
+ ("./ckpt/", "local", "./ckpt/"),
+ ("s3://oss_bucket/", "boto3", "s3://oss_bucket/"),
+]
+
+
+@pytest.mark.parametrize("ckpt_path", internlm_ckpt_path)
+def test_try_get_storage_backend(ckpt_path):
+ from internlm.utils.storage_manager import try_get_storage_backend
+
+ ipath, a_prefix, a_cut_path = ckpt_path
+ b_prefix, b_cut_path = try_get_storage_backend(ipath)
+ assert a_prefix == b_prefix, f"{a_prefix} == {b_prefix}"
+ assert a_cut_path == b_cut_path, f"{a_cut_path} == {b_cut_path}"