mirror of https://github.com/InternLM/InternLM
doc(readme): update 7b/20b chat model information (#537)
* update chat model information in README
* modifications by pre-commit hook
* update 7b evaluation results
* fix readme

Branch: pull/551/head
parent 3028f07cb7
commit 68d6abc64a
@@ -43,7 +43,8 @@ InternLM は、70 億のパラメータを持つベースモデルと、実用

## News

[20230822] InternLM-7B-Chat v1.1 has been released with code interpreter and function-calling capabilities. You can try it with [Lagent](https://github.com/InternLM/lagent).
[20231213] The model weights of InternLM-7B-Chat and InternLM-20B-Chat have been updated. The new versions of the chat models can generate higher-quality responses with more diverse language styles.
[20230920] InternLM-20B has been released, including base and chat versions.
## InternLM-7B
@@ -51,18 +52,19 @@ InternLM-7B-Chat v1.1 は、コード インタプリタと関数呼び出し機
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/), covering five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Part of the evaluation results are shown below; for more, please visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank).
| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 53.2                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 50.8                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 42.5                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 75.2                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 74.3                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 78.6                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 6.4                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 34.5                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 14.0                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 76.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |

The diff replaces the rows above with the updated **InternLM-Chat-7B** results below:

| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 52.0                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 52.6                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 46.4                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 80.8                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 80.6                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 81.8                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 5.0                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 36.2                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 15.9                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 80.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |
- The evaluation results were obtained from [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (data marked with * are cited from the original papers), and the evaluation settings are described in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- Evaluation numbers may differ across version iterations of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
@@ -75,7 +77,6 @@ InternLM 7B と InternLM 7B チャットは、InternLM を使って訓練され
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- |
| **InternLM 7B** | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) |
| **InternLM Chat 7B** | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) |
| **InternLM Chat 7B 8k** | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) |

**Limitations:** Although we have made efforts to ensure the safety of the model during training and to encourage it to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
@@ -44,8 +44,8 @@ InternLM 是一个开源的轻量级训练框架,旨在支持大模型训练
## News

[20231213] We have updated the model weights of InternLM-7B-Chat and InternLM-20B-Chat. With improved fine-tuning data and training strategies, the new chat models generate higher-quality responses with more diverse language styles.
[20230920] InternLM-20B has been released, including base and chat versions.
[20230822] InternLM-7B-Chat v1.1 has been released, adding code interpreter and function-calling capabilities. You can try it with [Lagent](https://github.com/InternLM/lagent).
## Model Zoo
@@ -54,12 +54,10 @@ InternLM 是一个开源的轻量级训练框架,旨在支持大模型训练
| Model | Transformers | ModelScope | OpenXLab | Release Date |
|---------------------------|--------------|------------|----------|--------------|
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-09-20 |
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-12-12 |
| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | 2023-09-20 |
| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | 2023-08-22 |
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-12-12 |
| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | 2023-07-06 |
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-07-06 |
| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | 2023-07-06 |
<details>
@@ -121,7 +119,6 @@ InternLM 20B 在模型结构上选择了深结构,InternLM-20B 的层数设定
<summary> InternLM-7B </summary>
#### Model Updates
[20230822] By using richer SFT-format data, the InternLM-7B-Chat v1.1 model supports code interpretation and function calling. The model architecture and code are unchanged, so the more capable InternLM-7B-Chat v1.1 can be used in exactly the same way as InternLM-7B-Chat.

#### Introduction
InternLM-7B contains a base model with 7 billion parameters and a chat model tailored for practical scenarios. The model has the following characteristics:
@@ -135,17 +132,17 @@ InternLM-7B 包含了一个拥有70亿参数的基础模型和一个为实际场
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/), covering five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Part of the evaluation results are shown in the table below; please visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more.
| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 53.2                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 50.8                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 42.5                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 75.2                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 74.3                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 78.6                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 6.4                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 34.5                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 14.0                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 76.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |

The diff replaces the rows above with the updated **InternLM-Chat-7B** results below:

| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 52.0                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 52.6                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 46.4                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 80.8                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 80.6                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 81.8                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 5.0                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 36.2                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 15.9                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 80.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |
- The evaluation results above were obtained with [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (data marked with `*` are cited from the original papers); detailed test settings can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- Evaluation numbers may differ across version iterations of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
README.md
@@ -44,8 +44,8 @@ Based on the InternLM training framework, we have released two open-sourced pret
## News

[20231213] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.
[20230920] InternLM-20B is released with base and chat versions.
[20230822] InternLM-7B-Chat v1.1 is released with code interpreter and function calling capability. You can try it with [Lagent](https://github.com/InternLM/lagent).

## Model Zoo
@@ -57,12 +57,10 @@ Our models are released in three platforms: Transformers, ModelScope and OpenXLa
| Model | Transformers(HF) | ModelScope(HF) | OpenXLab(HF) | OpenXLab(Original) | Release Date |
|---------------------------|------------------|----------------|--------------|--------------------|--------------|
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b-original) | 2023-09-20 |
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b-original) | 2023-12-12 |
| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b-original) | 2023-09-20 |
| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1-original) | 2023-08-22 |
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-original) | 2023-12-12 |
| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b-original) | 2023-07-06 |
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-original) | 2023-07-06 |
| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k-original) | 2023-07-06 |
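All of the Transformers checkpoints above load through the `trust_remote_code` path. A minimal sketch, assuming a CUDA device and the `internlm/internlm-chat-7b` checkpoint from the table:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code pulls in the InternLM modeling code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-chat-7b", torch_dtype=torch.float16, trust_remote_code=True
).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "hello", history=[])
print(response)
```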
#### Introduction
InternLM-20B was pre-trained on over **2.3T** tokens of high-quality English, Chinese, and code data. Additionally, the chat version has undergone SFT and RLHF training, enabling it to meet users' needs better and more safely.
@@ -120,7 +118,6 @@ Overall, InternLM-20B comprehensively outperforms open-source models in the 13B
<summary> InternLM-7B </summary>

#### News
[20230822] By utilizing richer SFT-type data, the InternLM-7B-Chat v1.1 model supports code interpretation and function invocation. The model structure and code remain unchanged, so the more powerful InternLM-7B-Chat v1.1 can be used in exactly the same way as InternLM-7B-Chat.

#### Introduction
InternLM-7B contains a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
@@ -135,16 +132,16 @@ We conducted a comprehensive evaluation of InternLM using the open-source evalua
| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 53.2                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 50.8                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 42.5                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 75.2                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 74.3                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 78.6                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 6.4                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 34.5                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 14.0                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 76.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |

The diff replaces the rows above with the updated **InternLM-Chat-7B** results below:

| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --------------- | -------------------- | --------------- | -------- | ----------- | ----------- | --------- | --------- |
| C-Eval(Val)     | 52.0                 | 53.4            | 24.2     | 42.7        | 50.9        | 28.9      | 31.2      |
| MMLU            | 52.6                 | 51.0            | 35.2*    | 41.5        | 46.0        | 39.7      | 47.3      |
| AGIEval         | 46.4                 | 37.6            | 20.8     | 24.6        | 39.0        | 24.1      | 26.4      |
| CommonSenseQA   | 80.8                 | 59.5            | 65.0     | 58.8        | 60.0        | 68.7      | 66.7      |
| BUSTM           | 80.6                 | 50.6            | 48.5     | 51.3        | 55.0        | 48.8      | 62.5      |
| CLUEWSC         | 81.8                 | 59.1            | 50.3     | 52.8        | 59.8        | 50.3      | 52.2      |
| MATH            | 5.0                  | 7.1             | 2.8      | 3.0         | 6.6         | 2.2       | 2.8       |
| GSM8K           | 36.2                 | 31.2            | 10.1     | 9.7         | 29.2        | 6.0       | 15.3      |
| HumanEval       | 15.9                 | 10.4            | 14.0     | 9.2         | 9.2         | 9.2       | 11.0      |
| RACE(High)      | 80.3                 | 57.4            | 46.9*    | 28.1        | 66.3        | 40.7      | 54.0      |
- The evaluation results were obtained from [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (data marked with * are taken from the original papers), and the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- Evaluation results may differ numerically across version iterations of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results from [OpenCompass](https://github.com/internLM/OpenCompass/).
@@ -103,4 +103,3 @@ msgstr ""

#~ msgid "traning dataloader object"
#~ msgstr ""
@@ -47,4 +47,3 @@ msgstr "Training Results"

#: ../../source/example/30B_demo.rst:175 615a3481b0aa49729b7219b1365519aa
msgid "基于以上训练配置和启动命令,两节点 16GPU 下的模型训练部分日志展示如下:"
msgstr "Taking the configuration of the demo training on two nodes with 16 GPUs on slurm as an example, the training result log is shown below:"
@@ -47,4 +47,3 @@ msgstr "Training Results"

#: ../../source/example/7B_demo.rst:173 33ec81f34e3c4340beacdb5254069d08
msgid "基于以上训练配置和启动命令,单节点 8GPU 下的模型训练部分日志展示如下:"
msgstr "Taking the configuration of the demo training on a single machine with 8 GPUs on slurm as an example, the training result log is shown below:"
@@ -30,4 +30,3 @@ msgstr ""

#: ../../source/example/index.rst:13 b095e27dfc924a7a943b7cba5361700a
msgid "30B Demo"
msgstr ""
@@ -78,4 +78,3 @@ msgstr ""

#: ../../source/index.rst:95 a164b772960f4ab8b18c7e8820f69f55
msgid ":ref:`search`"
msgstr ""
@@ -245,4 +245,3 @@ msgid ""

"A tuple of ``(trainer, train_dataloader, test_dataloader, lr_scheduler)``"
" where only ``trainer`` could not be None."
msgstr ""
@@ -137,4 +137,3 @@ msgstr "For the local standard image built with dockerfile or pulled, use the fo

#: ../../../install.md:87 66613606256e4094a6be5ab2af1269ae
msgid "容器内默认目录即 `/InternLM`,根据[使用文档](./usage.md)即可启动训练。"
msgstr "The default directory in the container is `/InternLM`, please start training according to the [Usage](./usage.md)."
@@ -195,4 +195,3 @@ msgstr ""

#: internlm.monitor.alert.send_feishu_msg_with_webhook:12 of
msgid "An exception rasied by the HTTP post request."
msgstr ""
@@ -454,4 +454,3 @@ msgstr ""

#: internlm.solver.optimizer.hybrid_zero_optim.HybridZeroOptimizer.step:7 of
msgid "Whether the gradient is success updated, and the gradient."
msgstr ""
@@ -172,4 +172,3 @@ msgstr ""

#: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.step:1 of
msgid "Update the memory state of the optimizer state."
msgstr ""
@@ -22,4 +22,3 @@ msgstr ""

#: ../../source/qa.rst:2 e3b22a39640a40cfb527068a7f4bbfc9
msgid "问&答"
msgstr "Q&A"
@@ -159,4 +159,3 @@ msgstr ""

#~ msgid "InternLM训练流程图"
#~ msgstr "InternLM training process"
@@ -364,4 +364,3 @@ msgstr ""

#~ msgstr ""
#~ "`load_model_only_folder` and `load_ckpt_folder` "
#~ "cannot be set at the same time."
@@ -90,4 +90,3 @@ When `Activation Ckpt` is turned off, the test results are as shown in the table

<div align="left">
<img src="../imgs/flops.png" width="580"/>
</div>
@@ -87,4 +87,3 @@ InternLM中`zero1`的配置决定了优化器状态的分配范围。

<div align="left">
<img src="../doc/imgs/flops.png" width="580"/>
</div>
@@ -1,3 +1,5 @@
# flake8: noqa

# This file is modified from:
# https://github.com/reasoning-machines/pal/blob/main/pal/core/interface.py
#
@@ -27,8 +29,8 @@ import tqdm
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from tools.transformers.interface import GenerationConfig, generate_interactive
from internlm.utils.timeout import Timeout
from tools.transformers.interface import GenerationConfig, generate_interactive


def parse_args():
@@ -19,9 +19,8 @@
# limitations under the License.
""" InternLM model configuration"""

from transformers.utils import logging
from transformers.configuration_utils import PretrainedConfig

from transformers.utils import logging

logger = logging.get_logger(__name__)
@@ -30,9 +29,9 @@ INTERNLM_PRETRAINED_CONFIG_ARCHIVE_MAP = {}

class InternLMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`InternLMModel`]. It is used to instantiate an InternLM
    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the InternLM-7B.
    This is the configuration class to store the configuration of a [`InternLMModel`]. It is used to instantiate an
    InternLM model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the InternLM-7B.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.
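Per the docstring above, instantiating the config with no arguments approximates InternLM-7B; a minimal sketch:

```python
from configuration_internlm import InternLMConfig

# Default arguments yield a configuration similar to InternLM-7B, per the docstring.
config = InternLMConfig()
print(config.to_dict())
```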
@@ -1,6 +1,6 @@
import argparse
import math
import json
import math
import os
import re
import tempfile
@@ -110,7 +110,7 @@ def merge_pp(states_tp_pp):
        states = states_tp_pp[tp][pp]
        keys = list(states.keys())
        for key in keys:
            match = re.search("\.\d+\.", key)
            match = re.search("\.\d+\.", key)  # noqa: W605
            if match is not None:
                s, e = match.span()
                layer_idx = int(key[s + 1 : e - 1]) + layer_shift
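The `# noqa: W605` comment only silences flake8's invalid-escape-sequence warning; a raw string would avoid the warning altogether (an alternative, not what this commit does):

```python
import re

# A raw string keeps the backslashes literal, so W605 never fires.
# The pattern matches a layer index such as ".3." inside a checkpoint key.
match = re.search(r"\.\d+\.", "layers.3.attention.weight")
assert match is not None and match.group() == ".3."
```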
@@ -126,9 +126,9 @@ def merge_pp(states_tp_pp):

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--src_folder', type=str, default='~/test/')  # folder of the checkpoints to convert to HF format
    parser.add_argument('--tgt_folder', type=str, default='~/output/')  # destination folder for the converted checkpoints
    parser.add_argument('--tokenizer', type=str, default='~/test/tokenizer.model')  # path to the tokenizer model file
    parser.add_argument("--src_folder", type=str, default="~/test/")  # folder of the checkpoints to convert to HF format
    parser.add_argument("--tgt_folder", type=str, default="~/output/")  # destination folder for the converted checkpoints
    parser.add_argument("--tokenizer", type=str, default="~/test/tokenizer.model")  # path to the tokenizer model file
    args = parser.parse_args()

    def load(fp):
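One caveat worth noting (an observation, not part of the diff): argparse does not expand `~` in these defaults, so paths like `~/test/` have to be resolved before use, e.g.:

```python
import os

# argparse stores "~/test/" verbatim; expanduser resolves it to an absolute path.
src_folder = os.path.expanduser("~/test/")
```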
@@ -5,7 +5,6 @@ from typing import Callable, List, Optional

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer
from transformers.generation.utils import LogitsProcessorList, StoppingCriteriaList
from transformers.utils import logging
@@ -38,12 +37,12 @@ def generate_interactive(
    for k, v in inputs.items():
        inputs[k] = v.cuda()
    input_ids = inputs["input_ids"]
    batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
    batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]  # noqa: F841
    if generation_config is None:
        generation_config = model.generation_config
    generation_config = copy.deepcopy(generation_config)
    model_kwargs = generation_config.update(**kwargs)
    bos_token_id, eos_token_id = generation_config.bos_token_id, generation_config.eos_token_id
    bos_token_id, eos_token_id = generation_config.bos_token_id, generation_config.eos_token_id  # noqa: F841
    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]
    if additional_eos_token_id is not None:
@@ -119,9 +118,7 @@ generate_interactive(

        # update generated ids, model inputs, and length for next step
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        model_kwargs = model._update_model_kwargs_for_generation(
            outputs, model_kwargs, is_encoder_decoder=False
        )
        model_kwargs = model._update_model_kwargs_for_generation(outputs, model_kwargs, is_encoder_decoder=False)
        unfinished_sequences = unfinished_sequences.mul((min(next_tokens != i for i in eos_token_id)).long())

        output_token_ids = input_ids[0].cpu().tolist()
@@ -1,11 +1,13 @@
import torch
from moss_002_sft import collate_fn, get_dataset
from peft import LoraConfig, TaskType, get_peft_model
from torch.utils.data import DataLoader
from peft import get_peft_model, LoraConfig, TaskType
from transformers import get_linear_schedule_with_warmup
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm

from moss_002_sft import get_dataset, collate_fn
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    get_linear_schedule_with_warmup,
)

model_path = "model_path"
data_dir = "moss_002_sft"
@@ -16,8 +18,11 @@ epochs = 5
val_per_steps = 1000
lr = 9e-6
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=32, lora_alpha=32, lora_dropout=0.1,
    target_modules=["gate_proj", "down_proj", "up_proj", "q_proj", "k_proj", "v_proj", "o_proj"]
    task_type=TaskType.CAUSAL_LM,
    r=32,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["gate_proj", "down_proj", "up_proj", "q_proj", "k_proj", "v_proj", "o_proj"],
)
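For context, a `LoraConfig` like this is applied with peft's `get_peft_model`, which is already imported above; a minimal sketch with a hypothetical base model:

```python
# Sketch: wrap a causal LM so that only the LoRA adapter weights are trainable.
from peft import get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("internlm/internlm-7b", trust_remote_code=True)
lora_model = get_peft_model(base_model, peft_config)  # peft_config as defined above
lora_model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```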
@@ -29,12 +34,12 @@ model.cuda()

# dataset
train_dataset, val_dataset = get_dataset(tokenizer, data_dir, num=data_num, test_size=test_size)
train_dataloader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True, collate_fn=lambda x: collate_fn(x, tokenizer))
train_dataloader = DataLoader(
    train_dataset, batch_size=train_batch_size, shuffle=True, collate_fn=lambda x: collate_fn(x, tokenizer)
)

optimizer = torch.optim.AdamW(model.parameters(), lr)
scheduler = get_linear_schedule_with_warmup(
    optimizer, 1000, epochs * len(train_dataloader)
)
scheduler = get_linear_schedule_with_warmup(optimizer, 1000, epochs * len(train_dataloader))

# train
fp = open("output", "w")
@@ -58,7 +63,15 @@ for epoch in tqdm(range(epochs), desc="Traning Epoch"):
            data, label = val_dataset[i]
            prefix = tokenizer.decode(data.tolist(), skip_special_tokens=True)
            try:
                generate = model.generate(input_ids=data.unsqueeze(0).cuda(), temperature=0.7, top_k=50, do_sample=True, repetition_penalty=1.02, max_new_tokens=100, top_p=0.9)
                generate = model.generate(
                    input_ids=data.unsqueeze(0).cuda(),
                    temperature=0.7,
                    top_k=50,
                    do_sample=True,
                    repetition_penalty=1.02,
                    max_new_tokens=100,
                    top_p=0.9,
                )
                text = tokenizer.decode(generate[0].tolist(), skip_special_tokens=True)
                text = text.replace(prefix, "")
                fp.write(f"Prefix: {prefix}\nGenerated: {text}" + "\n---------------------------------\n")
@@ -1,9 +1,11 @@
import os
import copy
import os

import torch
from datasets import Dataset as HFDataset
from datasets import load_dataset
from torch.utils.data import Dataset
from datasets import load_dataset, Dataset as HFDataset


class SFTDataset(Dataset):
    # https://github.com/OpenLMLab/MOSS/blob/main/finetune_moss.py
@@ -26,21 +28,25 @@ class SFTDataset(Dataset):

        return data, label


def collate_fn(batch, tokenizer):
    batch_input_ids, batch_labels = [], []
    for input_ids, label in batch:
        batch_input_ids.append(input_ids)
        batch_labels.append(label)

    batch_input_ids = torch.nn.utils.rnn.pad_sequence(batch_input_ids, batch_first=True, padding_value=tokenizer.eos_token_id)
    batch_input_ids = torch.nn.utils.rnn.pad_sequence(
        batch_input_ids, batch_first=True, padding_value=tokenizer.eos_token_id
    )
    batch_labels = torch.nn.utils.rnn.pad_sequence(batch_labels, batch_first=True, padding_value=-100)

    return {
        "input_ids": batch_input_ids,
        "attention_mask": (batch_input_ids == tokenizer.eos_token_id).long(),
        "labels": batch_labels
        "labels": batch_labels,
    }


def process(sample, tokenizer, max_len):
    chat = sample["plain_text"].split("<eoa>")[:-1]
    num_turns = sample["num_turns"]
@@ -81,7 +87,7 @@ def load_data(save_dir, tokenizer, max_len, num=-1) -> HFDataset:
    if os.path.exists(save_dir):
        print(f"Loading moss-002-sft from {save_dir}")
    else:
        print(f"Loading moss-002-sft from datasets")
        print("Loading moss-002-sft from datasets")
        moss_sft = load_dataset("fnlp/moss-002-sft-data", split="train")
        moss_sft = moss_sft.map(lambda x: process(x, tokenizer, max_len), num_proc=10)
        moss_sft = moss_sft.filter(lambda x: len(x["input_ids"]) != 0)
@@ -90,11 +96,11 @@ def load_data(save_dir, tokenizer, max_len, num=-1) -> HFDataset:
        moss_sft = HFDataset.load_from_disk(save_dir)
        if num != -1:
            moss_sft = moss_sft.select(range(num))
        print(
            f"Load successfully, total {len(moss_sft)} samples.")
        print(f"Load successfully, total {len(moss_sft)} samples.")

    return moss_sft


def get_dataset(tokenizer, save_dir, max_len=1024, num=-1, test_size=0.1):
    moss_sft_data = load_data(save_dir, tokenizer, max_len, num)
    moss_sft_split = moss_sft_data.train_test_split(test_size=test_size)

@@ -102,4 +108,3 @@ def get_dataset(tokenizer, save_dir, max_len=1024, num=-1, test_size=0.1):
    val_dataset = SFTDataset(moss_sft_split["test"])

    return train_dataset, val_dataset
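Based on the signatures above, the dataset helpers compose as follows (a sketch; the tokenizer checkpoint is an assumption):

```python
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

from moss_002_sft import collate_fn, get_dataset

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-7b", trust_remote_code=True)
# Splits the processed moss-002-sft data into train/val SFTDataset objects.
train_dataset, val_dataset = get_dataset(tokenizer, "moss_002_sft", max_len=1024, num=-1, test_size=0.1)
loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=lambda x: collate_fn(x, tokenizer))
```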
@@ -19,26 +19,35 @@
# limitations under the License.
""" PyTorch InternLM model."""
import math
import queue
import threading
from typing import List, Optional, Tuple, Union
import threading, queue

import torch
import torch.utils.checkpoint
from configuration_internlm import InternLMConfig
from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

from transformers.activations import ACT2FN
from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast, SequenceClassifierOutputWithPast
from transformers.modeling_utils import PreTrainedModel
from transformers.generation.streamers import BaseStreamer
from transformers.utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings
from configuration_internlm import InternLMConfig

from transformers.modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
    SequenceClassifierOutputWithPast,
)
from transformers.modeling_utils import PreTrainedModel
from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    logging,
    replace_return_docstrings,
)

logger = logging.get_logger(__name__)

_CONFIG_FOR_DOC = "InternLMConfig"


# Copied from transformers.models.bart.modeling_bart._make_causal_mask
def _make_causal_mask(
    input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
@@ -423,7 +432,7 @@ INTERNLM_INPUTS_DOCSTRING = r"""
            more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
"""  # noqa: E501


@add_start_docstrings(
@@ -437,6 +446,7 @@ class InternLMModel(InternLMPreTrainedModel):
    Args:
        config: InternLMConfig
    """

    _auto_class = "AutoModel"

    def __init__(self, config: InternLMConfig):
@@ -776,7 +786,8 @@ class InternLMForCausalLM(InternLMPreTrainedModel):
        return tokenizer([prompt], return_tensors="pt")

    @torch.no_grad()
    def chat(self,
    def chat(
        self,
        tokenizer,
        query: str,
        history: List[Tuple[str, str]] = [],

@@ -785,16 +796,19 @@ class InternLMForCausalLM(InternLMPreTrainedModel):
        do_sample: bool = True,
        temperature: float = 0.8,
        top_p: float = 0.8,
        **kwargs):
        **kwargs,
    ):
        inputs = self.build_inputs(tokenizer, query, history)
        inputs = {k: v.to(self.device) for k, v in inputs.items() if torch.is_tensor(v)}
        outputs = self.generate(**inputs,
        outputs = self.generate(
            **inputs,
            streamer=streamer,
            max_new_tokens=max_new_tokens,
            do_sample=do_sample,
            temperature=temperature,
            top_p=top_p,
            **kwargs)
            **kwargs,
        )
        outputs = outputs[0].cpu().tolist()[len(inputs["input_ids"][0]) :]
        response = tokenizer.decode(outputs, skip_special_tokens=True)
        response = response.split("<eoa>")[0]

@@ -802,7 +816,8 @@ class InternLMForCausalLM(InternLMPreTrainedModel):
        return response, history
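For reference, this is the method behind the single-turn chat usage documented in the README; a minimal sketch (model and tokenizer loaded as in the Model Zoo section above):

```python
# Each call returns the decoded response plus the updated conversation history.
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please give three tips on time management", history=history)
print(response)
```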
    @torch.no_grad()
    def stream_chat(self,
    def stream_chat(
        self,
        tokenizer,
        query: str,
        history: List[Tuple[str, str]] = [],

@@ -810,7 +825,8 @@ class InternLMForCausalLM(InternLMPreTrainedModel):
        do_sample: bool = True,
        temperature: float = 0.8,
        top_p: float = 0.8,
        **kwargs):
        **kwargs,
    ):
        """
        Return a generator in format: (response, history)
        Eg.

@@ -861,7 +877,7 @@ class InternLMForCausalLM(InternLMPreTrainedModel):
            do_sample=do_sample,
            temperature=temperature,
            top_p=top_p,
            **kwargs
            **kwargs,
        )

        def consumer():
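Since `stream_chat` yields `(response, history)` pairs with a cumulative `response`, a caller typically prints only the newly generated suffix; a minimal sketch:

```python
# Print each newly generated chunk as it arrives.
length = 0
for response, history in model.stream_chat(tokenizer, "hello"):
    print(response[length:], end="", flush=True)
    length = len(response)
```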
@@ -24,11 +24,9 @@ from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple

import sentencepiece as spm

from transformers.tokenization_utils import PreTrainedTokenizer
from transformers.utils import logging

logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {"vocab_file": "./tokenizer.model"}