mirror of https://github.com/InternLM/InternLM
update chat model information in README
parent
3028f07cb7
commit
4187bfbfe8
|
@ -43,26 +43,24 @@ InternLM は、70 億のパラメータを持つベースモデルと、実用
|
|||
|
||||
## 新闻
|
||||
|
||||
InternLM-7B-Chat v1.1 は、コード インタプリタと関数呼び出し機能を備えてリリースされました。 [Lagent](https://github.com/InternLM/lagent) で試すことができます。
|
||||
|
||||
## InternLM-7B
|
||||
|
||||
### パフォーマンス評価
|
||||
|
||||
オープンソースの評価ツール [OpenCompass](https://github.com/internLM/OpenCompass/) を用いて、InternLM の総合的な評価を行った。この評価では、分野別能力、言語能力、知識能力、推論能力、理解能力の 5 つの次元をカバーしました。以下は評価結果の一部であり、その他の評価結果については [OpenCompass leaderboard](https://opencompass.org.cn/rank) をご覧ください。
|
||||
|
||||
| データセット\モデル | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| ---------------- | -------------------------- | --------------------- | -------- | ----------- | ----------- | --------- | --------- |
|
||||
| C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
| データセット\モデル | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| ---------------- | --------------------- | -------- | ----------- | ----------- | --------- | --------- |
|
||||
| C-Eval(Val) | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
|
||||
- 評価結果は [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (*印のあるデータは原著論文からの引用を意味する)から取得したもので、評価設定は [OpenCompass](https://github.com/internLM/OpenCompass/) が提供する設定ファイルに記載されています。
|
||||
- 評価データは、[OpenCompass](https://github.com/internLM/OpenCompass/) のバージョンアップにより数値的な差異が生じる可能性がありますので、[OpenCompass](https://github.com/internLM/OpenCompass/) の最新の評価結果をご参照ください。
|
||||
|
@ -75,7 +73,6 @@ InternLM 7B と InternLM 7B チャットは、InternLM を使って訓練され
|
|||
| ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
|
||||
| **InternLM 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [🤗internlm/intern-7b](https://huggingface.co/internlm/internlm-7b) |
|
||||
| **InternLM Chat 7B** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [🤗internlm/intern-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) |
|
||||
| **InternLM Chat 7B 8k** | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [🤗internlm/intern-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) |
|
||||
|
||||
**制限事項:** 学習過程におけるモデルの安全性を確保し、倫理的・法的要件に準拠したテキストを生成するようモデルに促す努力を行ってきたが、モデルのサイズと確率的生成パラダイムのため、モデルは依然として予期せぬ出力を生成する可能性がある。例えば、生成された回答には偏見や差別、その他の有害な内容が含まれている可能性があります。そのような内容を伝播しないでください。有害な情報の伝播によって生じるいかなる結果に対しても、私たちは責任を負いません。
|
||||
|
||||
|
|
|
@ -44,8 +44,8 @@ InternLM 是一个开源的轻量级训练框架,旨在支持大模型训练
|
|||
|
||||
## 更新
|
||||
|
||||
[20231213] 我们更新了 InternLM-7B-Chat 和 InternLM-20B-Chat 模型权重。通过改进微调数据和训练策略,新版对话模型生成的回复质量更高、语言风格更加多元。
|
||||
[20230920] InternLM-20B 已发布,包括基础版和对话版。
|
||||
[20230822] InternLM-7B-Chat v1.1 已发布,增加了代码解释器和函数调用能力。您可以使用 [Lagent](https://github.com/InternLM/lagent) 进行尝试。
|
||||
|
||||
|
||||
## Model Zoo
|
||||
|
@ -54,12 +54,10 @@ InternLM 是一个开源的轻量级训练框架,旨在支持大模型训练
|
|||
|
||||
| Model | Transformers | ModelScope | OpenXLab |发布日期 |
|
||||
|---------------------------|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
|
||||
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-09-20 |
|
||||
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | 2023-12-12 |
|
||||
| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | 2023-09-20 |
|
||||
| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | 2023-08-22 |
|
||||
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-12-12 |
|
||||
| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | 2023-07-06 |
|
||||
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | 2023-07-06 |
|
||||
| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | 2023-07-06 |
|
||||
|
||||
|
||||
<details>
|
||||
|
@ -121,7 +119,6 @@ InternLM 20B 在模型结构上选择了深结构,InternLM-20B 的层数设定
|
|||
<summary> InternLM-7B </summary>
|
||||
|
||||
#### 模型更新
|
||||
[20230822] 通过使用更丰富的SFT类型数据,InternLM-7B-Chat v1.1模型支持代码解释和函数调用。模型结构与代码没有任何变化,因此可以使用与InternLM-7B-Chat完全一样的方式使用更强大的InternLM-7B-Chat v1.1。
|
||||
|
||||
#### 简介
|
||||
InternLM-7B 包含了一个拥有70亿参数的基础模型和一个为实际场景量身定制的对话模型。该模型具有以下特点:
|
||||
|
@ -134,18 +131,18 @@ InternLM-7B 包含了一个拥有70亿参数的基础模型和一个为实际场
|
|||
|
||||
我们使用开源评测工具 [OpenCompass](https://github.com/internLM/OpenCompass/) 从学科综合能力、语言能力、知识能力、推理能力、理解能力五大能力维度对InternLM开展全面评测,部分评测结果如下表所示,欢迎访问[OpenCompass 榜单](https://opencompass.org.cn/rank)获取更多的评测结果。
|
||||
|
||||
| 数据集\模型 | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| -------------------- | --------------------- | ---------------- | --------- | --------- | ------------ | --------- | ---------- |
|
||||
| C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
| 数据集\模型 | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| -------------------- | ---------------- | --------- | --------- | ------------ | --------- | ---------- |
|
||||
| C-Eval(Val) | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
|
||||
- 以上评测结果基于 [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) 获得(部分数据标注`*`代表数据来自原始论文),具体测试细节可参见 [OpenCompass](https://github.com/internLM/OpenCompass/) 中提供的配置文件。
|
||||
- 评测数据会因 [OpenCompass](https://github.com/internLM/OpenCompass/) 的版本迭代而存在数值差异,请以 [OpenCompass](https://github.com/internLM/OpenCompass/) 最新版的评测结果为主。
|
||||
|
|
33
README.md
33
README.md
|
@ -44,8 +44,8 @@ Based on the InternLM training framework, we have released two open-sourced pret
|
|||
|
||||
## News
|
||||
|
||||
[20231213] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responces with greater stylistic diversity.
|
||||
[20230920] InternLM-20B is released with base and chat versions.
|
||||
[20230822] InternLM-7B-Chat v1.1 is released with code interpreter and function calling capability. You can try it with [Lagent](https://github.com/InternLM/lagent).
|
||||
|
||||
|
||||
## Model Zoo
|
||||
|
@ -57,12 +57,10 @@ Our models are released in three platforms: Transformers, ModelScope and OpenXLa
|
|||
|
||||
| Model | Transformers(HF) | ModelScope(HF) | OpenXLab(HF) | OpenXLab(Original) | Release Date |
|
||||
|---------------------------|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
|
||||
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b-original) | 2023-09-20 |
|
||||
| **InternLM Chat 20B** | [🤗internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-20b-chat) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b-chat/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-20b-original) | 2023-12-12 |
|
||||
| **InternLM 20B** | [🤗internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-20b-original) | 2023-09-20 |
|
||||
| **InternLM Chat 7B v1.1** | [🤗internlm/internlm-chat-7b-v1.1](https://huggingface.co/internlm/internlm-chat-7b-v1.1) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-v1.1-original) | 2023-08-22 |
|
||||
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-original) | 2023-12-12 |
|
||||
| **InternLM 7B** | [🤗internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-7b-original) | 2023-07-06 |
|
||||
| **InternLM Chat 7B** | [🤗internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-original) | 2023-07-06 |
|
||||
| **InternLM Chat 7B 8k** | [🤗internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) | [<img src="./doc/imgs/modelscope_logo.png" width="20px" /> Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k) | [](https://openxlab.org.cn/models/detail/OpenLMLab/InternLM-chat-7b-8k-original) | 2023-07-06 |
|
||||
|
||||
#### Introduction
|
||||
InternLM-20B was pre-trained on over **2.3T** Tokens containing high-quality English, Chinese, and code data. Additionally, the Chat version has undergone SFT and RLHF training, enabling it to better and more securely meet users' needs.
|
||||
|
@ -120,7 +118,6 @@ Overall, InternLM-20B comprehensively outperforms open-source models in the 13B
|
|||
<summary> InternLM-7B </summary>
|
||||
|
||||
#### News
|
||||
[20230822] By utilizing richer SFT-type data, the InternLM-7B-Chat v1.1 model supports code interpretation and function invocation. The model structure and code remain unchanged, so the more powerful InternLM-7B-Chat v1.1 can be used in exactly the same way as InternLM-7B-Chat.
|
||||
|
||||
#### Introduction
|
||||
InternLM-7B contains a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
|
||||
|
@ -133,18 +130,18 @@ InternLM-7B contains a 7 billion parameter base model and a chat model tailored
|
|||
|
||||
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
|
||||
|
||||
| Datasets\Models | **InternLM-Chat-7B** | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| --------------- | -------------------------- | --------------------- | -------- | ----------- | ----------- | --------- | --------- |
|
||||
| C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
| Datasets\Models | **InternLM-7B** | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
|
||||
| --------------- | --------------------- | -------- | ----------- | ----------- | --------- | --------- |
|
||||
| C-Eval(Val) | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
|
||||
| MMLU | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
|
||||
| AGIEval | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
|
||||
| CommonSenseQA | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
|
||||
| BUSTM | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
|
||||
| CLUEWSC | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
|
||||
| MATH | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
|
||||
| GSM8K | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
|
||||
| HumanEval | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
|
||||
| RACE(High) | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
|
||||
|
||||
- The evaluation results were obtained from [OpenCompass 20230706](https://github.com/internLM/OpenCompass/) (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
|
||||
- The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).
|
||||
|
|
Loading…
Reference in New Issue