fix Performance Evaluation of internlm2-1.8b

pull/703/head
Shuo Zhang 2024-02-21 17:54:50 +08:00
parent 33dae04941
commit ab9ae22031
2 changed files with 8 additions and 8 deletions

@@ -47,7 +47,7 @@ InternLM2 series are released with the following features:
## News
-\[2024.01.31\] We release InternLM2-1.8B, along with the associated chat model. This model provides a cheaper deployment option while maintaining leading performance.
+\[2024.01.31\] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.
\[2024.01.23\] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. They surpass ChatGPT with small sizes. See [InternLM-Math](https://github.com/InternLM/internlm-math) for details and download.

@@ -27,13 +27,13 @@ We have evaluated InternLM2 on several important benchmarks using the open-sourc
| Dataset\Models | InternLM2-1.8B | InternLM2-Chat-1.8B-SFT | InternLM2-Chat-1.8B | InternLM2-7B | InternLM2-Chat-7B |
| :---: | :---: | :---: | :---: | :---: | :---: |
-| MMLU | 46.9 | 47.1 | 47.1 | 65.8 | 63.7 |
-| AGIEval | 33.4 | 38.8 | 38.7 | 49.9 | 47.2 |
-| BBH | 37.5 | 35.2 | 36.1 | 65.0 | 61.2 |
-| GSM8K | 31.2 | 39.7 | 40.9 | 70.8 | 70.7 |
-| MATH | 5.6 | 11.8 | 12.1 | 20.2 | 23.0 |
-| HumanEval | 25.0 | 32.9 | 34.2 | 43.3 | 59.8 |
-| MBPP(Sanitized) | 22.2 | 23.2 | 26.6 | 51.8 | 51.4 |
+| MMLU | 46.9 | 47.1 | 44.1 | 65.8 | 63.7 |
+| AGIEval | 33.4 | 38.8 | 34.6 | 49.9 | 47.2 |
+| BBH | 37.5 | 35.2 | 34.3 | 65.0 | 61.2 |
+| GSM8K | 31.2 | 39.7 | 34.3 | 70.8 | 70.7 |
+| MATH | 5.6 | 11.8 | 10.7 | 20.2 | 23.0 |
+| HumanEval | 25.0 | 32.9 | 29.3 | 43.3 | 59.8 |
+| MBPP(Sanitized) | 22.2 | 23.2 | 27.0 | 51.8 | 51.4 |
- The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass), and the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
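
For context, OpenCompass evaluations like the ones above are driven by Python config files that pair dataset configs with model configs. The sketch below is a minimal, hypothetical example of such a config; the module paths (`mmlu_gen`, `gsm8k_gen`, `hf_internlm2_chat_1_8b`) and file names are assumptions based on OpenCompass's documented config style, not taken from this commit.

```python
# Hypothetical OpenCompass config sketch: evaluate InternLM2-Chat-1.8B on MMLU and GSM8K.
# Module paths below are assumptions; check the configs shipped with OpenCompass.
from mmengine.config import read_base

with read_base():
    # Dataset configs bundled with OpenCompass.
    from .datasets.mmlu.mmlu_gen import mmlu_datasets
    from .datasets.gsm8k.gsm8k_gen import gsm8k_datasets
    # HuggingFace model config for internlm2-chat-1_8b (name assumed).
    from .models.hf_internlm.hf_internlm2_chat_1_8b import models

# OpenCompass picks up the top-level `datasets` and `models` lists from this config.
datasets = [*mmlu_datasets, *gsm8k_datasets]
```

Such a config would then be launched with something like `python run.py configs/eval_internlm2_chat_1_8b.py -w outputs/internlm2_1_8b` (paths assumed), with results collected under the chosen working directory.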