From ab9ae22031a280816e3d852b2fe5f5b2d394d308 Mon Sep 17 00:00:00 2001
From: Shuo Zhang
Date: Wed, 21 Feb 2024 17:54:50 +0800
Subject: [PATCH] fix Performance Evaluation of internlm2-1.8b

---
 README.md                     |  2 +-
 model_cards/internlm2_1.8b.md | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 8078482..a3647e3 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ InternLM2 series are released with the following features:
 
 ## News
 
-\[2024.01.31\] We release InternLM2-1.8B, along with the associated chat model. This model provides a cheaper deployment option while maintaining leading performance.
+\[2024.01.31\] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.
 
 \[2024.01.23\] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. They surpass ChatGPT with small sizes. See [InternLM-Math](https://github.com/InternLM/internlm-math) for details and download.
 
diff --git a/model_cards/internlm2_1.8b.md b/model_cards/internlm2_1.8b.md
index 98eafa1..6945024 100644
--- a/model_cards/internlm2_1.8b.md
+++ b/model_cards/internlm2_1.8b.md
@@ -27,13 +27,13 @@ We have evaluated InternLM2 on several important benchmarks using the open-sourc
 
 | Dataset\Models | InternLM2-1.8B | InternLM2-Chat-1.8B-SFT | InternLM2-Chat-1.8B | InternLM2-7B | InternLM2-Chat-7B |
 | :---: | :---: | :---: | :---: | :---: | :---: |
-| MMLU | 46.9 | 47.1 | 47.1 | 65.8 | 63.7 |
-| AGIEval | 33.4 | 38.8 | 38.7 | 49.9 | 47.2 |
-| BBH | 37.5 | 35.2 | 36.1 | 65.0 | 61.2 |
-| GSM8K | 31.2 | 39.7 | 40.9 | 70.8 | 70.7 |
-| MATH | 5.6 | 11.8 | 12.1 | 20.2 | 23.0 |
-| HumanEval | 25.0 | 32.9 | 34.2 | 43.3 | 59.8 |
-| MBPP(Sanitized) | 22.2 | 23.2 | 26.6 | 51.8 | 51.4 |
+| MMLU | 46.9 | 47.1 | 44.1 | 65.8 | 63.7 |
+| AGIEval | 33.4 | 38.8 | 34.6 | 49.9 | 47.2 |
+| BBH | 37.5 | 35.2 | 34.3 | 65.0 | 61.2 |
+| GSM8K | 31.2 | 39.7 | 34.3 | 70.8 | 70.7 |
+| MATH | 5.6 | 11.8 | 10.7 | 20.2 | 23.0 |
+| HumanEval | 25.0 | 32.9 | 29.3 | 43.3 | 59.8 |
+| MBPP(Sanitized) | 22.2 | 23.2 | 27.0 | 51.8 | 51.4 |
 
 - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass) , and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).