From dfc6b346a4f65ecc1c3d635478774f442c9b967c Mon Sep 17 00:00:00 2001
From: RangiLyu
Date: Wed, 3 Jul 2024 20:27:12 +0800
Subject: [PATCH] update readme

---
 README_zh-CN.md               | 16 ++++++++--------
 model_cards/internlm2.5_7b.md |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/README_zh-CN.md b/README_zh-CN.md
index 956df56..c1815f8 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -120,14 +120,14 @@ The InternLM2.5 series of models are officially released in this repository, with the following features:
 
 ### Chat Models
 
-| Benchmark         | InternLM2-Chat-7B | LLaMA-3-8B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat |
-| ----------------- | ----------------- | ------------------- | -------------- | ------------- | ----------------- | ------------ | ------------------- |
-| MMLU(5-shot)      | 62.3              | 68.4                | 71.0           | 71.4          | 70.8              | 70.9         | 72.8                |
-| CMMLU(5-shot)     | 62.4              | 53.3                | 74.5           | 74.5          | 80.9              | 60.3         | 78.0                |
-| BBH(3-shot CoT)   | 59.0              | 54.4                | 69.6           | 69.6          | 65.0              | 68.2\*       | 71.6                |
-| MATH(0-shot CoT)  | 27.6              | 27.9                | 51.1           | 51.1          | 48.6              | 46.9         | 60.1                |
-| GSM8K(0-shot CoT) | 72.5              | 72.9                | 80.1           | 85.3          | 82.9              | 88.9         | 86.0                |
-| GPQA(0-shot)      | 29.8              | 26.1                | 37.9           | 36.9          | 38.4              | 33.8         | 38.4                |
+| Benchmark         | InternLM2-Chat-7B | LLaMA-3-8B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct |
+| ----------------- | ----------------- | ------------------- | -------------- | ------------- | ----------------- | ------------ | ------------------- | -------------------- |
+| MMLU(5-shot)      | 62.3              | 68.4                | 71.0           | 71.4          | 70.8              | 70.9         | 72.8                | 80.5                 |
+| CMMLU(5-shot)     | 62.4              | 53.3                | 74.5           | 74.5          | 80.9              | 60.3         | 78.0                | 70.1                 |
+| BBH(3-shot CoT)   | 59.0              | 54.4                | 69.6           | 69.6          | 65.0              | 68.2\*       | 71.6                | 80.5                 |
+| MATH(0-shot CoT)  | 27.6              | 27.9                | 51.1           | 51.1          | 48.6              | 46.9         | 60.1                | 47.1                 |
+| GSM8K(0-shot CoT) | 72.5              | 72.9                | 80.1           | 85.3          | 82.9              | 88.9         | 86.0                | 92.8                 |
+| GPQA(0-shot)      | 29.8              | 26.1                | 37.9           | 36.9          | 38.4              | 33.8         | 38.4                | 38.9                 |
 
 - We use `ppl` to evaluate the base models on MCQ benchmarks.
 - The evaluation results are from [OpenCompass](https://github.com/open-compass/opencompass), and the evaluation configurations can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).

diff --git a/model_cards/internlm2.5_7b.md b/model_cards/internlm2.5_7b.md
index 35ddea6..e569f19 100644
--- a/model_cards/internlm2.5_7b.md
+++ b/model_cards/internlm2.5_7b.md
@@ -11,7 +11,7 @@ InternLM2.5, the 2.5th generation InternLM, has open-sourced a 7 billion paramet
 The model has the following characteristics:
 
 - **Outstanding reasoning capability**: State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B.
-- **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with LMDeploy for 1M-context inference.
+- **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](./chat/lmdeploy.md) for 1M-context inference. More details and a file chat demo can be found [here](./long_context/README.md).
 - **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages, corresponding implementation will be released in Lagent soon. InternLM2.5 has better tool utilization-related capabilities in instruction following, tool selection and reflection. See [examples](https://huggingface.co/internlm/internlm2_5-7b-chat-1m/blob/main/agent/).
 
 ## Model Zoo
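
For readers who want to try the 1M-context inference path that the model card now links to, below is a minimal sketch of LMDeploy's Python `pipeline` API (not part of this patch). The engine settings (`session_len`, `tp`, `cache_max_entry_count`), the input file name, and the generation parameters are illustrative assumptions that depend on your hardware; the linked `chat/lmdeploy.md` and `long_context/README.md` documents remain the maintained instructions.

```python
# Illustrative sketch of 1M-context inference with LMDeploy's pipeline API.
# Engine settings below are assumptions for a multi-GPU setup, not an official config.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    session_len=1048576,        # reserve a ~1M-token session
    max_batch_size=1,           # long contexts are memory-hungry; keep the batch small
    cache_max_entry_count=0.7,  # fraction of free GPU memory given to the KV cache
    tp=4,                       # tensor parallelism across 4 GPUs (assumed hardware)
)
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)

# Hypothetical long input; replace with your own document.
with open('long_document.txt') as f:
    prompt = f.read() + '\n\nSummarize the document above.'

response = pipe(prompt, gen_config=GenerationConfig(max_new_tokens=1024, top_p=0.8))
print(response.text)
```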