Update performance

2024-01-17 11:08:31 +08:00 · 2024-01-17 11:08:31 +08:00 · e2aa1a332c
parent 7767629116
commit e2aa1a332c
2 changed files with 2 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -213,7 +213,7 @@ We utilize [OpenCompass](https://github.com/open-compass/opencompass) for model

 ### Objective Evaluation

-To evaluate the InternLM model, please follow the guidelines in the [OpenCompass tutorial](https://github.com/open-compass/opencompass). Typically, we use `ppl` for multiple-choice questions on the **Base** model and `gen` for all questions on the **Chat** model.
+To evaluate the InternLM model, please follow the guidelines in the [OpenCompass tutorial](https://opencompass.readthedocs.io/en/latest/get_started/installation.html). Typically, we use `ppl` for multiple-choice questions on the **Base** model and `gen` for all questions on the **Chat** model.

 ### Long-Context Evaluation (Needle in a Haystack)

--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -201,14 +201,13 @@ print(response)

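 **Note:** The full training functionality of this project has been migrated to [InternEvo](https://github.com/InternLM/InternEvo) for easier use. InternEvo provides efficient pre-training and fine-tuning infrastructure for training the InternLM series of models.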

-
 ## Evaluation

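 We use [OpenCompass](https://github.com/open-compass/opencompass) for model evaluation. For InternLM-2, we focus mainly on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.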

 ### Standard Objective Evaluation

-Please follow the [OpenCompass tutorial](https://github.com/open-compass/opencompass) for objective evaluation. We typically use `ppl` for multiple-choice questions on the **Base** model and `gen` for all questions on the **Chat** model.
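+Please follow the [OpenCompass tutorial](https://opencompass.readthedocs.io/zh-cn/latest/get_started/installation.html) for objective evaluation. We typically use ppl to evaluate multiple-choice questions on the Base model, and gen to generate and evaluate answers for all questions on the Chat model.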

 ### Long-Context Evaluation (Needle in a Haystack)