add math results

2024-01-17 11:04:44 +08:00 · 5e89047424
parent 3bda9f73ad
commit 5e89047424
2 changed files with 14 additions and 0 deletions
--- a/agent/README.md
+++ b/agent/README.md
@@ -8,6 +8,13 @@ On August 22, 2023, the Shanghai Artificial Intelligence Laboratory open-sourced

 InternLM2-Chat, open-sourced on January 17, 2024, further enhances its capabilities in code interpretation and general tool invocation. With improved and more generalizable instruction understanding, tool selection, and result reflection, the new model can more reliably support the construction of complex intelligent agents: it can invoke tools effectively over multiple rounds and accomplish more intricate tasks. Even without external tools, the model shows solid computational and reasoning abilities and outperforms ChatGPT in mathematics. With a code interpreter, InternLM2-Chat-20B performs comparably to GPT-4 on GSM8K and MATH. Building on these strengths in mathematics and tool use, InternLM2-Chat also offers practical data-analysis capabilities.

+| Model | GSM8K | MATH |
+| :---: | :---: | :--: |
+| InternLM2-Chat-20B (no external tools) | 79.6 | 32.5 |
+| InternLM2-Chat-20B with code interpreter | 84.5 | 51.2 |
+| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
+| GPT-4 | 91.4 | 45.8 |
+
 ## Experience

 We offer examples of using [Lagent](lagent.md) to build InternLM2-Chat-based intelligent agents that call tools such as code interpreters or search. We also provide a sample that uses [PAL to evaluate GSM8K math problems](pal_inference.md) with InternLM-Chat-7B.
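To make the comparison in the results table concrete, the short Python sketch below re-encodes the four rows added in this commit and computes per-benchmark score differences. The `SCORES` dictionary and the `diff` helper are illustrative names chosen for this example; they are not part of the InternLM repository.

```python
from __future__ import annotations

# Scores copied from the table added in this commit, as (GSM8K, MATH).
SCORES = {
    "InternLM2-Chat-20B": (79.6, 32.5),
    "InternLM2-Chat-20B with code interpreter": (84.5, 51.2),
    "ChatGPT (GPT-3.5)": (78.2, 28.0),
    "GPT-4": (91.4, 45.8),
}
BENCHMARKS = ("GSM8K", "MATH")


def diff(a: str, b: str) -> dict[str, float]:
    """Return score[a] - score[b] per benchmark, rounded to one decimal place."""
    return {
        bench: round(x - y, 1)
        for bench, x, y in zip(BENCHMARKS, SCORES[a], SCORES[b])
    }


if __name__ == "__main__":
    base = "InternLM2-Chat-20B"
    tool = "InternLM2-Chat-20B with code interpreter"
    print("gain from code interpreter:", diff(tool, base))
    print("no tools vs ChatGPT:", diff(base, "ChatGPT (GPT-3.5)"))
    print("with code interpreter vs GPT-4:", diff(tool, "GPT-4"))
```

Read this way, the table says the code interpreter adds 4.9 points on GSM8K and 18.7 on MATH; with it, the 20B model trails GPT-4 by 6.9 points on GSM8K and leads it by 5.4 on MATH.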
--- a/agent/README_zh-CN.md
+++ b/agent/README_zh-CN.md
@@ -8,6 +8,13 @@

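 InternLM2-Chat, open-sourced on January 17, 2024, further improves its code-interpretation and general tool-calling capabilities. Building on stronger and more generalizable instruction understanding, tool selection, and result reflection, the new model can more reliably support the construction of complex agents, invoke tools effectively over multiple rounds, and complete fairly complex tasks. Without external tools, the model already has solid computational and reasoning abilities and outperforms ChatGPT in mathematics; with a code interpreter (code-interpreter), InternLM2-Chat-20B reaches a level similar to GPT-4 on GSM8K and MATH. Building on its strong foundations in mathematics and tool use, InternLM2-Chat offers practical data-analysis capabilities.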

+| Model | GSM8K | MATH |
+| :---: | :---: | :--: |
+| InternLM2-Chat-20B (intrinsic ability only) | 79.6 | 32.5 |
+| InternLM2-Chat-20B with code interpreter | 84.5 | 51.2 |
+| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
+| GPT-4 | 91.4 | 45.8 |
+
 ## Experience

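 We provide examples of using [Lagent](lagent_zh_cn.md) to build agents on InternLM2-Chat that call tools such as a code interpreter or search. We also provide a sample that uses [PAL to evaluate GSM8K math problems](pal_inference_zh-CN.md) with InternLM-Chat-7B.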
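PAL (program-aided language models) evaluation, referenced in the README, has the model answer a math word problem by writing a program instead of a final number; the interpreter's return value is then scored against the reference answer. The Python sketch below illustrates that loop under stated assumptions: the generated program, the `solution` entry-point name, and the helpers `run_program` and `is_correct` are hypothetical and are not taken from the repository's PAL inference script.

```python
"""Minimal PAL-style scoring sketch (hypothetical; not the repo's pal_inference script)."""
import math

# Hypothetical model output for a GSM8K-style question. In PAL the model writes
# a program rather than a number, and the interpreter computes the answer.
GENERATED = '''
def solution():
    """Olivia has $23. She bought five bagels for $3 each. How much money does she have left?"""
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_spent = bagels * bagel_cost
    money_left = money_initial - money_spent
    return money_left
'''


def run_program(code: str, entry: str = "solution"):
    """Execute generated code in a fresh namespace and call its (assumed) entry function.

    A real evaluator should sandbox this and add a timeout: exec() runs
    arbitrary code with full interpreter privileges.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace[entry]()


def is_correct(prediction, reference: float, tol: float = 1e-4) -> bool:
    """Compare the executed answer with the gold numeric answer; non-numbers count as wrong."""
    try:
        return math.isclose(float(prediction), reference, abs_tol=tol)
    except (TypeError, ValueError):
        return False


if __name__ == "__main__":
    answer = run_program(GENERATED)
    print(answer, is_correct(answer, 8))
```

Treating unparsable outputs as incorrect (rather than raising) keeps a batch evaluation running when a generated program returns something other than a number.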