update docs

pull/695/head
wangzy 2024-03-07 03:45:38 +00:00
parent 17199b8678
commit d4f9c2f49e
7 changed files with 143 additions and 145 deletions


@@ -242,7 +242,7 @@ To learn more about data contamination assessment, please check the [contaminati
### Agent Evaluation
- To evaluate tool utilization, please refer to [T-Eval](https://github.com/open-compass/T-Eval).
- For code interpreter evaluation, use the [gsm-8k-agent](https://github.com/open-compass/opencompass/blob/main/configs/datasets/gsm8k/gsm8k_agent_gen_be1606.py) provided in the repository. Additionally, you need to install [Lagent](https://github.com/InternLM/lagent).
- For code interpreter evaluation, use the [Math Agent Evaluation](agent/streaming_inference.py) provided in the repository. Additionally, you need to install [Lagent](https://github.com/InternLM/lagent).
### Subjective Evaluation


@@ -19,4 +19,69 @@ The results of InternLM2-Chat-20B on the math code interpreter are as follows:
## Usage
We offer examples of using [Lagent](lagent.md) to build agents based on InternLM2-Chat that call the code interpreter or search APIs. Please see [Inference with Streaming Agents in Lagent](streaming_inference.md). Additionally, we provide example code for [evaluating GSM8K math problems with PAL](pal_inference.md) using InternLM-Chat-7B.
We offer an example of using [Lagent](lagent.md) to build an agent based on InternLM2-Chat that calls the code interpreter. First, install the extra dependencies:
```bash
pip install -r requirements.txt
```
Run the following script to perform inference and evaluation on the GSM8K and MATH test sets:
```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
    --backend=lmdeploy \
    --model_path=internlm/internlm2-chat-20b \
    --tp=2 \
    --temperature=0.0 \
    --dataset=math \
    --output_path=math_lmdeploy.jsonl \
    --do_eval
```
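As the comment above notes, HuggingFace models are served with `--backend=hf`. A variant invocation might look like the following sketch; `--backend=hf` comes from the script's own options, while the `gsm8k` dataset name and output filename are assumptions for illustration (`--tp`, LMDeploy's tensor-parallelism degree, is dropped because it applies to the lmdeploy backend):
```bash
# Hypothetical variant: HuggingFace backend on GSM8K
python streaming_inference.py \
    --backend=hf \
    --model_path=internlm/internlm2-chat-7b \
    --temperature=0.0 \
    --dataset=gsm8k \
    --output_path=gsm8k_hf.jsonl \
    --do_eval
```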
`output_path` is a JSON Lines (jsonl) file where the inference results are saved. Each line looks like:
```json
{
    "idx": 41,
    "query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
    "gt": "0",
    "pred": ["0"],
    "steps": [
        {
            "role": "language",
            "content": ""
        },
        {
            "role": "tool",
            "content": {
                "name": "IPythonInteractive",
                "parameters": {
                    "command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n    x, y = symbols('x y')\n    equation = 3*x + 2*y - 12\n    b = solve(equation.subs(x, 4), y)[0]\n\n    return b\n\nresult = find_b()\nprint(result)\n```"
                }
            },
            "name": "interpreter"
        },
        {
            "role": "environment",
            "content": "0",
            "name": "interpreter"
        },
        {
            "role": "language",
            "content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
        }
    ],
    "error": null
}
```
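For a quick sanity check of the results file you can load it yourself. Below is a minimal sketch that computes how often the ground truth appears among the predictions, assuming only the `gt` and `pred` fields shown above; the script's own `--do_eval` pass remains the authoritative metric.
```python
import jsonlines

# Minimal sketch: fraction of problems whose ground truth ("gt")
# appears among the predictions ("pred"). Field names follow the
# sample line above; --do_eval remains the authoritative metric.
correct = total = 0
with jsonlines.open("math_lmdeploy.jsonl") as reader:
    for record in reader:
        total += 1
        if record["gt"] in record["pred"]:
            correct += 1

print(f"accuracy: {correct / total:.4f} ({correct}/{total})")
```
The `jsonlines` package is already installed via `requirements.txt`.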
Once this file is prepared, you can skip the inference stage and run evaluation only:
```bash
python streaming_inference.py \
    --output_path=math_lmdeploy.jsonl \
    --no-do_infer \
    --do_eval
```
Please refer to [`streaming_inference.py`](streaming_inference.py) for more information about the arguments.


@@ -19,4 +19,69 @@ InternLM2-Chat further improves its capabilities in code interpretation and general tool calling
## Usage
We provide examples of using [Lagent](lagent_zh-CN.md) to build agents based on InternLM2-Chat that call tools such as the code interpreter or search; please see [Inference with Streaming Agents in Lagent](streaming_inference_zh-CN.md). We also provide an example of [evaluating GSM8K math problems with PAL](pal_inference_zh-CN.md) using InternLM-Chat-7B.
We provide an example of using [Lagent](lagent_zh-CN.md) to build an agent based on InternLM2-Chat that calls the code interpreter. First, install the extra dependencies:
```bash
pip install -r requirements.txt
```
Run the following script to perform inference and evaluation on the GSM8K and MATH test sets:
```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
    --backend=lmdeploy \
    --model_path=internlm/internlm2-chat-20b \
    --tp=2 \
    --temperature=0.0 \
    --dataset=math \
    --output_path=math_lmdeploy.jsonl \
    --do_eval
```
`output_path` is a JSON Lines (jsonl) file where the inference results are saved. Each line looks like:
```json
{
    "idx": 41,
    "query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
    "gt": "0",
    "pred": ["0"],
    "steps": [
        {
            "role": "language",
            "content": ""
        },
        {
            "role": "tool",
            "content": {
                "name": "IPythonInteractive",
                "parameters": {
                    "command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n    x, y = symbols('x y')\n    equation = 3*x + 2*y - 12\n    b = solve(equation.subs(x, 4), y)[0]\n\n    return b\n\nresult = find_b()\nprint(result)\n```"
                }
            },
            "name": "interpreter"
        },
        {
            "role": "environment",
            "content": "0",
            "name": "interpreter"
        },
        {
            "role": "language",
            "content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
        }
    ],
    "error": null
}
```
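To inspect the code the agent actually executed, a small sketch like the following walks the `steps` array and prints each tool call's `command` string (the field layout is taken from the sample line above and is an assumption, not a guaranteed schema):
```python
import jsonlines

# Minimal sketch: print the code executed by each tool call, based on
# the "steps" layout shown in the sample line above (assumed schema).
with jsonlines.open("math_lmdeploy.jsonl") as reader:
    for record in reader:
        for step in record["steps"]:
            if step["role"] == "tool":
                print(f"--- idx {record['idx']} ---")
                print(step["content"]["parameters"]["command"])
```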
Once this file is prepared, you can skip the inference stage and run evaluation only:
```bash
python streaming_inference.py \
    --output_path=math_lmdeploy.jsonl \
    --no-do_infer \
    --do_eval
```
Please refer to [`streaming_inference.py`](streaming_inference.py) for more information about the arguments.

agent/requirements.txt Normal file

@@ -0,0 +1,10 @@
lmdeploy>=0.2.2
datasets
tqdm
numpy
pebble
jsonlines
sympy==1.12
antlr4-python3-runtime==4.11.0
lagent
einops


@@ -1,66 +0,0 @@
# Inference with Streaming Agents in Lagent
English | [Simplified Chinese](streaming_inference_zh-CN.md)
[Lagent](https://github.com/InternLM/lagent) is strongly recommended for agent construction. It supports multiple types of agents and is integrated with commonly used tools, including code interpreters.
We provide a script for inference and evaluation on the GSM8K and MATH test sets. The usage is as follows:
```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
    --backend=lmdeploy \
    --model_path=internlm/internlm2-chat-20b \
    --tp=2 \
    --temperature=0.0 \
    --dataset=math \
    --output_path=math_lmdeploy.jsonl \
    --do_eval
```
`output_path` is a JSON Lines (jsonl) file where the inference results are saved. Each line looks like:
```json
{
    "idx": 41,
    "query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
    "gt": "0",
    "pred": ["0"],
    "steps": [
        {
            "role": "language",
            "content": ""
        },
        {
            "role": "tool",
            "content": {
                "name": "IPythonInteractive",
                "parameters": {
                    "command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n    x, y = symbols('x y')\n    equation = 3*x + 2*y - 12\n    b = solve(equation.subs(x, 4), y)[0]\n\n    return b\n\nresult = find_b()\nprint(result)\n```"
                }
            },
            "name": "interpreter"
        },
        {
            "role": "environment",
            "content": "0",
            "name": "interpreter"
        },
        {
            "role": "language",
            "content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
        }
    ],
    "error": null
}
```
Once this file is prepared, you can skip the inference stage and run evaluation only:
```bash
python streaming_inference.py \
    --output_path=math_lmdeploy.jsonl \
    --no-do_infer \
    --do_eval
```
Please refer to [`streaming_inference.py`](streaming_inference.py) for more information about the arguments.


@@ -1,66 +0,0 @@
# Inference with Streaming Agents in Lagent
[English](streaming_inference.md) | Simplified Chinese
The [Lagent](https://github.com/InternLM/lagent) framework is recommended; this toolkit implements multiple types of agents and integrates common tools, including a Python code interpreter.
Building on it, we provide a script for inference and evaluation on the GSM8K and MATH test sets. The usage is as follows:
```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
    --backend=lmdeploy \
    --model_path=internlm/internlm2-chat-20b \
    --tp=2 \
    --temperature=0.0 \
    --dataset=math \
    --output_path=math_lmdeploy.jsonl \
    --do_eval
```
`output_path` is a JSON Lines (jsonl) file where the inference results are saved. Each line looks like:
```json
{
    "idx": 41,
    "query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
    "gt": "0",
    "pred": ["0"],
    "steps": [
        {
            "role": "language",
            "content": ""
        },
        {
            "role": "tool",
            "content": {
                "name": "IPythonInteractive",
                "parameters": {
                    "command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n    x, y = symbols('x y')\n    equation = 3*x + 2*y - 12\n    b = solve(equation.subs(x, 4), y)[0]\n\n    return b\n\nresult = find_b()\nprint(result)\n```"
                }
            },
            "name": "interpreter"
        },
        {
            "role": "environment",
            "content": "0",
            "name": "interpreter"
        },
        {
            "role": "language",
            "content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
        }
    ],
    "error": null
}
```
Once this file is prepared, you can skip the inference stage and run evaluation only:
```bash
python streaming_inference.py \
    --output_path=math_lmdeploy.jsonl \
    --no-do_infer \
    --do_eval
```
Please refer to [`streaming_inference.py`](streaming_inference.py) for more information about the arguments.


@@ -1,13 +1,3 @@
sentencepiece
streamlit
transformers>=4.34
lmdeploy>=0.2.2
datasets
tqdm
numpy
pebble
jsonlines
sympy==1.12
antlr4-python3-runtime==4.11.0
lagent
einops