InternLM/agent
BraisedPork 2b221a9f17
Support inference and evaluation with Math Code Interpreter (#695)
Co-authored-by: wangzy <wangziyi@pjlab.org.cn>
2024-03-08 14:32:33 +08:00
README.md Support inference and evaluation with Math Code Interpreter (#695) 2024-03-08 14:32:33 +08:00
README_zh-CN.md Support inference and evaluation with Math Code Interpreter (#695) 2024-03-08 14:32:33 +08:00
lagent.md [Docs] Update Agent docs (#590) 2024-01-17 19:37:51 +08:00
lagent_zh-CN.md [CI]: fix and pass pre-commit hook (#666) 2024-01-26 17:26:04 +08:00
pal_inference.md [CI]: fix and pass pre-commit hook (#666) 2024-01-26 17:26:04 +08:00
pal_inference.py [CI]: fix and pass pre-commit hook (#666) 2024-01-26 17:26:04 +08:00
pal_inference_zh-CN.md [CI]: fix and pass pre-commit hook (#666) 2024-01-26 17:26:04 +08:00
requirements.txt Support inference and evaluation with Math Code Interpreter (#695) 2024-03-08 14:32:33 +08:00
streaming_inference.py Support inference and evaluation with Math Code Interpreter (#695) 2024-03-08 14:32:33 +08:00

README.md

InternLM-Chat Agent

English | 简体中文

Introduction

InternLM-Chat-7B v1.1 has been released as the first open-source model with code interpreter capabilities, supporting external tools such as a Python code interpreter and a search engine.

InternLM2-Chat, open-sourced on January 17, 2024, further improves code interpreter use and general tool use. With better and more generalizable instruction understanding, tool selection, and reflection, InternLM2-Chat can more reliably support complex agents and multi-step tool calling for more intricate tasks. Even without external tools, InternLM2-Chat shows solid computational and reasoning abilities and outperforms ChatGPT on mathematical benchmarks. With a code interpreter, InternLM2-Chat-20B achieves results comparable to GPT-4 on GSM8K and MATH, surpassing it on MATH (51.2 vs. 45.8) while trailing it on GSM8K (84.5 vs. 91.4). Building on these strong math and tool-use foundations, InternLM2-Chat also offers practical data analysis capabilities.

The results of InternLM2-Chat-20B with the math code interpreter are as follows:

| Model | GSM8K | MATH |
| --- | --- | --- |
| InternLM2-Chat-20B | 79.6 | 32.5 |
| InternLM2-Chat-20B with Code Interpreter | 84.5 | 51.2 |
| ChatGPT (GPT-3.5) | 78.2 | 28.0 |
| GPT-4 | 91.4 | 45.8 |
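The percentage-point gaps behind the comparisons in the introduction can be read straight off the table. The short sketch below simply restates the table as a dictionary and computes differences; the helper name `gap` is hypothetical and not part of this repository.

```python
# Scores restated from the table above (GSM8K and MATH).
SCORES = {
    "InternLM2-Chat-20B": {"GSM8K": 79.6, "MATH": 32.5},
    "InternLM2-Chat-20B with Code Interpreter": {"GSM8K": 84.5, "MATH": 51.2},
    "ChatGPT (GPT-3.5)": {"GSM8K": 78.2, "MATH": 28.0},
    "GPT-4": {"GSM8K": 91.4, "MATH": 45.8},
}


def gap(model_a: str, model_b: str, benchmark: str) -> float:
    """Percentage-point difference model_a - model_b, rounded to one decimal."""
    return round(SCORES[model_a][benchmark] - SCORES[model_b][benchmark], 1)


if __name__ == "__main__":
    with_ci = "InternLM2-Chat-20B with Code Interpreter"
    for bench in ("GSM8K", "MATH"):
        print(bench,
              "interpreter gain:", gap(with_ci, "InternLM2-Chat-20B", bench),
              "vs GPT-4:", gap(with_ci, "GPT-4", bench))
```

For instance, the code interpreter adds 4.9 points on GSM8K and 18.7 points on MATH for InternLM2-Chat-20B, and the interpreter-equipped model sits 6.9 points below GPT-4 on GSM8K but 5.4 points above it on MATH.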

Usage

We provide an example that uses Lagent to build an agent on top of InternLM2-Chat that calls the code interpreter. First, install the extra dependencies:

```bash
pip install -r requirements.txt
```

Run the following script to perform inference and evaluation on the GSM8K or MATH test set (the example below uses MATH):

```bash
# Use --backend=hf for HuggingFace models
python streaming_inference.py \
  --backend=lmdeploy \
  --model_path=internlm/internlm2-chat-20b \
  --tp=2 \
  --temperature=0.0 \
  --dataset=math \
  --output_path=math_lmdeploy.jsonl \
  --do_eval
```
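To run both benchmarks in one go, the same command can be assembled from Python. This is a hypothetical helper, not part of the repository: it reuses only the flags shown above and assumes `gsm8k` is the value `--dataset` accepts for GSM8K, which you should confirm in `streaming_inference.py`.

```python
import subprocess
import sys


def build_command(dataset: str, backend: str = "lmdeploy",
                  model_path: str = "internlm/internlm2-chat-20b",
                  tp: int = 2) -> list[str]:
    """Return the argv for one inference + evaluation run (flags from the README)."""
    return [
        sys.executable, "streaming_inference.py",
        f"--backend={backend}",  # "hf" for HuggingFace models
        f"--model_path={model_path}",
        f"--tp={tp}",
        "--temperature=0.0",
        f"--dataset={dataset}",
        f"--output_path={dataset}_{backend}.jsonl",
        "--do_eval",
    ]


def run_all(datasets=("gsm8k", "math")) -> None:
    """Run the script once per dataset; "gsm8k" is an assumed dataset name."""
    for name in datasets:
        # check=True stops at the first failing run
        subprocess.run(build_command(name), check=True)
```

Call `run_all()` from the directory that contains `streaming_inference.py`; each run writes its own `<dataset>_<backend>.jsonl` file, matching the `math_lmdeploy.jsonl` naming above.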

`output_path` is a JSONL file that stores the inference results, one JSON object per line. Each line looks like the following (pretty-printed here for readability):

```json
{
    "idx": 41,
    "query": "The point $(a, b)$ lies on the line with the equation $3x + 2y = 12.$ When $a = 4$, what is the value of $b$?",
    "gt": "0",
    "pred": ["0"],
    "steps": [
        {
            "role": "language",
            "content": ""
        },
        {
            "role": "tool",
            "content": {
                "name": "IPythonInteractive",
                "parameters": {
                    "command": "```python\nfrom sympy import symbols, solve\n\ndef find_b():\n    x, y = symbols('x y')\n    equation = 3*x + 2*y - 12\n    b = solve(equation.subs(x, 4), y)[0]\n\n    return b\n\nresult = find_b()\nprint(result)\n```"
                }
            },
            "name": "interpreter"
        },
        {
            "role": "environment",
            "content": "0",
            "name": "interpreter"
        },
        {
            "role": "language",
            "content": "The value of $b$ when $a = 4$ is $\\boxed{0}$."
        }
    ],
    "error": null
}
```
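To inspect how the model used the interpreter, the `steps` of a record can be walked in order. The sketch below assumes exactly the layout of the example record above: a `tool` step carries the code in `content.parameters.command`, wrapped in a Markdown python code fence, and the interpreter output follows in an `environment` step. The helpers `tool_calls` and `load_records` are hypothetical, not functions from `streaming_inference.py`.

```python
import json
import re
from typing import Iterable, Optional

# Matches a command wrapped in a Markdown python code fence; group 1 is the code.
FENCE_RE = re.compile(r"^```(?:python)?\n(.*?)\n?```$", re.DOTALL)


def tool_calls(record: dict) -> list[tuple[str, Optional[str]]]:
    """Return (code, interpreter_output) pairs from one result record.

    Assumes a "tool" step stores its code in content["parameters"]["command"]
    and is optionally followed by an "environment" step with the output.
    """
    steps = record.get("steps") or []
    pairs = []
    for i, step in enumerate(steps):
        if step.get("role") != "tool":
            continue
        command = step["content"]["parameters"]["command"].strip()
        match = FENCE_RE.match(command)
        code = match.group(1) if match else command  # fall back to the raw command
        nxt = steps[i + 1] if i + 1 < len(steps) else None
        output = nxt["content"] if nxt and nxt.get("role") == "environment" else None
        pairs.append((code, output))
    return pairs


def load_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines (e.g. an open file) into records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]
```

For example, opening `math_lmdeploy.jsonl` and calling `tool_calls` on each record from `load_records` lists every interpreter call together with what it printed; for the record above this yields the sympy snippet paired with the output `"0"`.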

Once the output file exists, you can skip the inference stage and run only the evaluation:

```bash
python streaming_inference.py \
  --output_path=math_lmdeploy.jsonl \
  --no-do_infer \
  --do_eval
```
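The scoring performed by `--do_eval` lives in `streaming_inference.py` and is not described here. For a quick sanity check on a saved results file, a rough sketch might look like the following. It is a hypothetical, naive exact-string comparison, not the repository's metric: MATH answers in particular often need equivalence checks beyond string matching, so expect it to undercount. Treating the last entry of `pred` as the final answer is also an assumption.

```python
import json
from typing import Iterable


def naive_accuracy(lines: Iterable[str]) -> float:
    """Share of records whose last `pred` entry equals `gt` as a string.

    Hypothetical helper: a plain string match, NOT the metric used by
    streaming_inference.py.
    """
    total = correct = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        preds = record.get("pred") or []  # "pred" may be empty or null
        if preds and str(preds[-1]).strip() == str(record["gt"]).strip():
            correct += 1
    return correct / total if total else 0.0


def naive_accuracy_from_file(path: str) -> float:
    """Read a JSONL results file such as math_lmdeploy.jsonl."""
    with open(path, encoding="utf-8") as f:
        return naive_accuracy(f)
```

If this number differs sharply from the script's own evaluation, the gap usually comes from answers that are mathematically equal but written differently, which a string match cannot recognize.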

Please refer to streaming_inference.py for more information about the arguments.