
# InternLM2-20B Model Card

## Introduction

InternLM2, the second generation of the InternLM model, comes in two scales: 7B and 20B. For the convenience of users and researchers, we have open-sourced four variants at each scale:

  • internlm2-base-20b: A high-quality and highly adaptable model base, serving as an excellent starting point for deep domain adaptation.
  • internlm2-20b (recommended): Built upon the internlm2-base, this version has been enhanced in multiple capability directions. It shows outstanding performance in evaluations while maintaining robust general language abilities, making it our recommended choice for most applications.
  • internlm2-chat-20b-sft: Built on the base model, this version has undergone supervised fine-tuning (SFT) for human alignment.
  • internlm2-chat-20b (recommended): Further optimized for conversational interaction on top of internlm2-chat-20b-sft through RLHF; it excels at instruction following, empathetic chat, and tool invocation.
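The chat variants consume a ChatML-style dialogue format. As an illustration only — the authoritative template ships with the tokenizer and should be applied via `tokenizer.apply_chat_template`; the exact markers below are an assumption of this sketch — a minimal prompt builder might look like:

```python
def build_chat_prompt(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts into a ChatML-style prompt.

    The <|im_start|>/<|im_end|> markers below are an assumption of this
    sketch; always prefer tokenizer.apply_chat_template, which carries the
    template actually used in training.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For example, `build_chat_prompt([{"role": "user", "content": "Hi"}])` yields a single user turn followed by an open assistant turn.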

The base model of InternLM2 has the following technical features:

  • Effective support for ultra-long contexts of up to 200,000 characters: the model achieves near-perfect "needle in a haystack" retrieval over 200,000-character inputs, and leads open-source models on long-text benchmarks such as LongBench and L-Eval.
  • Comprehensive performance enhancement: Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding.

## Model Zoo

| Model | Transformers(HF) | ModelScope | OpenXLab | Release Date |
| --- | --- | --- | --- | --- |
| InternLM2 Base 20B | 🤗internlm/internlm2-base-20b | internlm2-base-20b | Open in OpenXLab | 2024-01-17 |
| InternLM2 20B | 🤗internlm/internlm2-20b | internlm2-20b | Open in OpenXLab | 2024-01-17 |
| InternLM2 Chat 20B SFT | 🤗internlm/internlm2-chat-20b-sft | internlm2-chat-20b-sft | Open in OpenXLab | 2024-01-17 |
| InternLM2 Chat 20B | 🤗internlm/internlm2-chat-20b | internlm2-chat-20b | Open in OpenXLab | 2024-01-17 |
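The Transformers checkpoints above can be loaded with the Hugging Face `transformers` library. A minimal sketch follows, assuming the Hub repo id shown and that the chat models expose a `chat` helper through their remote code — both assumptions should be checked against the model card on the Hub. `trust_remote_code=True` is required because InternLM2 ships custom modeling code:

```python
def load_chat_model(repo_id="internlm/internlm2-chat-20b"):
    """Load the tokenizer and model for an InternLM2 chat checkpoint.

    Sketch only: repo id and dtype/device settings are assumptions;
    adjust them to your hardware (bfloat16 weights for a 20B model
    need roughly 40 GB of GPU memory).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    ).eval()
    return tokenizer, model


def main():
    tokenizer, model = load_chat_model()
    # `chat` is provided by the model's remote code (an assumption of
    # this sketch; verify against the Hub model card).
    response, history = model.chat(tokenizer, "Hello! Who are you?", history=[])
    print(response)
```

Call `main()` on a machine with sufficient GPU memory; the first call downloads the weights from the Hub.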

## Performance Evaluation

We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool OpenCompass. Some of the evaluation results are shown in the table below. You are welcome to visit the OpenCompass Leaderboard for more evaluation results.

| Dataset \ Models | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
| AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
| BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
| GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
| MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
| HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
| MBPP (Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
  • The evaluation results were obtained with OpenCompass; the evaluation configurations can be found in the configuration files provided by OpenCompass.
  • Scores may vary across OpenCompass version iterations, so please refer to the latest results on the OpenCompass leaderboard.
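As a pointer for reproducing such numbers, the sketch below shows what an OpenCompass model configuration for internlm2-20b might look like. Field names follow common OpenCompass conventions but are assumptions here; consult the configuration files shipped with OpenCompass for authoritative, up-to-date examples.

```python
# Hedged sketch of an OpenCompass model config (not an official config;
# verify class and field names against the OpenCompass repository).
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr="internlm2-20b-hf",
        path="internlm/internlm2-20b",
        tokenizer_path="internlm/internlm2-20b",
        model_kwargs=dict(trust_remote_code=True, device_map="auto"),
        tokenizer_kwargs=dict(trust_remote_code=True),
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]
```

Such a config is paired with dataset configs (e.g. for MMLU or GSM8K) and passed to the OpenCompass runner.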