54f85a6e9a
* feat(utils/writer.py): support tensorboard writer (#63) * feat(utils/writer.py): support tensorboard writer * feat(utils/writer.py): add class comment --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> * [Develop] Pull Main Branch (#121) * fix/fix_submodule_err (#61) * fix/fix_submodule_err --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> * fix issue templates (#65) * fix(tokenizer): refactor tokenizer and update usage in readme (#51) * update tokenizer example * fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (#73) * fix a typo in readme * in order to find InternLMTokenizer, select a lower version of Transformers --------- Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> * [Doc] Add wechat and discord link in readme (#78) * Doc:add wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * [Docs]: add Japanese README (#43) * Add Japanese README * Update README-ja-JP.md replace message * Update README-ja-JP.md * add repetition_penalty in GenerationConfig in web_demo.py (#48) Co-authored-by: YWMditto <862779238@qq.com> * use fp16 in instruction (#80) * [Enchancement] add more options for issue template (#77) * [Enchancement] add more options for issue template * update qustion icon * fix link * Use tempfile for convert2hf.py (#23) Fix https://github.com/InternLM/InternLM/issues/50 * delete torch_dtype of README's example code (#100) * set the value of repetition_penalty to 1.0 to avoid random outputs (#99) * Update web_demo.py (#97) Remove meaningless log. * [Fix]Fix wrong string cutoff in the script for sft text tokenizing (#106) --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> Co-authored-by: Kai Chen <chenkaidev@gmail.com> Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com> Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com> Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> Co-authored-by: vansin <msnode@163.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com> Co-authored-by: YWMditto <862779238@qq.com> Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com> Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com> Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com> Co-authored-by: Shuo Zhang <zhangshuolove@live.com> Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com> * feat(core/scheduler): support pipeline parallel (#98) * feat(utils/writer.py): support tensorboard writer * feat(utils/writer.py): add class comment * feat(core): support pipeline parallel * fix(core): fix demo running error * feat(solver/optimizer): add pp zero optimizer * fix(solver/optimizer): fix word spelling error * feat(core/scheduler): add new dir scheduler in core/ * fix(core): fix ci lint error * feat(solver/optimizer): merge pp and nopp optimizer * doc(usage.md): update usage doc * feat(core/scheduler): support post func * feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape * feat(core/scheduler): add _load_micro_batch in base scheduler * feat(core/scheduler): support optimizer overlap communication in pp scheduler * feat(core/scheduler): delete data process func code * feat(core/trainer): schedule pre processing for all schedule --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: huangting.p <huangting@sensetime.com> * refactor(rotaryEmbedding): refactor forward (#120) * use fp16 in instruction (#80) * delete torch_dtype of README's example code (#100) * refactor the forward for rotary embedding --------- Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com> Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com> * feat(model/metrics.py): support calculating accuracy and perplexity m… (#91) * feat(model/metrics.py): support calculating accuracy and perplexity metrics * fix(model/metrics.py): fix import error * feat(train.py): minor update --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: huangting.p <huangting@sensetime.com> * fix(optimizer/util.py) change inf defination * [Dev] Pull Main (#139) * fix/fix_submodule_err (#61) * fix/fix_submodule_err --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> * fix issue templates (#65) * fix(tokenizer): refactor tokenizer and update usage in readme (#51) * update tokenizer example * fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (#73) * fix a typo in readme * in order to find InternLMTokenizer, select a lower version of Transformers --------- Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> * [Doc] Add wechat and discord link in readme (#78) * Doc:add wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * Doc:update wechat and discord link * [Docs]: add Japanese README (#43) * Add Japanese README * Update README-ja-JP.md replace message * Update README-ja-JP.md * add repetition_penalty in GenerationConfig in web_demo.py (#48) Co-authored-by: YWMditto <862779238@qq.com> * use fp16 in instruction (#80) * [Enchancement] add more options for issue template (#77) * [Enchancement] add more options for issue template * update qustion icon * fix link * Use tempfile for convert2hf.py (#23) Fix https://github.com/InternLM/InternLM/issues/50 * delete torch_dtype of README's example code (#100) * set the value of repetition_penalty to 1.0 to avoid random outputs (#99) * Update web_demo.py (#97) Remove meaningless log. * [Fix]Fix wrong string cutoff in the script for sft text tokenizing (#106) * docs(install.md): update dependency package transformers version to >= 4.28.0 (#124) Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> * docs(LICENSE): add license (#125) * add license of colossalai and flash-attn * fix lint * modify the name * fix AutoModel map in convert2hf.py (#116) * variables are not printly as expect (#114) * feat(solver): fix code to adapt to torch2.0 and provide docker images (#128) * feat(solver): fix code to adapt to torch2.0 * docs(install.md): publish internlm environment image * docs(install.md): update dependency packages version * docs(install.md): update default image --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> * add demo test (#132) Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> * fix web_demo cache accelerate (#133) * fix(hybrid_zero_optim.py): delete math import * Update embedding.py --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> Co-authored-by: Kai Chen <chenkaidev@gmail.com> Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com> Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com> Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> Co-authored-by: vansin <msnode@163.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com> Co-authored-by: YWMditto <862779238@qq.com> Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com> Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com> Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com> Co-authored-by: Shuo Zhang <zhangshuolove@live.com> Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com> Co-authored-by: huangting4201 <1538303371@qq.com> Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: ytxiong <45058324+yingtongxiong@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: kkscilife <126147887+kkscilife@users.noreply.github.com> Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> Co-authored-by: hw <45089338+MorningForest@users.noreply.github.com> * style(solver/optimizer/utils.py): fix lint error (#147) Co-authored-by: huangting.p <huangting@sensetime.com> * feat(*): support not-flash-attn for pp and no-pp (#145) * support not flash attention for no-pp * support pipeline * modify the config * refactor the code * refactor the code * remove some unnecessary code * fix(initialize/launch.py): set default value for use_flash_attn (#158) * add default for use_flash_attn * fix lint * feat(utils/logger.py): support uniscale logger (#152) * style(internlm): fix lint error * feat(utils/logger.py): support uniscale logger * fix(utils/logger.py): fix import circular error * feat(train.py): support dashboard metric panel and fix ci train config * fix(ci_scripts/train/slurm_train.sh): fix ci train error * fix(ci_scripts/train/torchrun.sh): fix ci train error * fix(ci_scripts/train): restore ci update * fix(config.json): delete alert webhook * feat(train.py): optimize func init logger * feat(config.json): delete config.json --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: huangting.p <huangting@sensetime.com> * feat(utils/evaluation.py): support evaluate (#154) * style(internlm): fix lint error * feat(utils/logger.py): support uniscale logger * fix(utils/logger.py): fix import circular error * feat(train.py): support dashboard metric panel and fix ci train config * fix(ci_scripts/train/slurm_train.sh): fix ci train error * fix(ci_scripts/train/torchrun.sh): fix ci train error * feat(utils/evaluation.py): support evaluate on validation dataset * fix(utils/evaluation.py): fix demo error * fix(ci_scripts/train/ci_7B_sft.py): fix ci train error * feat(initialize/launch.py): set default value for valid_bsz and valid_every * fix(ci_scripts/train): restore ci update * docs(configs/7B_sft.py): update comment for config * fix(config.json): delete config.json * fix evaluation bug in scheduler when use_flash_attn=False * feat(scheduler/no_pipeline_scheduler.py): support micro_bsz>1 in no pp * modify the jugement in pp and no-pp scheduler * modify the data_process_func in evaluation * fix bugs when use_flash_attn=False * rename symbol * feat(configs/7B_sft.py): change para valid_bsz to valid_micro_num * feat(scheduler/no_pipeline_scheduler.py): update para set _grad_accum_batch_size --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: huangting.p <huangting@sensetime.com> Co-authored-by: yingtongxiong <974106207@qq.com> * feat(*): support no apex (#166) * support no-apex * add default for use_apex * fix lint * modify the RMSNormTorch * remove some comments * remove use_apex parameter * remove some unnecessary code * refactor(*): refactor the code with no-apex (#170) * support no-apex * add default for use_apex * fix lint * modify the RMSNormTorch * remove some comments * remove use_apex parameter * remove some unnecessary code * optimize the code including import * remove the import RMSNorm * remove warnings * refactor(scheduler): rewrite pipeline scheduler (#138) * refactor(scheduler): rewrite pipeline scheduler * fix(*): fix pipeline scheduler bugs * fix(*): fix merge bug * feat(*): update codes with todo tag * feat(*): add comments * feat(internlm/core/scheduler): update recv_prev/next logic * feat(utils/evaluation.py): update sche metric hook for valid --------- Co-authored-by: huangting.p <huangting@sensetime.com> * feat(*): support fp32 training (#155) * support float32 training * fix lint * add adaptation in model/utils.py * remove some unnecessary code * fix lint * feat(optim): add support for fp32 zero * Revert "Merge pull request #2 from SolenoidWGT/fp32_zero" This reverts commit |
||
---|---|---|
.github | ||
ci_scripts | ||
configs | ||
doc | ||
internlm | ||
requirements | ||
third_party | ||
tools | ||
.gitignore | ||
.gitmodules | ||
.owners.yml | ||
.pre-commit-config.yaml | ||
.pylintrc | ||
CHANGE_LOG.md | ||
LICENSE | ||
README-ja-JP.md | ||
README-zh-Hans.md | ||
README.md | ||
sonar-project.properties | ||
train.py | ||
version.txt | ||
web_demo.py |
README.md
InternLM
📘Usage | 🛠️Installation | 📊Train Performance | 👀Model | 🤗HuggingFace | 🆕Update News | 🤔Reporting Issues
👋 join us on Twitter, Discord and WeChat
Introduction
InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.
Additionally, a lightweight training framework is offered to support model pre-training without the need for extensive dependencies. With a single codebase, it supports pre-training on large-scale clusters with thousands of GPUs, and fine-tuning on a single GPU while achieving remarkable performance optimizations. InternLM achieves nearly 90% acceleration efficiency during training on 1024 GPUs.
News
InternLM-7B-Chat v1.1 is released with code interpreter and function calling capability. You can try it with Lagent.
InternLM-7B
Performance Evaluation
We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool OpenCompass. The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the OpenCompass leaderboard for more evaluation results.
Datasets\Models | InternLM-Chat-7B | InternLM-7B | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
---|---|---|---|---|---|---|---|
C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |
- The evaluation results were obtained from OpenCompass 20230706 (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by OpenCompass.
- The evaluation data may have numerical differences due to the version iteration of OpenCompass, so please refer to the latest evaluation results of OpenCompass.
Model Zoo
InternLM 7B and InternLM 7B Chat, trained using InternLM, have been open-sourced. We provide two formats of model weights for use. In addition to loading the models using the Transformers format, you can also load the weights directly using InternLM for further pre-training or human preference alignment training.
Model | InternLM Format Weight Download Link | Transformers Format Weight Download Link |
---|---|---|
InternLM 7B | 🤗internlm/intern-7b | |
InternLM Chat 7B v1.1 | 🤗internlm/intern-chat-7b-v1.1 | |
InternLM Chat 7B | 🤗internlm/intern-chat-7b | |
InternLM Chat 7B 8k | 🤗internlm/intern-chat-7b-8k |
Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
Import from Transformers
To load the InternLM 7B Chat model using Transformers, use the following code:
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-v1_1", trust_remote_code=True).cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "hello", history=[])
>>> print(response)
Hello! How can I help you today?
>>> response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
>>> print(response)
Sure, here are three tips for effective time management:
1. Prioritize tasks based on importance and urgency: Make a list of all your tasks and categorize them into "important and urgent," "important but not urgent," and "not important but urgent." Focus on completing the tasks in the first category before moving on to the others.
2. Use a calendar or planner: Write down deadlines and appointments in a calendar or planner so you don't forget them. This will also help you schedule your time more effectively and avoid overbooking yourself.
3. Minimize distractions: Try to eliminate any potential distractions when working on important tasks. Turn off notifications on your phone, close unnecessary tabs on your computer, and find a quiet place to work if possible.
Remember, good time management skills take practice and patience. Start with small steps and gradually incorporate these habits into your daily routine.
Dialogue
You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:
pip install streamlit==1.24.0
pip install transformers==4.30.2
streamlit run web_demo.py
The effect is as follows
Deployment
We use LMDeploy to complete the workflow of InternLM deployment.
python3 -m pip install lmdeploy
You can utilize the following commands to conduct internlm-chat-7b
FP16 inference, serve it and interact with AI assistant via WebUI:
# convert weight layout
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b
# inference lmdeploy's turbomind engine
python3 -m lmdeploy.turbomind.chat ./workspace
# serving with gradio
python3 -m lmdeploy.serve.gradio.app ./workspace
You can also deploy 4-bit quantized internlm-chat-7b
model via LMDeploy. It greatly trims down the model's memory overhead to 6G, just 40% of what FP16 inference would take. More importantly, with extreme optimized kernel, the inference performance achieves 2.4x faster than FP16 inference on A100-80G.
Try the followings to enjoy 4-bit internlm-chat-7b
on a Geforce RTX 30x GPU card. You can find the inference benchmark from here.
# download prequnantized internlm-chat-7b model from huggingface
git-lfs install
git clone https://huggingface.co/lmdeploy/llama2-chat-7b-w4
# Convert the model's layout and store it in the default path, ./workspace.
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b ./llama2-chat-7b-w4 awq --group-size 128
# inference lmdeploy's turbomind engine
python3 -m lmdeploy.turbomind.chat ./workspace
# serving with gradio
python3 -m lmdeploy.serve.gradio.app ./workspace
LMDeploy is an efficient toolkit for compressing, deploying, and serving LLM models. Please refer to the deployment tutorial for more details on deploying InternLM.
Fine-tuning & Training
Pre-training and Fine-tuning Tutorial
Please refer to Usage Tutorial to start InternLM installation, data processing, pre-training and fine-tuning.
Convert to Transformers Format
The model trained by InternLM can be easily converted to HuggingFace Transformers format, which is convenient for seamless docking with various open source projects in the community. With the help of tools/transformers/convert2hf.py
, the weights saved during training can be converted into transformers format with one command
python tools/transformers/convert2hf.py --src_folder origin_ckpt/ --tgt_folder hf_ckpt/ --tokenizer ./tools/V7_sft.model
After conversion, it can be loaded as transformers by the following code
>>> from transformers import AutoTokenizer, AutoModel
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True).cuda()
Training System
System Architecture
Please refer to the System Architecture document for further details.
Training Performance
InternLM deeply integrates Flash-Attention, Apex and other high-performance model operators to improve training efficiency. By building the Hybrid Zero technique, it achieves efficient overlap of computation and communication, significantly reducing cross-node communication traffic during training. InternLM supports expanding the 7B model from 8 GPUs to 1024 GPUs, with an acceleration efficiency of up to 90% at the thousand-GPU scale, a training throughput of over 180 TFLOPS, and an average of over 3600 tokens per GPU per second. The following table shows InternLM's scalability test data at different configurations:
GPU Number | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
---|---|---|---|---|---|---|---|---|
TGS | 4078 | 3939 | 3919 | 3944 | 3928 | 3920 | 3835 | 3625 |
TFLOPS | 193 | 191 | 188 | 188 | 187 | 185 | 186 | 184 |
TGS represents the average number of tokens processed per GPU per second. For more performance test data, please refer to the Training Performance document for further details.
Contribution
We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.
Acknowledgements
InternLM codebase is an open-source project contributed by Shanghai AI Laboratory and researchers from different universities and companies. We would like to thank all the contributors for their support in adding new features to the project and the users for providing valuable feedback. We hope that this toolkit and benchmark can provide the community with flexible and efficient code tools for fine-tuning InternLM and developing their own models, thus continuously contributing to the open-source community. Special thanks to the two open-source projects, flash-attention and ColossalAI.
License
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact internlm@pjlab.org.cn.
Citation
@misc{2023internlm,
title={InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities},
author={InternLM Team},
howpublished = {\url{https://github.com/InternLM/InternLM}},
year={2023}
}