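## Pre-training and Fine-tuning Tutorial for InternLM
To start a demo model training run, three things need to be prepared: **installation**, **dataset preparation**, and **model training configuration**. This tutorial first covers installation and data preparation, then briefly describes the model training configuration.
### Installation
Please refer to the [installation guide](./install.md).
### Dataset Preparation (Pre-training)
The dataset for an InternLM training task consists of a series of `bin` and `meta` files. A `tokenizer` generates the training dataset from raw text files. The tokenizer model is loaded by specifying its path in `tools/tokenizer.py`. Currently, `V7_sft.model` is provided for generating tokens. To use a different tokenizer model, change the model path in `tokenizer.py` directly.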
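Run the following command to generate the `bin` and `meta` files for the raw data. The `text_input_path` parameter is the path to the raw text data, which currently supports three input formats (`txt`, `json`, and `jsonl`), and `bin_output_path` is the path where the generated `bin` file is saved.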
```bash
$ python tools/tokenizer.py --text_input_path your_input_text_path --bin_output_path your_output_bin_path
```
Here is a data-processing example.
Given a file `raw_data.txt` that contains the raw dataset shown below:
```bash
感恩生活中的每一个细节,才能真正体会到幸福的滋味。
梦想是人生的动力源泉,努力追逐,才能实现自己的目标。
学会宽容和理解,才能建立真正和谐的人际关系。
```
The `bin` and `meta` files can be generated by running the following command:
```bash
$ python tools/tokenizer.py --text_input_path raw_data.txt --bin_output_path cn/output.bin
```
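Note that the generated `bin` file must be saved in one of six directories (`cn`, `en`, `code`, `ja`, `ar`, or `kaoshi`), which identifies the type of the dataset.
Here, `cn` is a Chinese dataset, `en` an English dataset, `code` a code dataset, `ja` a Japanese dataset, `ar` an Arabic dataset, and `kaoshi` an exam dataset.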
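The directory rule can be checked before running the tokenizer. The following is a minimal sketch, not part of InternLM: the helper name `dataset_type` is hypothetical, and it assumes the dataset type is given by the immediate parent directory of the `bin` file, as in `cn/output.bin`.

```python
from pathlib import Path

# The six dataset-type directories listed in this tutorial.
DATASET_TYPES = {"cn", "en", "code", "ja", "ar", "kaoshi"}


def dataset_type(bin_output_path: str) -> str:
    """Return the dataset type implied by the parent directory of a bin file.

    Hypothetical helper for illustration; assumes the type is the name of
    the directory that directly contains the bin file.
    """
    parent = Path(bin_output_path).parent.name
    if parent not in DATASET_TYPES:
        raise ValueError(
            f"{bin_output_path!r} must be placed under one of {sorted(DATASET_TYPES)}"
        )
    return parent


print(dataset_type("cn/output.bin"))
```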
The generated `bin` file has the following format:
```python
{"tokens": [73075, 75302, 69522, 69022, 98899, 67713, 68015, 81269, 74637, 75445, 99157]}
{"tokens": [69469, 60355, 73026, 68524, 60846, 61844, 98899, 67775, 79241, 98899, 67713, 67800, 67453, 67838, 99157]}
{"tokens": [68057, 79017, 60378, 68014, 98899, 67713, 67990, 68015, 70381, 67428, 61003, 67622, 99157]}
```
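Each line in the `bin` file corresponds to one sentence in the raw dataset and holds the `tokens` of that sentence (referred to as a `sequence` below).
The generated `meta` file has the following format: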
```bash
(0, 11), (90, 15), (208, 13)
```
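In the `meta` file, each tuple holds the metadata of one `sequence` in the `bin` file. The first element of the tuple is the `starting index` of the `sequence` within the `bin` file, and the second element is the number of `tokens` in that `sequence`.
For example, the first `sequence` has a `starting index` of 0 and contains 11 `tokens`. Because the first `sequence` has a length of `89` when converted to a `string`, and one more character is taken by the line break, the second `sequence` has a `starting index` of 90 and contains 15 `tokens`.
For `json` and `jsonl` input files, the `bin` and `meta` formats are the same as for `txt`, so they are not repeated here.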
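The offsets in the `meta` example can be reproduced with a few lines of Python. This is a sketch for illustration only, not the code in `tools/tokenizer.py`: the function name `build_meta` is hypothetical, and it assumes each `sequence` is written as one JSON line (in the default `json.dumps` formatting) followed by a newline.

```python
import json

# The three token sequences from the bin example above.
sequences = [
    [73075, 75302, 69522, 69022, 98899, 67713, 68015, 81269, 74637, 75445, 99157],
    [69469, 60355, 73026, 68524, 60846, 61844, 98899, 67775, 79241, 98899,
     67713, 67800, 67453, 67838, 99157],
    [68057, 79017, 60378, 68014, 98899, 67713, 67990, 68015, 70381, 67428,
     61003, 67622, 99157],
]


def build_meta(seqs):
    """Return (starting index, token count) for each sequence.

    Assumption: every sequence is serialized as one JSON line ending in a
    newline, so the next starting index is the running byte offset.
    """
    meta = []
    offset = 0
    for tokens in seqs:
        line = json.dumps({"tokens": tokens}) + "\n"
        meta.append((offset, len(tokens)))
        offset += len(line.encode("utf-8"))
    return meta


print(build_meta(sequences))  # [(0, 11), (90, 15), (208, 13)]
```

The first line is 89 characters long, so with its newline the second `sequence` starts at 90; the second line is 117 characters long, so the third starts at 90 + 118 = 208.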
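### Dataset Preparation (Fine-tuning)
The dataset format for fine-tuning tasks is the same as for pre-training tasks: the generated data is a series of `bin` and `meta` files. The following uses the Alpaca dataset as an example to walk through data preparation for fine-tuning.
1. Download the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json).
2. Tokenize the Alpaca data with the following command: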
```shell
python tools/alpaca_tokenizer.py /path/to/alpaca_dataset /path/to/output_dataset /path/to/tokenizer --split_ratio 0.1
```
Users are advised to write a new script modeled on `alpaca_tokenizer.py` to tokenize their own datasets.
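### Training Configuration
Taking the configuration file `configs/7B_sft.py` of the 7B demo as an example, this section describes the data, model, and parallelism settings needed to start a model training run.
#### Data Configuration
The key data-related parameters and their meanings are shown below: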
```python
TRAIN_FOLDER = "/path/to/dataset"
SEQ_LEN = 2048
data = dict(
    seq_len=SEQ_LEN,  # length of each data sample; default 2048
    micro_num=1,  # number of micro-batches processed in one model parameter update; default 1
    micro_bsz=1,  # packed_length = micro_bsz * SEQ_LEN, the amount of data in one micro-batch; default 1
    total_steps=50000,  # total number of steps to run; default 50000
    min_length=50,  # a dataset file with fewer than 50 lines of data is discarded
    train_folder=TRAIN_FOLDER,  # dataset path; default None. If train_folder is empty, training is tested on an automatically generated random dataset
    pack_sample_into_one=False,  # data-packing logic: whether attention is computed over the seq_len dimension or over the true length of each sequence
)
```
<div align="left">
<img src="./imgs/pack_into_one.png" width="550"/>
</div>
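To make the relationship between `seq_len`, `micro_bsz`, and `micro_num` concrete, here is a small arithmetic sketch. The function name `batch_sizes` is hypothetical, and the tokens-per-step formula assumes that every data-parallel rank processes `micro_num` micro-batches per step; only `packed_length = micro_bsz * SEQ_LEN` is stated in the configuration comments.

```python
def batch_sizes(seq_len: int, micro_bsz: int, micro_num: int, data_parallel_size: int):
    """Return (packed_length, tokens_per_step) for a data configuration.

    packed_length follows the config comment: micro_bsz * seq_len.
    tokens_per_step is an assumption: each data-parallel rank handles
    micro_num micro-batches of packed_length tokens per parameter update.
    """
    packed_length = micro_bsz * seq_len
    tokens_per_step = packed_length * micro_num * data_parallel_size
    return packed_length, tokens_per_step


# Demo values from the configuration above, on 8 data-parallel ranks.
print(batch_sizes(seq_len=2048, micro_bsz=1, micro_num=1, data_parallel_size=8))
```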
Currently, the dataset path `train_folder` can be passed in, and the folder is required to have the following layout:
```bash
- folder
    - code
        train_000.bin
        train_000.bin.meta
```
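For details about the dataset, see the ``Dataset Preparation`` sections.
#### Model Configuration
To load a model `checkpoint` when starting training, configure it as follows: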
```python
SAVE_CKPT_FOLDER = "local:/path/to/save/ckpt"
MODEL_ONLY_FOLDER = "local:/path/to/load/init/model/ckpt"
LOAD_CKPT_FOLDER = "local:/path/to/load/resume/ckpt"
ckpt = dict(
    save_ckpt_folder=SAVE_CKPT_FOLDER,  # path for saving model and optimizer checkpoints
    checkpoint_every=float("inf"),  # save a checkpoint every this many steps; default inf
    load_model_only_folder=MODEL_ONLY_FOLDER,  # path for loading initial model weights; only model weights are loaded, not optimizer weights, and training starts from the first step
    load_ckpt_folder=LOAD_CKPT_FOLDER,  # path for loading model and optimizer weights when resuming; training resumes from the saved step
    load_optimizer=True,  # whether to load optimizer weights when resuming; default True
)
```
Note:
- `load_model_only_folder` and `load_ckpt_folder` cannot be set at the same time.
- A path prefixed with `local:` is stored on the local file system; a path prefixed with `boto3:` is stored on remote oss.
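The two rules in the note can be expressed as a small validation sketch. The helper names `split_ckpt_path` and `check_ckpt_config` are hypothetical and not part of InternLM; the sketch assumes the storage backend is everything before the first `:` in the path.

```python
def split_ckpt_path(path: str):
    """Split 'backend:location' into (backend, location).

    Hypothetical helper; only the 'local' and 'boto3' prefixes named in
    this tutorial are accepted.
    """
    backend, sep, location = path.partition(":")
    if not sep or backend not in ("local", "boto3"):
        raise ValueError(f"checkpoint path {path!r} must start with 'local:' or 'boto3:'")
    return backend, location


def check_ckpt_config(ckpt: dict) -> None:
    """Reject configs that set both load_model_only_folder and load_ckpt_folder."""
    if ckpt.get("load_model_only_folder") and ckpt.get("load_ckpt_folder"):
        raise ValueError(
            "load_model_only_folder and load_ckpt_folder cannot be set at the same time"
        )


# Example: resume from a checkpoint, without load_model_only_folder.
resume_cfg = dict(
    save_ckpt_folder="local:/path/to/save/ckpt",
    load_ckpt_folder="local:/path/to/load/resume/ckpt",
)
check_ckpt_config(resume_cfg)
print(split_ckpt_path(resume_cfg["load_ckpt_folder"]))
```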
The key model-related parameters are configured as follows:
```python
model_type = "INTERNLM"  # model type; default "INTERNLM", which selects the model-structure initialization function
NUM_ATTENTION_HEAD = 32
VOCAB_SIZE = 103168
HIDDEN_SIZE = 4096
NUM_LAYER = 32
MLP_RATIO = 8 / 3
model = dict(
    checkpoint=False,  # fraction of model layers to recompute; accepts True/False/[0-1]
    num_attention_heads=NUM_ATTENTION_HEAD,
    embed_split_hidden=True,
    vocab_size=VOCAB_SIZE,
    embed_grad_scale=1,
    parallel_output=True,
    hidden_size=HIDDEN_SIZE,
    num_layers=NUM_LAYER,
    mlp_ratio=MLP_RATIO,
    apply_post_layer_norm=False,
    dtype="torch.bfloat16",
    norm_type="rmsnorm",
    layer_norm_epsilon=1e-5,
)
```
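Note: users can define their own model type name and model structure and configure the corresponding model parameters. The model initialization function is registered through the `MODEL_INITIALIZER` object in `utils/registry.py`; when the model is initialized in the main training function `train.py`, the initialization function is looked up through the `model_type` setting.
*To continue training from InternLM 7B, download the weights from the OpenXLab link in the [ModelZoo](https://github.com/InternLM/InternLM/tree/main#model-zoo).*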
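As a sanity check on the model parameters above, the per-head dimension and the MLP width can be derived from `HIDDEN_SIZE`, `NUM_ATTENTION_HEAD`, and `MLP_RATIO`. This is only a sketch: the function names are hypothetical, and rounding the MLP width up to a multiple of 256 is an assumption that this tutorial does not state.

```python
def head_dim(hidden_size: int, num_attention_heads: int) -> int:
    """Per-head dimension; hidden_size must divide evenly across heads."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden_size // num_attention_heads


def mlp_hidden_size(hidden_size: int, mlp_ratio: float, multiple_of: int = 256) -> int:
    """MLP width as hidden_size * mlp_ratio, rounded up to a multiple.

    multiple_of=256 is an assumption made for this illustration.
    """
    raw = int(hidden_size * mlp_ratio)
    return multiple_of * ((raw + multiple_of - 1) // multiple_of)


print(head_dim(4096, 32))
print(mlp_hidden_size(4096, 8 / 3))
```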
#### Parallel Configuration
A sample training parallel configuration is shown below:
```python
parallel = dict(
    zero1=8,
    pipeline=1,
    tensor=1,
)
```
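- zero1: the zero parallel strategy, with the following three cases; the default is -1
    - When `size <= 0`, the zero1 process group is the same size as the data parallel process group, so the optimizer states are partitioned across the whole data parallel range.
    - When `size == 1`, zero1 is not used, and every data parallel rank keeps the full optimizer states.
    - When `size > 1` and `size <= data_parallel_world_size`, the zero1 process group is a subset of the data parallel process group.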
- pipeline流水线并行大小默认值为 1
2023-07-06 04:55:23 +00:00
- tensor张量并行大小通常是每个节点的 GPU 数量,默认值为 1
注意:`数据并行大小 = 总的 GPU 数目 / 流水线并行大小 / 张量并行大小`
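
The relationship in the note can be checked with a few lines of Python. This is only a minimal sketch: the function name `data_parallel_size` and its arguments are hypothetical helpers for illustration and are not part of InternLM.

```python
def data_parallel_size(total_gpus: int, pipeline: int = 1, tensor: int = 1) -> int:
    """Compute total GPUs / pipeline parallel size / tensor parallel size."""
    model_parallel = pipeline * tensor
    if total_gpus % model_parallel != 0:
        # The GPUs must split evenly across pipeline and tensor groups.
        raise ValueError(
            f"{total_gpus} GPUs cannot be divided evenly by "
            f"pipeline={pipeline} x tensor={tensor}"
        )
    return total_gpus // model_parallel


print(data_parallel_size(8))                         # 8
print(data_parallel_size(16, pipeline=2, tensor=4))  # 2
```

With 8 GPUs and the default pipeline and tensor sizes of 1, the result is 8, which agrees with the `data parallel size: 8` reported in the single-node run log later in this document.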
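### Start Training
Once the dataset and the training configuration described above are ready, you can start the demo training. The following examples show how to launch training in slurm and torch environments.
To launch distributed training on slurm, use the following command for a multi-node run with 16 GPUs: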
```bash
$ srun -p internllm -N 2 -n 16 --ntasks-per-node=8 --gpus-per-task=1 python train.py --config ./configs/7B_sft.py
```
To launch distributed training with torch, use the following command for a single-node run with 8 GPUs:
```bash
$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py --launcher "torch"
```
### Training Results
Taking the demo training configuration for a single node with 8 GPUs on slurm as an example, the training log looks like this:
```bash
2023-07-07 12:26:58,293 INFO launch.py:228 in launch -- Distributed environment is initialized, data parallel size: 8, pipeline parallel size: 1, tensor parallel size: 1
2023-07-07 12:26:58,293 INFO parallel_context.py:535 in set_seed -- initialized seed on rank 2, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
2023-07-07 12:26:58,295 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=0===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=5===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=1===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=6===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=7===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=2===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=4===========
2023-07-07 12:26:58,296 INFO train.py:378 in main -- ===========New Run Jul07_12-26-58 on host:SH-IDC1-10-140-0-135,tp:0,pp=0,dp=3===========
2023-07-07 12:28:27,826 INFO hybrid_zero_optim.py:295 in _partition_param_list -- Number of elements on ranks: [907415552, 907411456, 910163968, 910163968, 921698304, 921698304, 921698304, 921698304], rank:0
2023-07-07 12:28:57,802 INFO train.py:323 in record_current_batch_training_metrics -- tflops=63.27010355651958,step=0,loss=11.634403228759766,tgs (tokens/gpu/second)=1424.64,lr=4.0000000000000003e-07,loss_scale=65536.0,grad_norm=63.672620777841004,micro_num=4,num_consumed_tokens=131072,inf_nan_skip_batches=0,num_samples_in_batch=19,largest_length=2048,largest_batch=5,smallest_batch=4,adam_beta2=0.95,fwd_bwd_time=6.48
2023-07-07 12:29:01,636 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.83371103277346,step=1,loss=11.613704681396484,tgs (tokens/gpu/second)=4274.45,lr=6.000000000000001e-07,loss_scale=65536.0,grad_norm=65.150786641452,micro_num=4,num_consumed_tokens=262144,inf_nan_skip_batches=0,num_samples_in_batch=16,largest_length=2048,largest_batch=5,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.67
2023-07-07 12:29:05,451 INFO train.py:323 in record_current_batch_training_metrics -- tflops=190.99928472960033,step=2,loss=11.490386962890625,tgs (tokens/gpu/second)=4300.69,lr=8.000000000000001e-07,loss_scale=65536.0,grad_norm=61.57798028719357,micro_num=4,num_consumed_tokens=393216,inf_nan_skip_batches=0,num_samples_in_batch=14,largest_length=2048,largest_batch=4,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.66
2023-07-07 12:29:09,307 INFO train.py:323 in record_current_batch_training_metrics -- tflops=188.8613541410694,step=3,loss=11.099515914916992,tgs (tokens/gpu/second)=4252.55,lr=1.0000000000000002e-06,loss_scale=65536.0,grad_norm=63.5478796484391,micro_num=4,num_consumed_tokens=524288,inf_nan_skip_batches=0,num_samples_in_batch=16,largest_length=2048,largest_batch=5,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.7
2023-07-07 12:29:13,147 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.65918563194305,step=4,loss=10.149517059326172,tgs (tokens/gpu/second)=4270.52,lr=1.2000000000000002e-06,loss_scale=65536.0,grad_norm=51.582841631508145,micro_num=4,num_consumed_tokens=655360,inf_nan_skip_batches=0,num_samples_in_batch=19,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.68
2023-07-07 12:29:16,994 INFO train.py:323 in record_current_batch_training_metrics -- tflops=189.3109313713174,step=5,loss=9.822169303894043,tgs (tokens/gpu/second)=4262.67,lr=1.4000000000000001e-06,loss_scale=65536.0,grad_norm=47.10386835560855,micro_num=4,num_consumed_tokens=786432,inf_nan_skip_batches=0,num_samples_in_batch=17,largest_length=2048,largest_batch=6,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.69
```
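
The per-step lines written by `record_current_batch_training_metrics` are comma-separated `key=value` pairs after a ` -- ` separator, so they can be turned into numbers for plotting or comparison. The following is a minimal sketch that assumes this layout stays as shown in the log above; `parse_metrics` is a hypothetical helper, not an InternLM function, and the sample is an abbreviated copy of the step 1 line.

```python
def parse_metrics(line: str) -> dict:
    """Parse the key=value pairs that follow ' -- ' in a metrics log line."""
    # Drop the timestamp / logger prefix; the timestamp itself contains a comma.
    _, _, payload = line.partition(" -- ")
    metrics = {}
    for item in payload.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # skip fragments that are not key=value pairs
        metrics[key.strip()] = float(value)
    return metrics


sample = (
    "2023-07-07 12:29:01,636 INFO train.py:323 in "
    "record_current_batch_training_metrics -- "
    "tflops=189.83371103277346,step=1,loss=11.613704681396484,"
    "tgs (tokens/gpu/second)=4274.45,lr=6.000000000000001e-07"
)

metrics = parse_metrics(sample)
print(metrics["step"], metrics["loss"], metrics["tgs (tokens/gpu/second)"])
```

All values are converted to `float`, including integer-like fields such as `step`; cast them back with `int()` where needed.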