* support sequence_parallel for no pipeline
* sequence_parallel does not support no-flash-attn
* support sequence parallel for pipeline
* add memory profiler
* Update 13B.py
* add memory profiler
* fix evaluation bug
* remove some unnecessary code
* remove some unnecessary code
* Update parallel_context.py
* modify the config
* remove memory profiler
* modify the config
* support selective dropout
* support float32 training
* fix lint
* add adaptation in model/utils.py
* remove some unnecessary code
* fix lint
* feat(optim): add support for fp32 zero
* Revert "Merge pull request #2 from SolenoidWGT/fp32_zero"
This reverts commit 53fc50b0e5, reversing
changes made to 40f24d0a73.
revert commit
* merge develop
* Update utils.py
* support fp32 in zero optimizer
* modify the dtype
---------
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
* style(internlm): fix lint error
* feat(utils/logger.py): support uniscale logger
* fix(utils/logger.py): fix import circular error
* feat(train.py): support dashboard metric panel and fix ci train config
* fix(ci_scripts/train/slurm_train.sh): fix ci train error
* fix(ci_scripts/train/torchrun.sh): fix ci train error
* feat(utils/evaluation.py): support evaluate on validation dataset
* fix(utils/evaluation.py): fix demo error
* fix(ci_scripts/train/ci_7B_sft.py): fix ci train error
* feat(initialize/launch.py): set default value for valid_bsz and valid_every
* fix(ci_scripts/train): restore ci update
* docs(configs/7B_sft.py): update comment for config
* fix(config.json): delete config.json
* fix evaluation bug in scheduler when use_flash_attn=False
* feat(scheduler/no_pipeline_scheduler.py): support micro_bsz>1 in no pp
* modify the jugement in pp and no-pp scheduler
* modify the data_process_func in evaluation
* fix bugs when use_flash_attn=False
* rename symbol
* feat(configs/7B_sft.py): change para valid_bsz to valid_micro_num
* feat(scheduler/no_pipeline_scheduler.py): update para set _grad_accum_batch_size
---------
Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>
Co-authored-by: yingtongxiong <974106207@qq.com>
* feat(utils/writer.py): support tensorboard writer
* feat(utils/writer.py): add class comment
* feat(core): support pipeline parallel
* fix(core): fix demo running error
* feat(solver/optimizer): add pp zero optimizer
* fix(solver/optimizer): fix word spelling error
* feat(core/scheduler): add new dir scheduler in core/
* fix(core): fix ci lint error
* feat(solver/optimizer): merge pp and nopp optimizer
* doc(usage.md): update usage doc
* feat(core/scheduler): support post func
* feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape
* feat(core/scheduler): add _load_micro_batch in base scheduler
* feat(core/scheduler): support optimizer overlap communication in pp scheduler
* feat(core/scheduler): delete data process func code
* feat(core/trainer): schedule pre processing for all schedule
---------
Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>