mirror of https://github.com/InternLM/InternLM
Latest commit: feat(fsdp): add training option for fsdp

* feat(fsdp): add training option for fsdp
* fix(fsdp): add mix-precision training
* fix failure in lint-check
* fix format problem
* restore 7B_sft
* fix load ckpt bug
* fix load ckpt bug2
* feat(solver/optimizer): add new file fsdp_optimizer.py
* fix(train.py): fix ci lint error
* fix(fsdp_optimizer.py): wait grad async
* fix bug for loading ckpts when zero1 < dp_size
* fix(context/parallel_context.py): only log warning for fsdp
* change ckpt name
* fix(model/modeling_internlm.py): fix checkpoint=False runtime error
* more wrap
* add support for FSDP with tp
* modify args_sanity_check for fsdp with pipeline and fsdp with moe
* fix(internlm/utils/parallel.py): fix circular import
* fix(internlm/train/training_internlm.py): remove set IS_TENSOR_PARALLEL attr
* fix(internlm/train/training_internlm.py): update wrap class and fix lint error
* fix(internlm/model): reset dropout_selective_checkpoint=True
* feat(configs/7B_sft.py): move fsdp config to parallel zero1
* feat(configs/7B_sft.py): adapt to old version config

Co-authored-by: huangting4201 <1538303371@qq.com>
Files:

* __init__.py
* checkpoint.py
* common.py
* evaluation.py
* gputest.py
* logger.py
* megatron_timers.py
* model_checkpoint.py
* parallel.py
* registry.py
* simple_memory_profiler.py
* storage_manager.py
* timeout.py
* writer.py