mirror of https://github.com/InternLM/InternLM
Latest commit: feat(fsdp): add training option for fsdp

* feat(fsdp): add training option for fsdp
* fix(fsdp): add mix-precision training
* fix failure in lint-check
* fix format problem
* restore 7B_sft
* fix load ckpt bug
* fix load ckpt bug2
* feat(solver/optimizer): add new file fsdp_optimizer.py
* fix(train.py): fix ci lint error
* fix(fsdp_optimizer.py): wait grad async
* fix bug for loading ckpts when zero1 < dp_size
* fix(context/parallel_context.py): only log warning for fsdp
* change ckpt name
* fix(model/modeling_internlm.py): fix checkpoint=False runtime error
* more wrap
* add support for FSDP with tp
* modify args_sanity_check for fsdp with pipeline and fsdp with moe
* fix(internlm/utils/parallel.py): fix circular import
* fix(internlm/train/training_internlm.py): remove set IS_TENSOR_PARALLEL attr
* fix(internlm/train/training_internlm.py): update wrap class and fix lint error
* fix(internlm/model): reset dropout_selective_checkpoint=True
* feat(configs/7B_sft.py): move fsdp config to parallel zero1
* feat(configs/7B_sft.py): adapt to old version config

Co-authored-by: huangting4201 <1538303371@qq.com>
Files:

* __init__.py
* checkpoint.py
* common.py
* evaluation.py
* gputest.py
* logger.py
* megatron_timers.py
* model_checkpoint.py
* parallel.py
* registry.py
* simple_memory_profiler.py
* storage_manager.py
* timeout.py
* writer.py