InternLM/configs
Wenwen Qu 136d55ec30
feat(moe): add moe module (#182)
* feat(XXX): add moe

* reformat code

* modified:   .pre-commit-config.yaml
	modified:   internlm/model/moe.py
	modified:   internlm/model/modeling_internlm.py

* modified:   internlm/model/modeling_internlm.py

* modified:   internlm/core/context/process_group_initializer.py
	modified:   internlm/core/scheduler/no_pipeline_scheduler.py
	modified:   internlm/solver/optimizer/hybrid_zero_optim.py

* modified:   internlm/model/moe.py
	modified:   internlm/moe/sharded_moe.py
	modified:   internlm/utils/parallel.py

* rollback .pre-commit-config.yaml

* add residual and other moe features

* modify grad clipping due to moe

* add param arguments

* reformat code

* add expert data support and fix bugs

* Update .pre-commit-config.yaml

* modified:   internlm/model/modeling_internlm.py

* add non-interleaved & non-overlapped moe pp support

* support zero_overlap_communication

* avoid moe parameter partition in zero optimizer

* fix the moe_loss_coeff bug

* support interleaved pp

* fix moe bugs in zero optimizer

* fix more moe bugs in zero optimizer

* fix moe bugs in zero optimizer

* add logger for moe_loss

* fix bugs with merge

* fix the pp moe bugs

* fix bug on logger

* update moe training cfg for real dataset

* refactor code

* refactor code

* fix bugs in moe norm computation

* optimize moe norm computation code

* fix the bug of missing scaling for the latent moe loss

* refactor code

* fix moe loss logger for the interleaved pp

* change the scale position for latent moe_loss

* Update 7B_sft.py

* add support for moe checkpoint

* add comments for moe

* reformat code

* fix bugs

* fix bugs

* Update .pre-commit-config.yaml

* remove moe_loss_coeff parameter passing

* fix group_norms computing in hybrid_zero_optim

* use dummy mode to generate random numbers in model construction

* replace flash-attention experts with feedforward experts

* fix bugs with _compute_norm_with_moe_group

* merge upstream/develop into feature_add_moe

* merge upstream/develop into feature_add_moe

* change float16 to bfloat16

* fix interface for dense pipeline

* refactor split_moe_group code

* fix precision inconsistency

* refactor code

* Update 7B_sft.py

* refactor code

* refactor code

* refactor code

* refactor code

* refactor code for split group

* refactor code for log

* fix logger for moe

* refactor code for split param group

* fix the moe_loss for CI and validation

* refactor

* fix bugs with split group

* fix bugs in save/load moe checkpoint

* add moe module to `__init__.py`

* add compatibility code for old versions

* update moe config file

* modify moe config file

* fix merge bugs

* update moe config file

* change condition for compatibility

---------

Co-authored-by: zhanglei <ryancheung98@163.com>
Co-authored-by: Ryan (张磊) <leizhang.real@gmail.com>
2023-09-27 15:54:53 +08:00
7B_MoE4_sft.py feat(moe): add moe module (#182) 2023-09-27 15:54:53 +08:00
7B_sft.py test(tests/test_training): add training e2e tests for loss spike and loss accuracy (#304) 2023-09-19 14:55:40 +08:00
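
For orientation, a minimal sketch of the MoE-related fields that 7B_MoE4_sft.py presumably adds on top of the dense 7B_sft.py config. The field names and values below (num_experts, moe_use_residual, moe_loss_coeff) are assumptions inferred from the commit messages above, not copied from the actual file:

# Hypothetical config excerpt -- names and values are assumptions, not the real 7B_MoE4_sft.py.
model = dict(
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    dtype="torch.bfloat16",   # the PR notes a switch from float16 to bfloat16
    num_experts=4,            # "MoE4" in the filename suggests 4 experts per MoE layer
    moe_use_residual=True,    # "add residual and other moe features"
)

loss = dict(
    moe_loss_coeff=0.01,      # weight on the auxiliary gating/load-balancing loss
)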