jiaopenglong
9fc252f40e
add output embedding tf32 option ( #523 )
2023-12-06 13:50:59 +08:00
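The option in the commit above concerns the final vocabulary projection (the output embedding / head). As a rough sketch only, one way such a toggle can look in PyTorch is shown below; the `OutputHead` class and its `allow_tf32` flag are illustrative assumptions, not the actual InternLM code for #523.

```python
import torch
import torch.nn as nn


class OutputHead(nn.Module):
    """Hidden-state -> vocab-logits projection with an optional TF32 toggle (sketch)."""

    def __init__(self, hidden_size: int, vocab_size: int, allow_tf32: bool = True):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)
        self.allow_tf32 = allow_tf32  # hypothetical option mirroring the commit title

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Run the projection in fp32, but let cuBLAS use TF32 tensor cores if allowed,
        # restoring the previous global setting afterwards.
        prev = torch.backends.cuda.matmul.allow_tf32
        torch.backends.cuda.matmul.allow_tf32 = self.allow_tf32
        try:
            return self.proj(hidden.float())
        finally:
            torch.backends.cuda.matmul.allow_tf32 = prev
```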
Wenwen Qu
136d55ec30
feat(moe): add moe module ( #182 )
* feat(XXX): add moe
* reformat code
* modified: .pre-commit-config.yaml
modified: internlm/model/moe.py
modified: internlm/model/modeling_internlm.py
* modified: internlm/model/modeling_internlm.py
* modified: internlm/core/context/process_group_initializer.py
modified: internlm/core/scheduler/no_pipeline_scheduler.py
modified: internlm/solver/optimizer/hybrid_zero_optim.py
* modified: internlm/model/moe.py
modified: internlm/moe/sharded_moe.py
modified: internlm/utils/parallel.py
* rollback .pre-commit-config.yaml
* add residual and other moe features
* modify grad clipping due to moe
* add param arguments
* reformat code
* add expert data support and fix bugs
* Update .pre-commit-config.yaml
* modified: internlm/model/modeling_internlm.py
* add non-interleaved & non-overlapped moe pp support
* support zero_overlap_communication
* avoid moe parameter partition in zero optimizer
* fix the moe_loss_coeff bug
* support interleaved pp
* fix moe bugs in zero optimizer
* fix more moe bugs in zero optimizer
* fix moe bugs in zero optimizer
* add logger for moe_loss
* fix bugs with merge
* fix the pp moe bugs
* fix bug on logger
* update moe training cfg on real-dataset
* refactor code
* refactor code
* fix bugs with computing the moe norm
* optimize code for moe norm computing
* fix the bug of missing scaling for the latent moe loss
* refactor code
* fix moe loss logger for the interleaved pp
* change the scale position for latent moe_loss
* Update 7B_sft.py
* add support for moe checkpoint
* add comments for moe
* reformat code
* fix bugs
* fix bugs
* Update .pre-commit-config.yaml
* remove moe_loss_coeff parameter passing
* fix group_norms computing in hybrid_zero_optim
* use dummy mode to generate random numbers in model construction
* replace flash-attention experts with feed-forward experts
* fix bugs with _compute_norm_with_moe_group
* merge upstream/develop into feature_add_moe
* merge upstream/develop into feature_add_moe
* change float16 to bfloat16
* fix interface for dense pipeline
* refactor split_moe_group code
* fix precision inconsistency
* refactor code
* Update 7B_sft.py
* refactor code
* refactor code
* refactor code
* refactor code
* refactor code for split group
* refactor code for log
* fix logger for moe
* refactor code for split param group
* fix the moe_loss for CI and validation
* refactor
* fix bugs with split group
* fix bugs in save/load moe checkpoint
* add moe module to `__init__.py`
* add compatibility code for old versions
* update moe config file
* modify moe config file
* fix merge bugs
* update moe config file
* change condition for compatibility
---------
Co-authored-by: zhanglei <ryancheung98@163.com>
Co-authored-by: Ryan (张磊) <leizhang.real@gmail.com>
2023-09-27 15:54:53 +08:00
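For orientation, here is a minimal sketch of the kind of gated mixture-of-experts layer the commit above adds; `SimpleMoE`, its top-2 gating, and the Switch-style auxiliary load-balancing loss are simplified assumptions and not the actual `internlm/model/moe.py` or `sharded_moe.py` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Top-2 gated mixture of feed-forward experts (sketch, no expert parallelism)."""

    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (tokens, hidden)
        logits = self.gate(x)                                   # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        topk_p, topk_i = probs.topk(self.k, dim=-1)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)      # renormalize top-k weights

        # Dense dispatch loop for clarity; real MoE kernels scatter/gather tokens instead.
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_i[:, slot] == e
                if mask.any():
                    out[mask] += topk_p[mask, slot].unsqueeze(-1) * expert(x[mask])

        # Load-balancing auxiliary loss: mean gate prob * fraction of tokens routed to each expert.
        dispatch = F.one_hot(topk_i[:, 0], num_classes=probs.size(-1)).float().mean(0)
        aux_loss = (probs.mean(0) * dispatch).sum() * probs.size(-1)
        return out, aux_loss
```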
Wenwen Qu
655e9dae40
Feat(norm)/support fused precision ( #319 )
* add fused precision support for norm
* refactor code
* refactor code
* change the granularity of the hook
* fix bugs when self.model is a ModuleList
* add dtype condition for post hook
* refactor code for split group
* refactor code for pre/post hook
* refactor code for split group
* remove fp32 hook for norm
* unit tests for fused precision
* add doc for fused precision
* add doc for the English version
* reformat docs
* Update mixed_precision.rst
* Update mixed_precision.po
* update mixed_precision.po
2023-09-26 20:39:55 +08:00
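The fused-precision work above runs normalization in fp32 inside a bf16 model via module hooks. The snippet below is only a sketch of that idea, assuming plain `nn.LayerNorm` modules; the actual hook granularity, dtype conditions, and RMSNorm handling live in #319.

```python
import torch
import torch.nn as nn


def run_norms_in_fp32(model: nn.Module) -> None:
    """Keep norm layers in fp32 within a bf16 model using forward pre/post hooks (sketch)."""

    def pre_hook(module, args):
        # Upcast the norm's inputs to fp32 before the forward pass.
        return tuple(a.float() if torch.is_tensor(a) else a for a in args)

    def post_hook(module, args, output):
        # Downcast the result so downstream bf16 layers see the expected dtype.
        if torch.is_tensor(output) and output.dtype == torch.float32:
            return output.to(torch.bfloat16)
        return output

    for module in model.modules():
        if isinstance(module, nn.LayerNorm):  # extend with RMSNorm classes as needed
            module.float()  # keep the norm's own weights in fp32
            module.register_forward_pre_hook(pre_hook)
            module.register_forward_hook(post_hook)
```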
cx
0268d8eda1
refactor(scheduler): rewrite pipeline scheduler ( #138 )
* refactor(scheduler): rewrite pipeline scheduler
* fix(*): fix pipeline scheduler bugs
* fix(*): fix merge bug
* feat(*): update codes with todo tag
* feat(*): add comments
* feat(internlm/core/scheduler): update recv_prev/next logic
* feat(utils/evaluation.py): update scheduler metric hook for validation
---------
Co-authored-by: huangting.p <huangting@sensetime.com>
2023-08-03 11:48:12 +08:00
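PR #138 above rewrites the pipeline scheduler (non-interleaved and interleaved variants, plus the recv_prev/next logic). As a rough reference only, below is a sketch of the per-stage action ordering in a standard non-interleaved 1F1B schedule; it omits all p2p communication and is not the InternLM scheduler itself.

```python
def one_f_one_b_schedule(num_microbatches: int, stage: int, num_stages: int):
    """Return ('F', i) / ('B', i) actions for one pipeline stage under 1F1B (sketch)."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    fwd = bwd = 0
    actions = []
    for _ in range(warmup):                      # warm-up: forwards only
        actions.append(("F", fwd)); fwd += 1
    for _ in range(num_microbatches - warmup):   # steady state: alternate 1F and 1B
        actions.append(("F", fwd)); fwd += 1
        actions.append(("B", bwd)); bwd += 1
    for _ in range(warmup):                      # cool-down: drain remaining backwards
        actions.append(("B", bwd)); bwd += 1
    return actions


# Example: the last stage (stage == num_stages - 1) has no warm-up and strictly alternates F/B.
print(one_f_one_b_schedule(num_microbatches=4, stage=3, num_stages=4))
```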
Sun Peng
fa7337b37b
initial commit
2023-07-06 12:55:23 +08:00