yingtongxiong
363275b500
add memory print
2023-10-25 14:31:00 +08:00
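The "add memory print" entry above is about reporting GPU memory during training; a minimal sketch of such a helper, using the standard torch.cuda counters (illustrative only, not the repository's code):

    import torch

    def print_gpu_memory(tag: str = "") -> None:
        # Report current/peak allocated and reserved CUDA memory in GiB.
        gib = 1024 ** 3
        print(
            f"[{tag}] allocated={torch.cuda.memory_allocated() / gib:.2f} GiB, "
            f"max_allocated={torch.cuda.max_memory_allocated() / gib:.2f} GiB, "
            f"reserved={torch.cuda.memory_reserved() / gib:.2f} GiB"
        )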
yingtongxiong
918dff7257
reset moe
2023-10-25 13:47:19 +08:00
huangting4201
41cfa1a10a
feat(model/overlap_handler.py): fix overlap handler None bug
2023-10-24 18:47:27 +08:00
huangting4201
5d8313693b
feat(model/overlap_handler.py): fix head post backward hook when activation
2023-10-24 17:29:09 +08:00
yingtongxiong
97dcefc389
support model activation checkpoint
2023-10-24 16:13:52 +08:00
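The "support model activation checkpoint" entry above refers to recomputing block activations in backward instead of storing them. A minimal sketch of that pattern with torch.utils.checkpoint (illustrative names, not the repository's own wrapper):

    import torch
    from torch.utils.checkpoint import checkpoint

    class CheckpointedBlocks(torch.nn.Module):
        # Run each block under activation checkpointing so its intermediate
        # activations are recomputed during backward, trading compute for memory.
        def __init__(self, blocks, activation_checkpoint: bool = True):
            super().__init__()
            self.blocks = torch.nn.ModuleList(blocks)
            self.activation_checkpoint = activation_checkpoint

        def forward(self, hidden_states):
            for block in self.blocks:
                if self.activation_checkpoint and self.training:
                    hidden_states = checkpoint(block, hidden_states)
                else:
                    hidden_states = block(hidden_states)
            return hidden_states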
huangting4201
03cc7f9b80
feat(model/overlap_handler.py): fix lint error
2023-10-23 15:28:34 +08:00
huangting4201
0d693cf3a1
feat(model/overlap_handler.py): fix lint error
2023-10-23 15:22:03 +08:00
yingtongxiong
f6a5086fe4
support bias
2023-10-23 14:51:27 +08:00
huangting4201
e7f9f1d208
feat(model/overlap_handler.py): optimize reduce scatter mem pool
2023-10-23 13:31:23 +08:00
huangting4201
b20f47a1fe
feat(model/overlap_handler.py): move handler to gpc
2023-10-23 12:02:32 +08:00
huangting4201
85ad917ae4
feat(model/overlap_handler.py): refactor overlap hook handle
2023-10-20 21:50:32 +08:00
yingtongxiong
1804d01bb3
merge reduce-scatter
2023-10-20 18:11:00 +08:00
yingtongxiong
dcd89ed304
refactor linear
2023-10-20 17:50:56 +08:00
huangting4201
eac382ad0a
feat(optimizer/hybrid_zero_optim.py): fix lint error
2023-10-20 16:22:29 +08:00
huangting4201
d91a5d9d9e
feat(initialize/launch.py): refactor config for fstp
2023-10-20 15:59:40 +08:00
huangting4201
815a584930
feat(model/linear.py): remove useless code
2023-10-20 11:27:59 +08:00
yingtongxiong
ed7232777a
support reduce scatter memory pool
2023-10-20 10:35:45 +08:00
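The "support reduce scatter memory pool" entry above (and the "add memory pool" entry below it) concern reusing output buffers for the reduce-scatter collectives instead of allocating fresh tensors every step. A minimal sketch of the idea (names are illustrative, not the repository's API):

    import torch

    class ReduceScatterMemoryPool:
        # Cache reduce-scatter output buffers by (shape, dtype) so repeated
        # backward passes reuse allocations instead of hitting the allocator.
        def __init__(self):
            self._free = {}

        def get(self, shape, dtype, device) -> torch.Tensor:
            key = (tuple(shape), dtype)
            bucket = self._free.setdefault(key, [])
            return bucket.pop() if bucket else torch.empty(shape, dtype=dtype, device=device)

        def release(self, tensor: torch.Tensor) -> None:
            self._free.setdefault((tuple(tensor.shape), tensor.dtype), []).append(tensor)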
yingtongxiong
4742271154
add memory pool
2023-10-19 13:21:33 +08:00
yingtongxiong
a5aeab2a3f
memory profiling test
2023-10-17 19:54:21 +08:00
yingtongxiong
5abe519c4c
remove full weight for block 0
2023-10-17 16:37:06 +08:00
yingtongxiong
5c38cb6409
add head overlap
2023-10-17 15:38:24 +08:00
yingtongxiong
a5c6e457b9
Merge branch 'feat/fstp' of https://github.com/yingtongxiong/InternLM into feat/fstp
2023-10-17 15:17:03 +08:00
yingtongxiong
6408b944c2
support fine grained
2023-10-17 15:14:39 +08:00
chenxun.p
6682f5d92a
fix reduce scatter async bug
2023-10-17 15:10:07 +08:00
chenxun.p
229cc5c68c
impl reduce scatter async
2023-10-17 11:15:54 +08:00
huangting4201
d1af0d6aee
feat(model/linear.py): block-grained backward
2023-10-17 10:13:56 +08:00
huangting4201
0d1fa037dd
feat(model/linear.py): set block 0 full weight
2023-10-16 20:13:59 +08:00
yingtongxiong
82204eea59
support hybrid overlap
2023-10-16 16:35:14 +08:00
huangting4201
d0f0c22cac
feat(model/linear.py): change pre backward from wqkv to block
2023-10-13 11:10:23 +08:00
huangting4201
d0b1346993
feat(model/linear.py): support block allgather overlap
2023-10-12 19:42:08 +08:00
yingtongxiong
5fd5a8a32b
support fine-grained overlap
2023-10-11 17:36:41 +08:00
yingtongxiong
792b066f15
communication overlap
2023-10-11 10:57:12 +08:00
yingtongxiong
0fac845c36
overlap grad_input computation and grad_weight reduce_scatter
2023-10-10 17:06:13 +08:00
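The entry above describes overlapping the grad_input matmul with an asynchronous reduce-scatter of grad_weight. A minimal sketch of that overlap for a generic linear backward, assuming a recent torch.distributed that provides reduce_scatter_tensor (not the repository's actual autograd Function):

    import torch
    import torch.distributed as dist

    def linear_backward_overlapped(grad_output, x, weight, group):
        # grad_weight is produced first so its reduce-scatter can run
        # asynchronously while grad_input is computed on the default stream.
        grad_weight = grad_output.t().matmul(x)              # (out_features, in_features)
        world_size = dist.get_world_size(group)
        shard = torch.empty(
            (grad_weight.shape[0] // world_size, grad_weight.shape[1]),
            dtype=grad_weight.dtype, device=grad_weight.device,
        )
        handle = dist.reduce_scatter_tensor(shard, grad_weight, group=group, async_op=True)
        grad_input = grad_output.matmul(weight)              # overlapped with the collective
        handle.wait()                                        # sync before the shard is consumed
        return grad_input, shard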
yingtongxiong
dd67ab948d
merge develop
2023-10-09 21:40:02 +08:00
yingtongxiong
1b7935dd98
merge upstream develop
2023-10-09 21:35:52 +08:00
Pryest
b3645b0244
fix(model): fix errant inference_forward (#396)
...
* Fix errant inference_forward.
* Recover use_dynamic_ntk_rope.
* Fix bugs.
* Fit to flash attention 1.0
* Fit to flash attention 1.0
* Fit to flash attention 1.0.5.
* Fit to flash attention 1.0.5.
2023-10-09 08:29:11 -05:00
yingtongxiong
007e58a4af
merge upstream develop
2023-10-09 20:54:26 +08:00
yingtongxiong
f191853bf4
fix lint
2023-10-09 20:39:57 +08:00
yingtongxiong
29df765f65
refactor code
2023-10-09 20:23:32 +08:00
zaglc
a075153adf
feat(train): add fsdp training option (#293)
...
* feat(fsdp): add training option for fsdp
* fix(fsdp): add mix-precision training
* fix failure in lint-check
* fix format problem
* restore 7B_sft
* fix load ckpt bug
* fix load ckpt bug2
* feat(solver/optimizer): add new file fsdp_optimizer.py
* fix(train.py): fix ci lint error
* fix(fsdp_optimizer.py): wait grad async
* fix bug for loading ckpts when zero1 < dp_size
* fix(context/parallel_context.py): only log warning for fsdp
* change ckpt name
* fix(model/modeling_internlm.py): fix checkpoint=False runtime error
* more wrap
* add support for FSDP with tp
* modify args_sanity_check for fsdp with pipeline and fsdp with moe
* fix(internlm/utils/parallel.py): fix circular import
* fix(internlm/train/training_internlm.py): remove set IS_TENSOR_PARALLEL attr
* fix(internlm/train/training_internlm.py): update wrap class and fix lint error
* fix(internlm/model): reset dropout_selective_checkpoint=True
* feat(configs/7B_sft.py): move fsdp config to parallel zero1
* feat(configs/7B_sft.py): adapt to old version config
---------
Co-authored-by: huangting4201 <1538303371@qq.com>
2023-10-09 18:59:31 +08:00
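As a companion to the FSDP training option above, a minimal sketch of wrapping a transformer-style model with bf16 mixed-precision FSDP; it assumes torch.distributed is already initialized, and ToyBlock is a stand-in for the real decoder layer rather than InternLM code:

    import functools
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

    class ToyBlock(nn.Module):
        # Stand-in for the model's transformer decoder layer.
        def __init__(self, dim: int = 512):
            super().__init__()
            self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):
            return x + self.ff(x)

    def wrap_with_fsdp(model: nn.Module) -> FSDP:
        # Per-block auto-wrapping plus bf16 parameters, gradient reductions, and buffers.
        mp = MixedPrecision(param_dtype=torch.bfloat16,
                            reduce_dtype=torch.bfloat16,
                            buffer_dtype=torch.bfloat16)
        policy = functools.partial(transformer_auto_wrap_policy,
                                   transformer_layer_cls={ToyBlock})
        return FSDP(model, auto_wrap_policy=policy, mixed_precision=mp)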
yingtongxiong
21c1a7fa47
support evaluation with fstp
2023-10-09 18:01:06 +08:00
yingtongxiong
189a313da6
support fstp and refactor code
2023-10-09 17:26:20 +08:00
yingtongxiong
bd4af3a31f
modify the all2all
2023-10-08 17:21:17 +08:00
yingtongxiong
bf475b6940
debug
2023-10-08 13:20:29 +08:00
yingtongxiong
10aa63f0e1
support optimized sp
2023-10-07 14:03:47 +08:00
Wenwen Qu
136d55ec30
feat(moe): add moe module (#182)
...
* feat(XXX): add moe
* reformat code
* modified: .pre-commit-config.yaml
modified: internlm/model/moe.py
modified: internlm/model/modeling_internlm.py
* modified: internlm/model/modeling_internlm.py
* modified: internlm/core/context/process_group_initializer.py
modified: internlm/core/scheduler/no_pipeline_scheduler.py
modified: internlm/solver/optimizer/hybrid_zero_optim.py
* modified: internlm/model/moe.py
modified: internlm/moe/sharded_moe.py
modified: internlm/utils/parallel.py
* rollback .pre-commit-config.yaml
* add residual and other moe features
* modify grad clipping due to moe
* add param arguments
* reformat code
* add expert data support and fix bugs
* Update .pre-commit-config.yaml
* modified: internlm/model/modeling_internlm.py
* add no-interleaved & no-overlapped moe pp support
* support zero_overlap_communication
* avoid moe parameter partition in zero optimizer
* fix the moe_loss_coeff bug
* support interleaved pp
* fix moe bugs in zero optimizer
* fix more moe bugs in zero optimizer
* fix moe bugs in zero optimizer
* add logger for moe_loss
* fix bugs with merge
* fix the pp moe bugs
* fix bug on logger
* update moe training cfg on real-dataset
* refactor code
* refactor code
* fix bugs with compute moe norm
* optimize code with moe norm computing
* fix the bug where the latent moe loss was not scaled
* refactor code
* fix moe loss logger for the interleaved pp
* change the scale position for latent moe_loss
* Update 7B_sft.py
* add support for moe checkpoint
* add comments for moe
* reformat code
* fix bugs
* fix bugs
* Update .pre-commit-config.yaml
* remove moe_loss_coeff parameter passing
* fix group_norms computing in hybrid_zero_optim
* use dummy mode to generate random numbers in model construction
* replace flashatten experts by feedforward experts
* fix bugs with _compute_norm_with_moe_group
* merge upstream/develop into feature_add_moe
* merge upstream/develop into feature_add_moe
* change float16 to bfloat16
* fix interface for dense pipeline
* refactor split_moe_group code
* fix precision inconsistency
* refactor code
* Update 7B_sft.py
* refactor code
* refactor code
* refactor code
* refactor code
* refactor code for split group
* refactor code for log
* fix logger for moe
* refactor code for split param group
* fix the moe_loss for ci and val
* refactor
* fix bugs with split group
* fix bugs in save/load moe checkpoint
* add moe module to `__init__.py`
* add compatible code for old version
* update moe config file
* modify moe config file
* fix merge bugs
* update moe config file
* change condition for compatibility
---------
Co-authored-by: zhanglei <ryancheung98@163.com>
Co-authored-by: Ryan (张磊) <leizhang.real@gmail.com>
2023-09-27 15:54:53 +08:00
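For readers unfamiliar with the MoE module introduced above, the core routing step is a learned gate that sends each token to its top-k experts. A minimal top-2 gating sketch (illustrative only; the repository's sharded_moe module adds load-balancing loss, capacity limits, and expert parallelism):

    import torch
    import torch.nn.functional as F

    def top2_gate(hidden: torch.Tensor, gate_weight: torch.Tensor):
        # hidden: (tokens, d_model), gate_weight: (d_model, num_experts)
        logits = hidden @ gate_weight
        probs = F.softmax(logits, dim=-1)
        top_probs, top_experts = probs.topk(2, dim=-1)
        top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)   # renormalise the selected pair
        return top_experts, top_probs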
huangting4201
3b0eff0c8a
fix(model/embedding.py): ci lint check error (#345)
...
* fix(ci): fix ci lint error
* fix(ci): fix ci lint error
2023-09-21 14:46:22 +08:00
YWMditto
8464425a7b
feat(model): add DynamicNTKScalingRotaryEmbedding (#339)
...
* add dynamic ntk rope
* update dynamic ntk rope
* fix lint check
* fix lint check
* add more desc
---------
Co-authored-by: YWMditto <862779238@qq.com>
2023-09-20 23:31:47 +08:00
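The DynamicNTKScalingRotaryEmbedding above follows the dynamic-NTK idea: when the running sequence length exceeds the trained context window, the rotary base is rescaled on the fly. A sketch of the commonly used rescaling (the repository's exact code may differ):

    import torch

    def dynamic_ntk_inv_freq(dim: int, seq_len: int, base: float = 10000.0,
                             max_position_embeddings: int = 2048,
                             scaling_factor: float = 1.0) -> torch.Tensor:
        # Rescale the rotary base once seq_len exceeds the trained window,
        # then recompute the inverse frequencies used by RoPE.
        if seq_len > max_position_embeddings:
            base = base * (
                (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
            ) ** (dim / (dim - 2))
        return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))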
ytxiong
6a5915bf0d
feat(linear): optimize mlp by using jit (#321)
...
* fuse silu op
* refactor code
* fix lint
* fix lint
2023-09-19 14:57:43 +08:00
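The "fuse silu op" work in the entry above scripts the gated-MLP activation so TorchScript can fuse the elementwise ops. A minimal sketch of that kind of fusion (function name is illustrative, not the repository's):

    import torch
    import torch.nn.functional as F

    @torch.jit.script
    def fused_silu_mul(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
        # silu(w1(x)) * w3(x) for a gated MLP, scripted so the activation
        # and multiply can be fused into fewer kernels.
        return F.silu(gate) * up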
yingtongxiong
0c276d8de2
Merge remote-tracking branch 'origin/main' into develop
2023-09-08 10:19:54 +08:00