Author | Commit | Message | Date
huangting4201 | a1fd877828 | fix(train.py): clear memory pool before optim step | 2023-11-15 14:40:06 +08:00
huangting4201 | 3c07423151 | feat(model/overlap_handler.py): release weight | 2023-11-14 11:30:26 +08:00
huangting4201 | 74754397df | feat(model/overlap_handler.py): add memory_pool switch and refactor overlap handler | 2023-11-13 21:09:59 +08:00
yingtongxiong | b5e4d04a9a | fix conflicts | 2023-11-06 12:08:31 +08:00
yingtongxiong | b80e6cdcf3 | merge origin | 2023-11-06 12:05:53 +08:00
yingtongxiong | 7c6d2936b3 | reset the sp allreduce in optimizer | 2023-11-06 12:04:01 +08:00
huangting4201 | c517ec5b8c | feat(model/overlap_handler.py): delete reduce_scatter_overlap switch | 2023-11-06 11:57:14 +08:00
yingtongxiong | 9b1265c591 | modify the sp allreduce and support tf32 for fstp linear | 2023-11-06 10:45:08 +08:00
huangting4201 | 5a18b3b651 | fix(model/overlap_handler.py): fix last block hook when pp with activation | 2023-11-02 16:05:07 +08:00
huangting4201 | 4851291356 | fix(optimizer/hybrid_zero_optim.py): fix bucket size full judge condition when reduce scatter overlap | 2023-11-02 10:30:16 +08:00
yingtongxiong | 10b5056e1e | fix all-gather overlap the model_checkpoint is 0 | 2023-11-01 12:31:52 +08:00
huangting4201 | b3def4c162 | fix(optimizer/hybrid_zero_optim.py): add reduce_scatter_overlap switch | 2023-10-31 20:40:58 +08:00
huangting4201 | 6b843253eb | fix(optimizer/hybrid_zero_optim.py): remove redundant _accum_grad_buckets | 2023-10-31 20:26:36 +08:00
mwiacx | 4c1cd5d49b | fix async reduce scatter | 2023-10-31 19:39:24 +08:00
ytxiong | bc5a85c624 | Merge pull request #6 from yingtongxiong/fstp/overlap-support-pp; feat(model/overlap_handler.py): fix overlap hander to support pp(non-… | 2023-10-27 20:32:44 +08:00
huangting4201 | 3778c66660 | feat(model/overlap_handler.py): fix overlap hander to support pp(non-interleaved) | 2023-10-27 20:04:23 +08:00
yingtongxiong | aa3840fc38 | fix some bugs | 2023-10-26 20:42:24 +08:00
yingtongxiong | 8aefb74e02 | add flash tflops | 2023-10-26 20:33:12 +08:00
yingtongxiong | 4d83e1021b | Merge branch 'feat/fstp_refactor' of https://github.com/yingtongxiong/InternLM into feat/fstp_refactor; merge origin | 2023-10-26 20:25:02 +08:00
mwiacx | 3253cbf48e | add a new get_tflops_func | 2023-10-26 20:21:46 +08:00
yingtongxiong | cbd4f04244 | add synchronize | 2023-10-26 20:04:01 +08:00
yingtongxiong | 1aae39b667 | Merge remote-tracking branch 'upstream/develop' into feat/fstp_refactor; merge develop | 2023-10-26 17:41:42 +08:00
yingtongxiong | d831ddcc1d | modify the config | 2023-10-26 17:41:17 +08:00
ytxiong | aeee9fd2a9 | fix broadcast synchronize() (#450) | 2023-10-26 17:33:00 +08:00
yingtongxiong | cc20fa271a | reset print memory | 2023-10-25 16:48:02 +08:00
yingtongxiong | 985465c96a | merge upstream | 2023-10-25 14:46:45 +08:00
yingtongxiong | 363275b500 | add memory print | 2023-10-25 14:31:00 +08:00
ytxiong | 1d7e2d04ec | fix(*)/all-reduce for norm in sequence parallel (#443); fix all-reduce norm grad; change the order of dp and sp all-reduce; fix lint | 2023-10-25 14:16:32 +08:00
yingtongxiong | 918dff7257 | reset moe | 2023-10-25 13:47:19 +08:00
yingtongxiong | 0bac166b7a | add test | 2023-10-25 13:44:15 +08:00
huangting4201 | 41cfa1a10a | feat(model/overlap_handler.py): fix overlap handler None bug | 2023-10-24 18:47:27 +08:00
yingtongxiong | 0d3592a53f | Merge branch 'feat/fstp_refactor' of https://github.com/yingtongxiong/InternLM into feat/fstp_refactor; merge origin | 2023-10-24 17:54:50 +08:00
yingtongxiong | 262de4b796 | support tflops computation and generate test py files | 2023-10-24 17:54:26 +08:00
huangting4201 | 5d8313693b | feat(model/overlap_handler.py): fix head post backward hook when activation | 2023-10-24 17:29:09 +08:00
yingtongxiong | 97dcefc389 | support model activation checkpoint | 2023-10-24 16:13:52 +08:00
jiaopenglong | 949a0a1d55 | feat(optimizer): add layer norm to tensorboard (#429); add layer norm to tensorboard; test moe layer norm; add function: reduce grads | 2023-10-23 17:07:04 +08:00
chenxun.p | 0996c47e49 | fix accumulate grads bug | 2023-10-23 16:17:57 +08:00
huangting4201 | b48687a7ff | Merge pull request #5 from yingtongxiong/fstp/refactor-hook-handle; feat(model/overlap_handler.py): refactor overlap hook handle | 2023-10-23 15:35:34 +08:00
huangting4201 | b2c1a70477 | feat(train/training_internlm.py): fix lint error | 2023-10-23 15:34:24 +08:00
huangting4201 | 9cf1ff0f6e | feat(solver/optimizer/hybrid_zero_optim.py): minor update | 2023-10-23 15:31:41 +08:00
huangting4201 | 03cc7f9b80 | feat(model/overlap_handler.py): fix lint error | 2023-10-23 15:28:34 +08:00
huangting4201 | 0d693cf3a1 | feat(model/overlap_handler.py): fix lint error | 2023-10-23 15:22:03 +08:00
yingtongxiong | f6a5086fe4 | support bias | 2023-10-23 14:51:27 +08:00
huangting4201 | e7f9f1d208 | feat(model/overlap_handler.py): optimize reduce scatter mem pool | 2023-10-23 13:31:23 +08:00
huangting4201 | b20f47a1fe | feat(model/overlap_handler.py): move handler to gpc | 2023-10-23 12:02:32 +08:00
huangting4201 | 85ad917ae4 | feat(model/overlap_handler.py): refactor overlap hook handle | 2023-10-20 21:50:32 +08:00
yingtongxiong | 1804d01bb3 | merge reduce-scatter | 2023-10-20 18:11:00 +08:00
yingtongxiong | dcd89ed304 | refactor linear | 2023-10-20 17:50:56 +08:00
ytxiong | f22e5b3b28 | Merge pull request #4 from yingtongxiong/fstp/refactor-config; feat(initialize/launch.py): refactor config for fstp | 2023-10-20 17:48:20 +08:00
huangting4201 | 2acf9b817f | feat(utils/gputest.py): fix lint error | 2023-10-20 16:25:08 +08:00