263 Commits (918dff72579baeb205ed8dc47bce9a2d7aba2c7d)

Author SHA1 Message Date
yingtongxiong 918dff7257 reset moe 2023-10-25 13:47:19 +08:00
yingtongxiong 0bac166b7a add test 2023-10-25 13:44:15 +08:00
huangting4201 41cfa1a10a feat(model/overlap_handler.py): fix overlap handler None bug 2023-10-24 18:47:27 +08:00
yingtongxiong 0d3592a53f Merge branch 'feat/fstp_refactor' of https://github.com/yingtongxiong/InternLM into feat/fstp_refactor; merge origin 2023-10-24 17:54:50 +08:00
yingtongxiong 262de4b796 support tflops computation and generate test py files 2023-10-24 17:54:26 +08:00
huangting4201 5d8313693b feat(model/overlap_handler.py): fix head post backward hook when activation 2023-10-24 17:29:09 +08:00
yingtongxiong 97dcefc389 support model activation checkpoint 2023-10-24 16:13:52 +08:00
chenxun.p 0996c47e49 fix accumulate grads bug 2023-10-23 16:17:57 +08:00
huangting4201 b48687a7ff Merge pull request #5 from yingtongxiong/fstp/refactor-hook-handle; feat(model/overlap_handler.py): refactor overlap hook handle 2023-10-23 15:35:34 +08:00
huangting4201 b2c1a70477 feat(train/training_internlm.py): fix lint error 2023-10-23 15:34:24 +08:00
huangting4201 9cf1ff0f6e feat(solver/optimizer/hybrid_zero_optim.py): minor update 2023-10-23 15:31:41 +08:00
huangting4201 03cc7f9b80 feat(model/overlap_handler.py): fix lint error 2023-10-23 15:28:34 +08:00
huangting4201 0d693cf3a1 feat(model/overlap_handler.py): fix lint error 2023-10-23 15:22:03 +08:00
yingtongxiong f6a5086fe4 support bias 2023-10-23 14:51:27 +08:00
huangting4201 e7f9f1d208 feat(model/overlap_handler.py): optimize reduce scatter mem pool 2023-10-23 13:31:23 +08:00
huangting4201 b20f47a1fe feat(model/overlap_handler.py): move handler to gpc 2023-10-23 12:02:32 +08:00
huangting4201 85ad917ae4 feat(model/overlap_handler.py): refactor overlap hook handle 2023-10-20 21:50:32 +08:00
yingtongxiong 1804d01bb3 merge reduce-scatter 2023-10-20 18:11:00 +08:00
yingtongxiong dcd89ed304 refactor linear 2023-10-20 17:50:56 +08:00
ytxiong f22e5b3b28 Merge pull request #4 from yingtongxiong/fstp/refactor-config; feat(initialize/launch.py): refactor config for fstp 2023-10-20 17:48:20 +08:00
huangting4201 2acf9b817f feat(utils/gputest.py): fix lint error 2023-10-20 16:25:08 +08:00
huangting4201 eac382ad0a feat(optimizer/hybrid_zero_optim.py): fix lint error 2023-10-20 16:22:29 +08:00
huangting4201 3c6925499f feat(optimizer/hybrid_zero_optim.py): resolve conflicts 2023-10-20 16:18:01 +08:00
huangting4201 d91a5d9d9e feat(initialize/launch.py): refactor config for fstp 2023-10-20 15:59:40 +08:00
chenxun.p 95488d8e8f update optimizer accumulate grad impl when fstp 2023-10-20 15:58:06 +08:00
huangting4201 815a584930 feat(model/linear.py): remove useless code 2023-10-20 11:27:59 +08:00
yingtongxiong ed7232777a support reduce scatter memory pool 2023-10-20 10:35:45 +08:00
yingtongxiong 4742271154 add memory pool 2023-10-19 13:21:33 +08:00
yingtongxiong a5aeab2a3f memory profiling test 2023-10-17 19:54:21 +08:00
yingtongxiong 16ef7b7889 add test 2023-10-17 17:16:39 +08:00
yingtongxiong 5abe519c4c remove full weight for block 0 2023-10-17 16:37:06 +08:00
yingtongxiong 5c38cb6409 add head overlap 2023-10-17 15:38:24 +08:00
yingtongxiong a5c6e457b9 Merge branch 'feat/fstp' of https://github.com/yingtongxiong/InternLM into feat/fstp 2023-10-17 15:17:03 +08:00
yingtongxiong 6408b944c2 support fine grained 2023-10-17 15:14:39 +08:00
chenxun.p b51cf4ebc3 Merge branch 'feat/fstp' of github.com:yingtongxiong/InternLM into feat/fstp 2023-10-17 15:10:27 +08:00
chenxun.p 6682f5d92a fix reduce scatter async bug 2023-10-17 15:10:07 +08:00
huangting4201 4e99a7fdbc feat(train/training_internlm.py): remove abnormal tgs when calculating avg tgs 2023-10-17 11:30:44 +08:00
chenxun.p 229cc5c68c impl reduce scatter async 2023-10-17 11:15:54 +08:00
huangting4201 d1af0d6aee feat(model/linear.py): block-grained backward 2023-10-17 10:13:56 +08:00
huangting4201 0d1fa037dd feat(model/linear.py): set block 0 full weight 2023-10-16 20:13:59 +08:00
yingtongxiong 82204eea59 support hybrid overlap 2023-10-16 16:35:14 +08:00
huangting4201 d0f0c22cac feat(model/linear.py): change pre backward from wqkv to block 2023-10-13 11:10:23 +08:00
huangting4201 d0b1346993 feat(model/linear.py): support block allgather overlap 2023-10-12 19:42:08 +08:00
yingtongxiong 5fd5a8a32b support fine-grained overlap 2023-10-11 17:36:41 +08:00
yingtongxiong 792b066f15 communication overlap 2023-10-11 10:57:12 +08:00
yingtongxiong c94be64fd2 merge origin 2023-10-10 17:13:46 +08:00
yingtongxiong 0fac845c36 overlap grad_input computation and grad_weight reduce_scatter 2023-10-10 17:06:13 +08:00
huangting4201 5fb6d99c11 feat(configs/7B_sft.py): update parallel config comment 2023-10-10 11:45:11 +08:00
yingtongxiong db637542a6 fix lint 2023-10-09 22:19:21 +08:00
yingtongxiong dd67ab948d merge develop 2023-10-09 21:40:02 +08:00