InternLM

Commit Graph

Author	SHA1	Message	Date
mwiacx	3253cbf48e	add a new get_tflops_func	2023-10-26 20:21:46 +08:00
yingtongxiong	cc20fa271a	reset print memory	2023-10-25 16:48:02 +08:00
yingtongxiong	985465c96a	merge upstream	2023-10-25 14:46:45 +08:00
yingtongxiong	363275b500	add memory print	2023-10-25 14:31:00 +08:00
ytxiong	1d7e2d04ec	fix()/all-reduce for norm in sequence parallel (#443) fix all-reduce norm grad * change the order of dp and sp all-reduce * fix lint	2023-10-25 14:16:32 +08:00
yingtongxiong	918dff7257	reset moe	2023-10-25 13:47:19 +08:00
yingtongxiong	0bac166b7a	add test	2023-10-25 13:44:15 +08:00
huangting4201	41cfa1a10a	feat(model/overlap_handler.py): fix overlap handler None bug	2023-10-24 18:47:27 +08:00
yingtongxiong	0d3592a53f	Merge branch 'feat/fstp_refactor' of https://github.com/yingtongxiong/InternLM into feat/fstp_refactor merge origin	2023-10-24 17:54:50 +08:00
yingtongxiong	262de4b796	support tflops computation and generate test py files	2023-10-24 17:54:26 +08:00
huangting4201	5d8313693b	feat(model/overlap_handler.py): fix head post backward hook when activation	2023-10-24 17:29:09 +08:00
yingtongxiong	97dcefc389	support model activation checkpoint	2023-10-24 16:13:52 +08:00
jiaopenglong	949a0a1d55	feat(optimizer): add layer norm to tensorboard (#429) * add layer norm to tensorboard * test moe layer norm * add function: reduce grads	2023-10-23 17:07:04 +08:00
chenxun.p	0996c47e49	fix accumulate grads bug	2023-10-23 16:17:57 +08:00
huangting4201	b48687a7ff	Merge pull request #5 from yingtongxiong/fstp/refactor-hook-handle feat(model/overlap_handler.py): refactor overlap hook handle	2023-10-23 15:35:34 +08:00
huangting4201	b2c1a70477	feat(train/training_internlm.py): fix lint error	2023-10-23 15:34:24 +08:00
huangting4201	9cf1ff0f6e	feat(solver/optimizer/hybrid_zero_optim.py): minor update	2023-10-23 15:31:41 +08:00
huangting4201	03cc7f9b80	feat(model/overlap_handler.py): fix lint error	2023-10-23 15:28:34 +08:00
huangting4201	0d693cf3a1	feat(model/overlap_handler.py): fix lint error	2023-10-23 15:22:03 +08:00
yingtongxiong	f6a5086fe4	support bias	2023-10-23 14:51:27 +08:00
huangting4201	e7f9f1d208	feat(model/overlap_handler.py): optimize reduce scatter mem pool	2023-10-23 13:31:23 +08:00
huangting4201	b20f47a1fe	feat(model/overlap_handler.py): move handler to gpc	2023-10-23 12:02:32 +08:00
huangting4201	85ad917ae4	feat(model/overlap_handler.py): refactor overlap hook handle	2023-10-20 21:50:32 +08:00
yingtongxiong	1804d01bb3	merge reduce-scatter	2023-10-20 18:11:00 +08:00
yingtongxiong	dcd89ed304	refactor linear	2023-10-20 17:50:56 +08:00
ytxiong	f22e5b3b28	Merge pull request #4 from yingtongxiong/fstp/refactor-config feat(initialize/launch.py): refactor config for fstp	2023-10-20 17:48:20 +08:00
huangting4201	2acf9b817f	feat(utils/gputest.py): fix lint error	2023-10-20 16:25:08 +08:00
huangting4201	eac382ad0a	feat(optimizer/hybrid_zero_optim.py): fix lint error	2023-10-20 16:22:29 +08:00
huangting4201	3c6925499f	feat(optimizer/hybrid_zero_optim.py): resolve conflicts	2023-10-20 16:18:01 +08:00
huangting4201	d91a5d9d9e	feat(initialize/launch.py): refactor config for fstp	2023-10-20 15:59:40 +08:00
chenxun.p	95488d8e8f	update optimizer accumulate grad impl when fstp	2023-10-20 15:58:06 +08:00
kkscilife	140be20511	test(workflow): add unit test yaml (#427) * add unit test yaml * add main branch --------- Co-authored-by: changxiaodongTHU <2437105032@qq.com>	2023-10-20 14:22:58 +08:00
huangting4201	815a584930	feat(model/linear.py): remove useless code	2023-10-20 11:27:59 +08:00
yingtongxiong	ed7232777a	support reduce scatter memory pool	2023-10-20 10:35:45 +08:00
Wenwen Qu	3c992a2101	fix(pipeline): fix interleave type assert and metrics error (#423) * fix interleave type assert bug * refactor code for assert * fix is_no_pp_or_last_stage logic	2023-10-19 17:29:30 +08:00
jiaxingli	3ea46324dd	fix: unitest (#424)	2023-10-19 15:19:40 +08:00
yingtongxiong	4742271154	add memory pool	2023-10-19 13:21:33 +08:00
Wenwen Qu	2c5395fdfd	Doc(moe): add documentation for moe training (#411) * add doc for moe * fix moe and zero1 check in args_sanity_check * restore moe config file	2023-10-19 10:01:12 +08:00
Guoteng	3ea94f2e2a	fix(utils): disable bench_net in gputest.py (#421)	2023-10-19 10:00:57 +08:00
jiaopenglong	4b5bdedff2	feat(monitor): send exception to light monitor (#420) * send exception to light monitor * update try_import_send_exception	2023-10-18 21:00:21 +08:00
jiaxingli	30f610b1fa	Test(pp): test pipeline parallel (#413) * test: pp * feat: add pp test * test pp * pp test * pp test * test pp	2023-10-18 17:53:08 +08:00
yingtongxiong	a5aeab2a3f	memory profiling test	2023-10-17 19:54:21 +08:00
Wenwen Qu	aa5e34d815	compatible with old ckpt (#418)	2023-10-17 17:25:36 +08:00
yingtongxiong	16ef7b7889	add test	2023-10-17 17:16:39 +08:00
yingtongxiong	5abe519c4c	remove full weight for block 0	2023-10-17 16:37:06 +08:00
yingtongxiong	5c38cb6409	add head overlap	2023-10-17 15:38:24 +08:00
yingtongxiong	a5c6e457b9	Merge branch 'feat/fstp' of https://github.com/yingtongxiong/InternLM into feat/fstp	2023-10-17 15:17:03 +08:00
yingtongxiong	6408b944c2	support fine grained	2023-10-17 15:14:39 +08:00
chenxun.p	b51cf4ebc3	Merge branch 'feat/fstp' of github.com:yingtongxiong/InternLM into feat/fstp	2023-10-17 15:10:27 +08:00
chenxun.p	6682f5d92a	fix reduce scatter async bug	2023-10-17 15:10:07 +08:00

281 Commits (3253cbf48ef23c7e67e340533c16e1a372579f8e)
