ColossalAI

Commit Graph

Author	SHA1	Message	Date
binmakeswell	cbaa104216	release FP8 news (#6068 ) * add FP8 news * release FP8 news * release FP8 news	2024-09-25 11:57:16 +08:00
duanjunwen	8501202a35	Merge pull request #6065 from duanjunwen/dev/zero_bubble [Feat] Support zero bubble with shardformer input	2024-09-24 19:17:37 +08:00
duanjunwen	7e6f793c51	[fix] fix detach_output_obj clone;	2024-09-24 08:08:32 +00:00
duanjunwen	6c1e1550ae	[fix] fix dumb clone;	2024-09-23 06:43:49 +00:00
duanjunwen	a875212a42	[fix] fix ci --> oom in 4096 hidden dim;	2024-09-23 05:55:16 +00:00
duanjunwen	c114d1429a	[fix] fix detach clone release order;	2024-09-23 04:00:24 +00:00
duanjunwen	da3220f48c	[fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch;	2024-09-20 09:48:35 +00:00
duanjunwen	1739df423c	[fix] fix fwd branch, fwd pass both micro_batch & internal_inputs'	2024-09-20 07:34:43 +00:00
duanjunwen	b6616f544e	[fix] rm comments;	2024-09-20 07:29:41 +00:00
duanjunwen	c6d6ee39bd	[fix] use tree_flatten replace dict traverse;	2024-09-20 07:18:49 +00:00
duanjunwen	26783776f1	[fix] fix input_tensors buffer append input_obj(dict) --> Tuple (microbatch, input_obj) , and all bwd b related cal logic;	2024-09-20 06:41:19 +00:00
duanjunwen	4753bf7add	[fix] fix mem assert;	2024-09-19 08:27:47 +00:00
duanjunwen	a115106f8d	[fix] fix bwd w input;	2024-09-19 08:10:05 +00:00
duanjunwen	349272c71f	[fix] updatw bwd b&w input; dict --> list[torch.Tensor]	2024-09-19 07:47:01 +00:00
duanjunwen	6ee9584b9a	[fix] fix require_grad & deallocate call;	2024-09-19 05:53:03 +00:00
duanjunwen	1f5c7258aa	Merge remote-tracking branch 'upstream/feature/zerobubble' into dev/zero_bubble	2024-09-19 03:52:13 +00:00
Hongxin Liu	dabc2e7430	[release] update version (#6062 )	2024-09-19 10:45:32 +08:00
Camille Zhong	f9546ba0be	[ColossalEval] support for vllm (#6056 ) * support vllm * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify vllm and update readme * run pre-commit * remove dupilicated lines and refine code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update param name * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refine code * update readme * refine code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-09-18 17:09:45 +08:00
duanjunwen	af2c2f8092	[feat] add more test;	2024-09-18 07:51:54 +00:00
duanjunwen	3dbad102cf	[fix] fix zerobubble pp for shardformer type input;	2024-09-18 07:14:34 +00:00
botbw	4fa6b9509c	[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063 )	2024-09-18 10:09:01 +08:00
Wang Binluo	63314ce4e4	Merge pull request #6064 from wangbluo/fix_attn [sp] : fix the attention kernel for sp	2024-09-18 10:08:15 +08:00
wangbluo	10e4f7da72	fix	2024-09-16 13:45:04 +08:00
Wang Binluo	37e35230ff	Merge pull request #6061 from wangbluo/sp_fix [sp] : fix the attention kernel for sp	2024-09-14 20:54:35 +08:00
wangbluo	827ef3ee9a	fix	2024-09-14 10:40:35 +00:00
Guangyao Zhang	bdb125f83f	[doc] FP8 training and communication document (#6050 ) * Add FP8 training and communication document * add fp8 docstring for plugins * fix typo * fix typo	2024-09-14 11:01:05 +08:00
Guangyao Zhang	f20b066c59	[fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059 ) * all_gather only internode, fix pytest * fix cuda arch <89 compile pytest error * fix pytest failure * disable all_gather_into_tensor_flat_fp8 * fix fp8 format * fix pytest * fix conversations * fix chunk tuple to list	2024-09-14 10:40:01 +08:00
wangbluo	b582319273	fix	2024-09-13 10:24:41 +00:00
wangbluo	0ad3129cb9	fix	2024-09-13 09:01:26 +00:00
wangbluo	0b14a5512e	fix	2024-09-13 07:06:14 +00:00
botbw	696fced0d7	[fp8] fix missing fp8_comm flag in mixtral (#6057 )	2024-09-13 14:30:05 +08:00
wangbluo	dc032172c3	fix	2024-09-13 06:00:58 +00:00
wangbluo	f393867cff	fix	2024-09-13 05:24:52 +00:00
wangbluo	6eb8832366	fix	2024-09-13 05:06:56 +00:00
wangbluo	683179cefd	fix	2024-09-13 03:40:56 +00:00
wangbluo	0a01e2a453	fix the attn	2024-09-13 03:38:35 +00:00
pre-commit-ci[bot]	216d54e374	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2024-09-13 02:38:40 +00:00
wangbluo	fdd84b9087	fix the sp	2024-09-13 02:32:03 +00:00
duanjunwen	9bc3b6e220	[feat] moehybrid support zerobubble;	2024-09-12 02:51:46 +00:00
flybird11111	a35a078f08	[doc] update sp doc (#6055 ) * update sp doc * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * fix --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-09-11 17:25:14 +08:00
Hongxin Liu	13946c4448	[fp8] hotfix backward hook (#6053 ) * [fp8] hotfix backward hook * [fp8] hotfix pipeline loss accumulation	2024-09-11 16:11:25 +08:00
duanjunwen	11ae6848c6	[zerobubble]Support ZeroBubble Pipeline (#6034 ) * [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble; * [feat] add dw test; * [fix] fix weight not close; * [update] update text; * [feat] add test run_fwd_bwd automatic scheduling; * [feat] split communication and calculation; fix pop empty send_bwd_buffer error; * [feat] add test for p & p grad; * [feat] add comments for ZBV func; * [fix] rm useless assign and comments; * [fix] fix ci test; add pytest; * [feat] add run_fwd_bwd_with_microbatch (replace input) & test; add p&p.grad assert close test & all pass; * [feat] add apply v_schedule graph; p & p.grad assert err exist; * [fix] update * [feat] fix ci; add assert; * [feat] fix poc format * [feat] fix func name & ci; add comments; * [fix] fix poc test; add comments in poc; * [feat] add optim backward_b_by_grad * [feat] fix optimizer bwd b & w; support return accum loss & output * [feat] add fwd_bwd_step, run_fwd_only; * [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict; * [fix] fix communication_map; * [feat] update test; rm comments; * [fix] rm zbv in hybridplugin * [fix] fix optim bwd; * [fix] fix optim bwd; * [fix] rm output.data after send fwd; * [fix] fix bwd step if condition; remove useless comments and format info; * [fix] fix detach output & release output; * [fix] rm requir_grad for output; * [fix] fix requir grad position and detach position and input&output local buffer append position; * [feat] add memory assertation; * [fix] fix mem check; * [fix] mem assertation' * [fix] fix mem assertation * [fix] fix mem; use a new model shape; only assert mem less and equal than theo; * [fix] fix model zoo import; * [fix] fix redundant detach & clone; add buffer assertation in the end; * [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap; * [fix] update optim state dict assert (include param group & state); fix mem assert after add optim; * [fix] add testcase with microbatch 4;	2024-09-10 17:33:09 +08:00
botbw	c54c4fcd15	[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048 ) * [example] pass use_fp8_comm flag to all plugins * [example] add mixtral benchmark * [moe] refine assertion and check * [moe] fix mixtral & add more tests * [moe] consider checking dp * sp group and moe_dp_group * [mixtral] remove gate tp & add more tests * [deepseek] fix tp & sp for deepseek * [mixtral] minor fix * [deepseek] add deepseek benchmark	2024-09-10 17:30:53 +08:00
Wenxuan Tan	8fd25d6e09	[Feature] Split cross-entropy computation in SP (#5959 ) * halfway * fix cross-PP-stage position id length diff bug * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * update softmax_lse shape by new interface * change tester name * remove buffer clone; support packed seq layout * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements * adapt chatglm, command-R, qwen * debug * halfway * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * add sp_mode to benchmark; fix varlen interface * update softmax_lse shape by new interface * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements * add comments * q1 index only once * remove events to simplify stream sync * simplify forward/backward logic * 2d ring forward passed * 2d ring backward passed * fixes * fix ring attn loss * 2D ring backward + llama passed * merge * update logger * fix typo * rebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * remove typos * fixes * support GPT --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-09-10 12:06:50 +08:00
Hongxin Liu	b3db1058ec	[release] update version (#6041 ) * [release] update version * [devops] update comp test * [devops] update comp test debug * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test	2024-09-10 10:31:09 +08:00
duanjunwen	6c2a120bed	[fix] add testcase with microbatch 4;	2024-09-09 10:16:03 +00:00
duanjunwen	8366a7855f	[fix] update optim state dict assert (include param group & state); fix mem assert after add optim;	2024-09-09 09:27:13 +00:00
duanjunwen	ce58d8e8bf	[fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap;	2024-09-09 08:19:58 +00:00
duanjunwen	7568b34626	[fix] fix redundant detach & clone; add buffer assertation in the end;	2024-09-09 08:04:28 +00:00
duanjunwen	fed8b1587d	[fix] fix model zoo import;	2024-09-09 06:39:33 +00:00

3861 Commits (0d6d40ccc62b5eaa514c7f4f8cc525ce159ff038), All Branches