ff14144d9c  [tmp] add write_tensor  (botbw, 4 weeks ago)
ad6558e91c  [chore] refactor  (botbw, 1 month ago)
162251ab78  [ckpt] add safetensors util  (botbw, 1 month ago)
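The safetensors layout that a checkpoint util like this targets is simple enough to sketch with the stdlib alone: an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte range, then the concatenated raw data. A minimal sketch follows — not the ColossalAI util itself; `write_safetensors` and `read_safetensors` are hypothetical names:

```python
import io
import json
import struct

def write_safetensors(fh, tensors):
    """Write {name: (dtype_str, shape, raw_bytes)} in the safetensors layout:
    an 8-byte little-endian header size, a JSON header mapping each tensor
    name to its dtype, shape and byte range, then the concatenated raw data."""
    header, payload, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        payload.append(raw)
        offset += len(raw)
    blob = json.dumps(header).encode("utf-8")
    fh.write(struct.pack("<Q", len(blob)))
    fh.write(blob)
    fh.write(b"".join(payload))

def read_safetensors(fh):
    """Inverse of write_safetensors: parse the header, slice out each tensor."""
    (hsize,) = struct.unpack("<Q", fh.read(8))
    header = json.loads(fh.read(hsize))
    data = fh.read()
    return {name: (m["dtype"], m["shape"],
                   data[m["data_offsets"][0]:m["data_offsets"][1]])
            for name, m in header.items()}

# round-trip a small float32 "tensor" through an in-memory file
buf = io.BytesIO()
write_safetensors(buf, {"w": ("F32", [2, 2], struct.pack("<4f", 1, 2, 3, 4))})
buf.seek(0)
restored = read_safetensors(buf)
```

Because the header carries explicit byte offsets, a reader can also mmap the file and slice tensors lazily, which is the main draw of the format for large checkpoints.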
4c8e85ee0d  [Coati] Train DPO using PP (#6054)  (Tong Li, 1 month ago)
    * update dpo
    * remove unsupported plugin
    * update msg
    * update template
    * update dataset
    * add pp for dpo
    * update dpo
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * add dpo fn
    * update dpo
    * update dpo
    * update dpo
    * update dpo
    * minor update
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * update loss
    * update help
    * polish code
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
dc2cdaf3e8  [shardformer] optimize seq parallelism (#6086)  (Hongxin Liu, 1 month ago)
    * [shardformer] optimize seq parallelism
    * [shardformer] fix gpt2 fused linear col
    * [plugin] update gemini plugin
    * [plugin] update moe hybrid plugin
    * [test] update gpt2 fused linear test
    * [shardformer] fix gpt2 fused linear reduce
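Sequence parallelism pays off because token-wise layers (MLPs, norms) commute with splitting activations along the sequence axis, so only attention, which mixes tokens, forces communication. A toy numpy illustration of that invariant — not the shardformer implementation:

```python
import numpy as np

def token_wise_mlp(x, w):
    # acts on each token (row) independently, so it commutes with
    # splitting x along the sequence axis
    return np.maximum(x @ w, 0.0)

rng = np.random.default_rng(0)
seq_len, hidden, world_size = 8, 4, 2
x = rng.standard_normal((seq_len, hidden))
w = rng.standard_normal((hidden, hidden))

full = token_wise_mlp(x, w)                 # unsharded reference
shards = [token_wise_mlp(part, w)           # each "rank" computes on its shard
          for part in np.split(x, world_size, axis=0)]
gathered = np.concatenate(shards, axis=0)   # stands in for an all-gather
```

Each rank holds `seq_len / world_size` tokens, so activation memory for these layers shrinks linearly with the sequence-parallel degree.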
6b2c506fc5  Update README.md (#6087)  (Liang Shuang (梁爽), 1 month ago)
    add HPC-AI.COM activity
646b3c5a90  [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084)  (Hongxin Liu, 1 month ago)
    * [tp] hotfix linear row
    * [tp] support uneven split for fused linear
    * [tp] support sp for fused linear
    * [tp] fix gpt2 mlp policy
    * [tp] fix gather fused and add fused linear row
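Splitting a fused QKV weight for tensor parallelism is subtler than a plain column split: each rank needs matching head slices of Q, K, and V, and with uneven head counts the slice boundaries differ per rank. A hypothetical numpy sketch of the idea — not the actual shardformer code:

```python
import numpy as np

def split_fused_qkv(w, heads_per_rank):
    """Column-split a fused [hidden, 3*hidden] QKV weight across TP ranks.
    Each rank must get matching head slices of Q, K and V -- a naive
    contiguous split of the fused matrix would hand rank 0 only Q columns.
    heads_per_rank may be uneven, e.g. [3, 1]."""
    hidden = w.shape[0]
    q, k, v = np.split(w, 3, axis=1)
    head_dim = hidden // sum(heads_per_rank)
    bounds = np.cumsum([n * head_dim for n in heads_per_rank])[:-1]
    return [np.concatenate(parts, axis=1)
            for parts in zip(np.split(q, bounds, axis=1),
                             np.split(k, bounds, axis=1),
                             np.split(v, bounds, axis=1))]

hidden = 8                                   # 4 heads of dim 2
w = np.arange(hidden * 3 * hidden, dtype=float).reshape(hidden, 3 * hidden)
shards = split_fused_qkv(w, [3, 1])          # uneven: 3 heads vs 1 head
```

The gather path has the mirror-image subtlety: rank outputs must be re-interleaved back into Q, K, V blocks rather than concatenated blindly.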
f4daf04270  add funding news (#6072)  (binmakeswell, 2 months ago)
cbaa104216  release FP8 news (#6068)  (binmakeswell, 2 months ago)
    * add FP8 news
    * release FP8 news
dabc2e7430  [release] update version (#6062)  (Hongxin Liu, 2 months ago)
f9546ba0be  [ColossalEval] support for vllm (#6056)  (Camille Zhong, 2 months ago)
    * support vllm
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * modify vllm and update readme
    * run pre-commit
    * remove duplicated lines and refine code
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * update param name
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * refine code
    * update readme
    * refine code
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4fa6b9509c  [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063)  (botbw, 2 months ago)
63314ce4e4  Merge pull request #6064 from wangbluo/fix_attn  (Wang Binluo, 2 months ago)
    [sp] fix the attention kernel for sp
10e4f7da72  fix  (wangbluo, 2 months ago)
37e35230ff  Merge pull request #6061 from wangbluo/sp_fix  (Wang Binluo, 2 months ago)
    [sp] fix the attention kernel for sp
827ef3ee9a  fix  (wangbluo, 2 months ago)
bdb125f83f  [doc] FP8 training and communication document (#6050)  (Guangyao Zhang, 2 months ago)
    * Add FP8 training and communication document
    * add fp8 docstring for plugins
    * fix typo
f20b066c59  [fp8] Disable all_gather intranode. Disable redundant all_gather fp8 (#6059)  (Guangyao Zhang, 2 months ago)
    * all_gather only internode, fix pytest
    * fix cuda arch <89 compile pytest error
    * fix pytest failure
    * disable all_gather_into_tensor_flat_fp8
    * fix fp8 format
    * fix pytest
    * fix conversations
    * fix chunk tuple to list
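The fp8 communication path these patches tune follows the usual per-tensor scaling recipe: pick a scale so the tensor's amax fills the e4m3 dynamic range, cast down, send the compact payload plus its scale, and rescale on the receiver. The sketch below only illustrates the pattern — the helpers are hypothetical, and since numpy has no fp8 dtype, float16 stands in for the e4m3 cast:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite fp8 e4m3 value

def quantize_for_comm(x):
    """Per-tensor scaling before an fp8 collective: map the tensor's amax
    onto the e4m3 range, cast down, and ship the scale alongside the
    payload. float16 stands in for the e4m3 cast (assumes a nonzero tensor)."""
    scale = np.abs(x).max() / E4M3_MAX
    return (x / scale).astype(np.float16), scale

def dequantize(q, scale):
    # receiver side: widen and undo the scaling
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_for_comm(x)
roundtrip = dequantize(q, scale)
```

Halving the bytes on the wire is what makes this worthwhile for internode links; intranode (NVLink) bandwidth is high enough that the quantize/dequantize overhead can dominate, which is why these commits disable the fp8 path there.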
b582319273  fix  (wangbluo, 2 months ago)
0ad3129cb9  fix  (wangbluo, 2 months ago)
0b14a5512e  fix  (wangbluo, 2 months ago)
696fced0d7  [fp8] fix missing fp8_comm flag in mixtral (#6057)  (botbw, 2 months ago)
dc032172c3  fix  (wangbluo, 2 months ago)
f393867cff  fix  (wangbluo, 2 months ago)
6eb8832366  fix  (wangbluo, 2 months ago)
683179cefd  fix  (wangbluo, 2 months ago)
0a01e2a453  fix the attn  (wangbluo, 2 months ago)
216d54e374  [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)  (pre-commit-ci[bot], 2 months ago)
fdd84b9087  fix the sp  (wangbluo, 2 months ago)
a35a078f08  [doc] update sp doc (#6055)  (flybird11111, 2 months ago)
    * update sp doc
    * fix
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * fix
    * fix
    * fix
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
13946c4448  [fp8] hotfix backward hook (#6053)  (Hongxin Liu, 2 months ago)
    * [fp8] hotfix backward hook
    * [fp8] hotfix pipeline loss accumulation
c54c4fcd15  [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)  (botbw, 2 months ago)
    * [example] pass use_fp8_comm flag to all plugins
    * [example] add mixtral benchmark
    * [moe] refine assertion and check
    * [moe] fix mixtral & add more tests
    * [moe] consider checking dp * sp group and moe_dp_group
    * [mixtral] remove gate tp & add more tests
    * [deepseek] fix tp & sp for deepseek
    * [mixtral] minor fix
    * [deepseek] add deepseek benchmark
8fd25d6e09  [Feature] Split cross-entropy computation in SP (#5959)  (Wenxuan Tan, 2 months ago)
    * halfway
    * fix cross-PP-stage position id length diff bug
    * fix typo
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * unified cross entropy func for all shardformer models
    * remove redundant lines
    * add basic ring attn; debug cross entropy
    * fwd bwd logic complete
    * fwd bwd logic complete; add experimental triton rescale
    * precision tests passed
    * precision tests passed
    * fix typos and remove misc files
    * update softmax_lse shape by new interface
    * change tester name
    * remove buffer clone; support packed seq layout
    * add varlen tests
    * fix typo
    * all tests passed
    * add dkv_group; fix mask
    * remove debug statements
    * adapt chatglm, command-R, qwen
    * debug
    * halfway
    * fix cross-PP-stage position id length diff bug
    * fix typo
    * fix typo
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * unified cross entropy func for all shardformer models
    * remove redundant lines
    * add basic ring attn; debug cross entropy
    * fwd bwd logic complete
    * fwd bwd logic complete; add experimental triton rescale
    * precision tests passed
    * precision tests passed
    * fix typos and remove misc files
    * add sp_mode to benchmark; fix varlen interface
    * update softmax_lse shape by new interface
    * add varlen tests
    * fix typo
    * all tests passed
    * add dkv_group; fix mask
    * remove debug statements
    * add comments
    * q1 index only once
    * remove events to simplify stream sync
    * simplify forward/backward logic
    * 2d ring forward passed
    * 2d ring backward passed
    * fixes
    * fix ring attn loss
    * 2D ring backward + llama passed
    * merge
    * update logger
    * fix typo
    * rebase
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * fix typo
    * remove typos
    * fixes
    * support GPT
    Co-authored-by: Edenzzzz <wtan45@wisc.edu>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
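The point of splitting cross-entropy under sequence parallelism is that the loss is a sum over tokens, so each rank can reduce over its own sequence shard and a single scalar all-reduce recovers the full loss, with no need to gather the vocab-sized logits. A toy numpy check of that identity — the real change lives in shardformer's loss functions:

```python
import numpy as np

def ce_sum(logits, labels):
    # numerically stable sum of per-token cross-entropy losses
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].sum()

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 16))        # [seq, vocab]
labels = rng.integers(0, 16, size=8)

full = ce_sum(logits, labels)
# each SP rank reduces over its own sequence shard; a plain Python sum
# stands in for the scalar all-reduce
partial = sum(ce_sum(lg, lb) for lg, lb in
              zip(np.split(logits, 2), np.split(labels, 2)))
```

Since only a scalar crosses ranks, the communication cost is independent of both sequence length and vocabulary size.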
b3db1058ec  [release] update version (#6041)  (Hongxin Liu, 2 months ago)
    * [release] update version
    * [devops] update comp test
    * [devops] update comp test debug
    * [devops] debug comp test
    * [devops] debug comp test
    * [devops] debug comp test
    * [devops] debug comp test
    * [devops] debug comp test
5ce6dd75bf  [fp8] disable all_to_all_fp8 in intranode (#6045)  (Hanks, 2 months ago)
    * enhance all_to_all_fp8 with internode comm control
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * disable some fp8 ops due to performance issue
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
26e553937b  [fp8] fix linear hook (#6046)  (Hongxin Liu, 3 months ago)
c3b5caff0e  [fp8] optimize all-gather (#6043)  (Hongxin Liu, 3 months ago)
    * [fp8] optimize all-gather
    * [fp8] fix all gather fp8 ring
    * [fp8] enable compile
    * [fp8] fix all gather fp8 ring
c650a906db  [Hotfix] Remove deprecated install (#6042)  (Tong Li, 3 months ago)
    * remove deprecated install
    * remove unused folder
e9032fb0b2  [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020)  (Gao, Ruiyuan, 3 months ago)
    * fix bug in load_state_dict_into_model; format error msg
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * Update utils.py to support checking missing_keys
    * Update general_checkpoint_io.py to fix bug in missing_keys error message
    * retrigger tests
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
e96a0761ea  [FP8] unsqueeze scale to make it compatible with torch.compile (#6040)  (Guangyao Zhang, 3 months ago)
0d3a85d04f  add fused norm (#6038)  (Tong Li, 3 months ago)
4a68efb7da  [Colossal-LLaMA] Refactor latest APIs (#6030)  (Tong Li, 3 months ago)
    * refactor latest code
    * update api
    * add dummy dataset
    * update Readme
    * add setup
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * update files
    * add PP support
    * update arguments
    * update argument
    * reorg folder
    * update version
    * remove IB info
    * update utils
    * update readme
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * update save for zero
    * update save
    * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
    * add apex
    * update
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
cc1b0efc17  [plugin] hotfix zero plugin (#6036)  (Hongxin Liu, 3 months ago)
d383449fc4  [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018)  (Wenxuan Tan, 3 months ago)
    * remove triton version
    * remove torch 2.2
    * remove torch 2.1
    * debug
    * remove 2.1 build tests
    * require torch >=2.2
    Co-authored-by: Edenzzzz <wtan45@wisc.edu>
17904cb5bf  Merge pull request #6012 from hpcaitech/feature/fp8_comm  (Hongxin Liu, 3 months ago)
    [fp8] support fp8 communication and fp8 training for ColossalAI
4a6f31eb0c  Merge pull request #6033 from wangbluo/fix  (Wang Binluo, 3 months ago)
    [fp8] fix the merge
80d24ae519  [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)  (pre-commit-ci[bot], 3 months ago)
dae39999d7  fix  (wangbluo, 3 months ago)
7cf9df07bc  [Hotfix] Fix llama fwd replacement bug (#6031)  (Wenxuan Tan, 3 months ago)
    Co-authored-by: Edenzzzz <wtan45@wisc.edu>
0bf46c54af  Merge pull request #6029 from hpcaitech/flybird11111-patch-1  (Wang Binluo, 3 months ago)
    Update train_dpo.py