ColossalAI


Wenxuan Tan 8fd25d6e09 [Feature] Split cross-entropy computation in SP (#5959)	2024-09-10 12:06:50 +08:00

* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* fix typos and remove misc files
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* all tests passed
* add dkv_group; fix mask
* remove debug statements
* adapt chatglm, command-R, qwen
* debug
* add sp_mode to benchmark; fix varlen interface
* add comments
* q1 index only once
* remove events to simplify stream sync
* simplify forward/backward logic
* 2d ring forward passed
* 2d ring backward passed
* fixes
* fix ring attn loss
* 2D ring backward + llama passed
* merge
* update logger
* rebase
* remove typos
* support GPT

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
_C	Clean up	2024-06-07 09:09:29 +00:00
_analyzer	[test] Fix/fix testcase (#5770)	2024-06-03 15:26:01 +08:00
accelerator	[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335)	2024-03-05 21:52:30 +08:00
amp	[npu] change device to accelerator api (#5239)	2024-01-09 10:20:05 +08:00
auto_parallel	[pre-commit.ci] pre-commit autoupdate (#5572)	2024-07-01 17:16:41 +08:00
autochunk	[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606)	2024-04-18 18:15:50 +08:00
booster	[fp8] fix linear hook (#6046)	2024-09-03 16:37:16 +08:00
checkpoint_io	[colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020)	2024-09-02 16:56:35 +08:00
cli	[devops] fix extention building (#5427)	2024-03-05 15:35:54 +08:00
cluster	[FP8] rebase main (#5963)	2024-08-06 16:29:37 +08:00
context	[Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625)	2024-04-25 14:45:52 +08:00
device	[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)	2024-05-14 13:52:45 +08:00
fx	[test] Fix/fix testcase (#5770)	2024-06-03 15:26:01 +08:00
inference	[colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020)	2024-09-02 16:56:35 +08:00
interface	[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)	2024-05-14 13:52:45 +08:00
kernel	[NFC] Fix code factors on inference triton kernels (#5743)	2024-05-21 22:12:15 +08:00
lazy	[Feature] Zigzag Ring attention (#5905)	2024-08-16 13:56:38 +08:00
legacy	[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016)	2024-08-22 09:21:34 +08:00
logging	[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016)	2024-08-22 09:21:34 +08:00
moe	[fp8]Moe support fp8 communication (#5977)	2024-08-09 18:26:02 +08:00
nn	[misc] fix dist logger (#5782)	2024-06-05 15:04:22 +08:00
pipeline	[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016)	2024-08-22 09:21:34 +08:00
quantization	[fp8] disable all_to_all_fp8 in intranode (#6045)	2024-09-09 13:47:17 +08:00
shardformer	[Feature] Split cross-entropy computation in SP (#5959)	2024-09-10 12:06:50 +08:00
tensor	[fp8] support fp8 amp for hybrid parallel plugin (#5975)	2024-08-07 18:21:08 +08:00
testing	[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016)	2024-08-22 09:21:34 +08:00
utils	Merge pull request #5310 from hpcaitech/feature/npu	2024-01-29 13:49:39 +08:00
zero	[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016)	2024-08-22 09:21:34 +08:00
__init__.py	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
initialize.py	[FP8] rebase main (#5963)	2024-08-06 16:29:37 +08:00