59 Commits (main)

Author SHA1 Message Date
Hongxin Liu a15ab139ad [plugin] support get_grad_norm (#6115) 2 weeks ago
Wang Binluo eea37da6fa [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) 3 months ago
Hongxin Liu 26493b97d3 [misc] update compatibility (#6008) 3 months ago
Edenzzzz f5c84af0b0 [Feature] Zigzag Ring attention (#5905) 3 months ago
Hongxin Liu 68359ed1e1 [release] update version (#5752) 6 months ago
flybird11111 77ec773388 [zero]remove registered gradients hooks (#5687) 7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
flybird11111 8954a0c2e2 [LowLevelZero] low level zero support lora (#5153) 7 months ago
Insu Jang 00525f7772 [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) 8 months ago
Wenhao Chen bb0a668fee [hotfix] set return_outputs=False in examples and polish code (#5404) 8 months ago
flybird11111 5e16bf7980 [shardformer] fix gathering output when using tensor parallelism (#5431) 8 months ago
Hongxin Liu f2e8b9ef9f [devops] fix compatibility (#5444) 8 months ago
flybird11111 29695cf70c [example]add gpt2 benchmark example script. (#5295) 9 months ago
Wenhao Chen 1c790c0877 [fix] remove unnecessary dp_size assert (#5351) 10 months ago
Hongxin Liu d7f8db8e21 [hotfix] fix 3d plugin test (#5292) 10 months ago
flybird11111 46e091651b [shardformer] hybridparallelplugin support gradients accumulation. (#5246) 10 months ago
Frank Lee d69cd2eb89 [workflow] fixed oom tests (#5275) 10 months ago
Frank Lee d5eeeb1416 [ci] fixed booster test (#5251) 11 months ago
Frank Lee edf94a35c3 [workflow] fixed build CI (#5240) 11 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
flybird11111 2a2ec49aa7 [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) 12 months ago
github-actions[bot] d10ee42f68 [format] applied code formatting on changed files in pull request 5088 (#5127) 12 months ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 12 months ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) 1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043) 1 year ago
flybird11111 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) 1 year ago
Baizhou Zhang 21ba89cab6 [gemini] support gradient accumulation (#4869) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
digger yu 9c2feb2f0b fix some typo with colossalai/device colossalai/tensor/ etc. (#4171) 1 year ago
flybird11111 eedaa3e1ef [shardformer]fix gpt2 double head (#4663) 1 year ago
Hongxin Liu 8accecd55b [legacy] move engine to legacy (#4560) 1 year ago
Hongxin Liu 27061426f7 [gemini] improve compatibility and add static placement policy (#4479) 1 year ago
flybird1111 906426cb44 [Shardformer] Merge flash attention branch to pipeline branch (#4362) 1 year ago
FoolPlayer c3ca53cf05 [test] skip some not compatible models 1 year ago
Hongxin Liu 411cf1d2db [hotfix] fix gemini and zero test (#4333) 1 year ago
Hongxin Liu 261eab02fb [plugin] add 3d parallel plugin (#4295) 1 year ago
LuGY c6ab96983a [zero] refactor low level zero for shard evenly (#4030) 1 year ago
Frank Lee 58df720570 [shardformer] adapted T5 and LLaMa test to use kit (#4049) 1 year ago
Hongxin Liu dbb32692d2 [lazy] refactor lazy init (#3891) 1 year ago
wukong1992 6b305a99d6 [booster] torch fsdp fix ckpt (#3788) 2 years ago
Hongxin Liu 5452df63c5 [plugin] torch ddp plugin supports sharded model checkpoint (#3775) 2 years ago
wukong1992 6050f37776 [booster] removed models that don't support fsdp (#3744) 2 years ago
Hongxin Liu afb239bbf8 [devops] update torch version of CI (#3725) 2 years ago
wukong1992 b37797ed3d [booster] support torch fsdp plugin in booster (#3697) 2 years ago
digger-yu 1f73609adb [CI] fix typo with tests/ etc. (#3727) 2 years ago
Hongxin Liu 6552cbf8e1 [booster] fix no_sync method (#3709) 2 years ago
Hongxin Liu 3bf09efe74 [booster] update prepare dataloader method for plugin (#3706) 2 years ago
Hongxin Liu d0915f54f4 [booster] refactor all dp fashion plugins (#3684) 2 years ago
Hongxin Liu 4b3240cb59 [booster] add low level zero plugin (#3594) 2 years ago
Hongxin Liu 152239bbfa [gemini] gemini supports lazy init (#3379) 2 years ago