Commit Graph

24 Commits (feature/async-io)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| hxwang | 5b4c12381b | Revert "[moe] implement submesh initialization" | 4 months ago |
| Haze188 | 404b16faf3 | [Feature] MoE Ulysses Support (#5918) | 4 months ago |
| botbw | 8dbb86899d | [chore] trivial fix | 4 months ago |
| botbw | e28e05345b | [moe] implement submesh initialization | 4 months ago |
| hxwang | 46c069b0db | [zero] solve hang | 4 months ago |
| hxwang | 0fad23c691 | [chore] handle non member group | 4 months ago |
| Gao, Ruiyuan | 5fb958cc83 | [FIX BUG] convert env param to int in (#5934) | 4 months ago |
| Haze188 | 3420921101 | [shardformer] DeepseekMoE support (#5871) | 5 months ago |
| Haze188 | 416580b314 | [MoE/ZeRO] Moe refactor with zero refactor (#5821) | 5 months ago |
| Edenzzzz | 2a25a2aff7 | [Feature] optimize PP overlap (#5735) | 5 months ago |
| Edenzzzz | 43995ee436 | [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) | 7 months ago |
| Hongxin Liu | 641b1ee71a | [devops] remove post commit ci (#5566) | 8 months ago |
| Zhongkai Zhao | 8e412a548e | [shardformer] Sequence Parallelism Optimization (#5533) | 8 months ago |
| flybird11111 | 365671be10 | fix-test (#5210) | 11 months ago |
| flybird11111 | 576a2f7b10 | [gemini] gemini support tensor parallelism. (#4942) | 1 year ago |
| littsk | be82b5d4ca | [hotfix] Fix the bug where process groups were not being properly released. (#4940) | 1 year ago |
| Baizhou Zhang | a2db75546d | [doc] polish shardformer doc (#4779) | 1 year ago |
| Hongxin Liu | 079bf3cb26 | [misc] update pre-commit and run all files (#4752) | 1 year ago |
| LuGY | a78daf6180 | [shardformer] support interleaved pipeline (#4448) | 1 year ago |
| Hongxin Liu | 5e1a9d48dd | [cluster] add process group mesh (#4039) | 1 year ago |
| digger yu | 7f8203af69 | fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808) | 2 years ago |
| Frank Lee | 73d3e4d309 | [booster] implemented the torch ddd + resnet example (#3232) | 2 years ago |
| YuliangLiu0306 | 4d5d8f98a4 | [API] implement device mesh manager (#3221) | 2 years ago |
| Frank Lee | e3ad88fb48 | [booster] implemented the cluster module (#3191) | 2 years ago |