17 Commits (162251ab7844e4116a36d6e0fec2ac7ccd03f74d)

Author SHA1 Message Date
botbw c54c4fcd15 [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) 2 months ago
wangbluo 1f703e0ef4 fix 3 months ago
wangbluo 2eb36839c6 fix 3 months ago
flybird11111 f1a3a326c4 [fp8]Moe support fp8 communication (#5977) 4 months ago
flybird11111 0c10afd372 [FP8] rebase main (#5963) 4 months ago
hxwang 74b03de3f9 [moe] remove ops 4 months ago
hxwang 803878b2fd [moe] full test for deepseek and mixtral (pp + sp to fix) 4 months ago
hxwang 3e2b6132b7 [moe] clean legacy code 4 months ago
botbw dc583aa576 [moe] implement tp 4 months ago
botbw 9b9b76bdcd [moe] add mixtral dp grad scaling when not all experts are activated 4 months ago
botbw b5bfeb2efd [moe] implement transit between non moe tp and ep 4 months ago
hxwang 46c069b0db [zero] solve hang 4 months ago
Haze188 416580b314 [MoE/ZeRO] Moe refactor with zero refactor (#5821) 5 months ago
digger yu 5e1c93d732 [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) 9 months ago
ver217 06db94fbc9 [moe] fix tests 10 months ago
Hongxin Liu da39d21b71 [moe] support mixtral (#5309) 10 months ago
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253) 10 months ago
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 10 months ago
Frank Lee 7cfed5f076 [feat] refactored extension module (#5298) 10 months ago
digger yu bce9499ed3 fix some typo (#5307) 10 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
Wenhao Chen 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) 1 year ago
Wenhao Chen 724441279b [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 1 year ago
Xuanlei Zhao f71e63b0f3 [moe] support optimizer checkpoint (#5015) 1 year ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago