Commit Graph

13 Commits (2d642eea0f92c7f7c1fb7bef3abdfdb0cb61d1bf)

Author SHA1 Message Date
hxwang 70c9924d0d [chore] solve moe ckpt test failure and some other arg pass failure
4 months ago
hxwang 74eccac0db [moe] test deepseek
4 months ago
botbw dc583aa576 [moe] implement tp
4 months ago
hxwang 102b784a10 [chore] arg pass & remove drop token
4 months ago
haze188 fe24789eb1 [misc] solve booster hang by rename the variable
4 months ago
botbw 13b48ac0aa [zero] solve hang
4 months ago
Haze188 416580b314
[MoE/ZeRO] Moe refactor with zero refactor (#5821)
5 months ago
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
7 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
11 months ago
Wenhao Chen 3c08f17348
[hotfix]: modify create_ep_hierarchical_group and add test (#5032)
1 year ago
Wenhao Chen 724441279b
[moe]: fix ep/tp tests, add hierarchical all2all (#4982)
1 year ago
Xuanlei Zhao f71e63b0f3
[moe] support optimizer checkpoint (#5015)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago