13 Commits (main)

Author SHA1 Message Date
flybird11111 0c10afd372 [FP8] rebase main (#5963) 4 months ago
hxwang 70c9924d0d [chore] solve moe ckpt test failure and some other arg pass failure 4 months ago
hxwang 74eccac0db [moe] test deepseek 4 months ago
botbw dc583aa576 [moe] implement tp 4 months ago
hxwang 102b784a10 [chore] arg pass & remove drop token 4 months ago
haze188 fe24789eb1 [misc] solve booster hang by rename the variable 4 months ago
botbw 13b48ac0aa [zero] solve hang 4 months ago
Haze188 416580b314 [MoE/ZeRO] Moe refactor with zero refactor (#5821) 5 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
Wenhao Chen 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) 1 year ago
Wenhao Chen 724441279b [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 1 year ago
Xuanlei Zhao f71e63b0f3 [moe] support optimizer checkpoint (#5015) 1 year ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago