3 Commits (main)

Author SHA1 Message Date
flybird11111 0c10afd372 [FP8] rebase main (#5963) 4 months ago
hxwang 70c9924d0d [chore] solve moe ckpt test failure and some other arg pass failure 4 months ago
hxwang 74eccac0db [moe] test deepseek 4 months ago
botbw 9b9b76bdcd [moe] add mixtral dp grad scaling when not all experts are activated 4 months ago
botbw 13b48ac0aa [zero] solve hang 4 months ago
hxwang 46c069b0db [zero] solve hang 4 months ago
Haze188 416580b314 [MoE/ZeRO] Moe refactor with zero refactor (#5821) 5 months ago