Commit Graph

2 Commits (3e2b6132b7c8543324685e527a645f6f33962f38)

Author  SHA1        Message                                                               Date
hxwang  74eccac0db  [moe] test deepseek                                                   2024-08-01 10:06:59 +08:00
botbw   9b9b76bdcd  [moe] add mixtral dp grad scaling when not all experts are activated  2024-08-01 10:06:59 +08:00