Commit Graph

1 Commit (0b5bbe9ce456a17cea00b46ea0255a308a02ecba)

Author | SHA1 | Message | Date
botbw | 9b9b76bdcd | [moe] add mixtral dp grad scaling when not all experts are activated | 2024-08-01 10:06:59 +08:00