Commit Graph

1 Commit (102b784a10f0cd1c740d9ceba343a78166314290)

Author SHA1 Message Date
botbw 9b9b76bdcd [moe] add mixtral dp grad scaling when not all experts are activated 2024-08-01 10:06:59 +08:00