Commit Graph

1 Commit (d4a64e355e4782b8c31d53fbcf0331289856f89c)

Author SHA1 Message Date
botbw 1b15cc97f5 [moe] add mixtral dp grad scaling when not all experts are activated 2024-07-19 07:30:14 +00:00