ColossalAI/colossalai/moe
Latest commit: 1b15cc97f5 by botbw: [moe] add mixtral dp grad scaling when not all experts are activated (2024-07-19 07:30:14 +00:00)
File             Last commit                                                            Date
__init__.py      [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
_operation.py    [moe] add mixtral dp grad scaling when not all experts are activated   2024-07-19 07:30:14 +00:00
load_balance.py  [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
manager.py       fix some typo (#5307)                                                  2024-01-25 13:56:27 +08:00
utils.py         [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
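The `_operation.py` commit message refers to scaling data-parallel gradients when some experts receive no tokens on a given rank. The sketch below is a single-process illustration with made-up numbers, not ColossalAI's actual implementation: it shows how a plain all-reduce average dilutes the gradient of a rarely activated expert by the full DP world size, whereas dividing by the number of ranks that actually activated each expert preserves the true per-contributor average.

```python
# Minimal sketch (hypothetical values, not ColossalAI code) of DP gradient
# scaling for MoE when not all experts are activated on every rank.
import torch

dp_size = 4  # assumed number of data-parallel ranks in this toy example

# Simulated per-rank gradients for two experts: expert 1 received tokens on
# only one rank, so the other ranks contribute a zero gradient for it.
per_rank_grads = [
    torch.tensor([1.0, 2.0]),  # rank 0: both experts active
    torch.tensor([1.0, 0.0]),  # rank 1: expert 1 inactive -> zero grad
    torch.tensor([1.0, 0.0]),  # rank 2: expert 1 inactive
    torch.tensor([1.0, 0.0]),  # rank 3: expert 1 inactive
]

# Number of ranks on which each expert actually produced a gradient.
activated = torch.stack([g != 0 for g in per_rank_grads]).sum(dim=0)

# A naive DP all-reduce average divides every gradient by dp_size ...
naive = torch.stack(per_rank_grads).sum(dim=0) / dp_size
# ... while the activation-aware version divides each expert's summed
# gradient only by the number of ranks that contributed to it.
scaled = torch.stack(per_rank_grads).sum(dim=0) / activated.clamp(min=1)

print(naive)   # tensor([1.0000, 0.5000]): expert 1's gradient diluted 4x
print(scaled)  # tensor([1., 2.]): expert 1 keeps its true average gradient
```

In a real distributed setup the same idea would apply after the gradient all-reduce over the DP group, with the per-expert activation counts themselves all-reduced so every rank agrees on the scaling factors.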