ColossalAI/tests/test_moe
Latest commit 9b9b76bdcd by botbw: [moe] add mixtral dp grad scaling when not all experts are activated (2024-08-01 10:06:59 +08:00)
File                        Last commit message                                                    Last commit date
moe_utils.py                [zero] solve hang                                                      2024-08-01 10:06:59 +08:00
test_deepseek_layer.py      [shardformer] DeepseekMoE support (#5871)                              2024-07-05 16:13:58 +08:00
test_grad_handler.py        [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
test_kernel.py              [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
test_mixtral_layer.py       [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
test_moe_checkpoint.py      [zero] solve hang                                                      2024-08-01 10:06:59 +08:00
test_moe_ep_tp.py           [misc] solve booster hang by rename the variable                       2024-08-01 10:06:59 +08:00
test_moe_ep_zero.py         [moe] add mixtral dp grad scaling when not all experts are activated   2024-08-01 10:06:59 +08:00
test_moe_group.py           [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
test_moe_hybrid_zero.py     [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
test_moe_load_balance.py    [MoE/ZeRO] Moe refactor with zero refactor (#5821)                     2024-06-28 14:00:08 +08:00
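As a usage note, the files above are standard pytest test modules. Below is a minimal sketch of invoking one of the listed files programmatically; it assumes pytest is installed and the script is run from the ColossalAI repository root, and it is illustrative only, not part of the repository.

```python
# Minimal sketch: run one of the MoE test files listed above via pytest.
# Assumption: pytest is installed and the working directory is the
# ColossalAI repository root. The file path matches the listing above.
import sys

import pytest

if __name__ == "__main__":
    # -s disables output capturing so test logs are printed.
    exit_code = pytest.main(["-s", "tests/test_moe/test_mixtral_layer.py"])
    sys.exit(exit_code)
```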