ColossalAI/colossalai/nn/layer/moe
HELSON a9b8300d54
[zero] improve adaptability for not-shard parameters (#708)
* adapt post grad hooks for not-shard parameters
* adapt optimizer for not-shard parameters
* offload gradients for not-replicated parameters
2022-04-11 13:38:51 +08:00
__init__.py
_operation.py
experts.py [zero] adapt zero for unsharded parameters (#561) 2022-03-31 18:34:11 +08:00
layers.py polish moe docstring (#618) 2022-04-01 16:15:36 +08:00
utils.py [zero] improve adaptability for not-shard parameters (#708) 2022-04-11 13:38:51 +08:00