ColossalAI/colossalai/engine/gradient_handler

Latest commit: e6d50ec107 by HELSON — [zero] adapt zero for unsharded parameters (#561), 2022-03-31 18:34:11 +08:00
  * support existing sharded and unsharded parameters in zero
  * add unitest for moe-zero model init
  * polish moe gradient handler
File                                    Latest commit                                                         Date
__init__.py                             moved env variables to global variables; (#215)                       2022-02-15 11:31:13 +08:00
_base_gradient_handler.py               Refactored docstring to google style                                  2022-03-29 17:17:47 +08:00
_data_parallel_gradient_handler.py      add moe context, moe utilities and refactor gradient handler (#455)  2022-03-18 16:38:32 +08:00
_moe_gradient_handler.py                [zero] adapt zero for unsharded parameters (#561)                     2022-03-31 18:34:11 +08:00
_pipeline_parallel_gradient_handler.py  [zero] ZeRO supports pipeline parallel (#477)                         2022-03-21 16:55:37 +08:00
_sequence_parallel_gradient_handler.py  add moe context, moe utilities and refactor gradient handler (#455)  2022-03-18 16:38:32 +08:00
_zero_gradient_handler.py               Flake8 code restyle                                                   2022-03-11 15:50:28 +08:00
utils.py                                add moe context, moe utilities and refactor gradient handler (#455)  2022-03-18 16:38:32 +08:00
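The files above all implement one pattern: a base gradient handler defines a `handle_gradient()` hook that runs after the backward pass and before the optimizer step, and each subclass decides how gradients are synchronized for its parallel mode (data parallel, MoE, pipeline, sequence, ZeRO). The sketch below illustrates that pattern only; it is not ColossalAI's actual API. The real handlers synchronize via `torch.distributed` collectives, whereas here the all-reduce is simulated with plain Python dicts (the `replica_grads` parameter is invented for this illustration) so the example runs without a GPU cluster.

```python
from abc import ABC, abstractmethod


class BaseGradientHandler(ABC):
    """Hook invoked after backward() and before optimizer.step()."""

    def __init__(self, model, optimizer):
        self._model = model
        self._optimizer = optimizer

    @abstractmethod
    def handle_gradient(self):
        """Synchronize gradients across ranks for this parallel mode."""


class DataParallelGradientHandler(BaseGradientHandler):
    """Averages each parameter's gradient across data-parallel replicas.

    replica_grads simulates the per-rank gradients that a real handler
    would all-reduce over a process group: a list with one
    {param_name: grad} dict per rank.
    """

    def __init__(self, model, optimizer, replica_grads):
        super().__init__(model, optimizer)
        self._replica_grads = replica_grads

    def handle_gradient(self):
        world_size = len(self._replica_grads)
        averaged = {}
        for name in self._replica_grads[0]:
            # Simulated all-reduce: sum across ranks, then divide by
            # world size so every replica sees the same mean gradient.
            total = sum(grads[name] for grads in self._replica_grads)
            averaged[name] = total / world_size
        return averaged


# Usage: two replicas hold different local gradients for one weight "w".
grads = [{"w": 2.0}, {"w": 4.0}]
handler = DataParallelGradientHandler(model=None, optimizer=None,
                                      replica_grads=grads)
print(handler.handle_gradient())  # {'w': 3.0}
```

The subclass-per-parallel-mode design matches the directory layout above: each `_*_gradient_handler.py` swaps in a different synchronization strategy behind the same `handle_gradient()` interface.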