ColossalAI/colossalai/zero/gemini
Latest commit: Baizhou Zhang · 21ba89cab6
[gemini] support gradient accumulation (#4869)
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpointing
* modify doc
* clean up code
* update CPU Adam fp16 case
2023-10-17 14:07:21 +08:00
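
The change above hooks gradient accumulation into Gemini's backward pass, so gradients from several micro-batches are accumulated into Gemini's managed chunks before a single optimizer step. The sketch below shows how a training loop could drive this through the Booster API. It is a minimal illustration, not the PR's own test code: the `enable_gradient_accumulation` flag is assumed from the "add argument for grad accum" item above, and the toy model, `accum_steps`, and data are placeholders.

```python
# Hedged sketch: gradient accumulation with the Gemini plugin via the Booster API.
# ASSUMPTION: the PR exposes the feature as GeminiPlugin(enable_gradient_accumulation=...);
# the flag name is inferred from the commit log, not verified against a release.
import torch
import torch.nn as nn

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})  # expects torchrun-style env vars

model = nn.Linear(32, 4)                              # toy model, illustrative only
optimizer = HybridAdam(model.parameters(), lr=1e-3)   # Gemini is typically paired with HybridAdam
criterion = nn.CrossEntropyLoss()

plugin = GeminiPlugin(enable_gradient_accumulation=True)  # ASSUMED flag from this PR
booster = Booster(plugin=plugin)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

accum_steps = 4  # micro-batches per optimizer step (illustrative)
for step in range(16):
    inputs = torch.randn(8, 32).cuda()                # Gemini runs on GPU
    labels = torch.randint(0, 4, (8,)).cuda()
    loss = criterion(model(inputs), labels) / accum_steps  # average over micro-batches
    booster.backward(loss, optimizer)     # gradients accumulate inside Gemini's chunks
    if (step + 1) % accum_steps == 0:
        optimizer.step()        # one parameter update per accum_steps micro-batches
        optimizer.zero_grad()   # clear accumulated gradients for the next cycle
```

Note that Gemini accumulates inside its backward hook rather than through a `no_sync()` context; the `no_sync` fix in this log concerns the low level zero plugin, whose accumulation path is separate.
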
name                  last commit                                          last updated
chunk/                [gemini] support gradient accumulation (#4869)       2023-10-17 14:07:21 +08:00
memory_tracer/        [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
__init__.py           [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
colo_init_context.py  [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
gemini_ddp.py         [gemini] support gradient accumulation (#4869)       2023-10-17 14:07:21 +08:00
gemini_hook.py        [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
gemini_mgr.py         [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
gemini_optimizer.py   [gemini] support gradient accumulation (#4869)       2023-10-17 14:07:21 +08:00
placement_policy.py   [misc] update pre-commit and run all files (#4752)   2023-09-19 14:20:26 +08:00
utils.py              [gemini] support amp o3 for gemini (#4872)           2023-10-12 10:39:08 +08:00