ColossalAI/tests/test_zero/test_gemini
Latest commit: 21ba89cab6 by Baizhou Zhang, 2023-10-17 14:07:21 +08:00
[gemini] support gradient accumulation (#4869)

* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* clean up code
* update cpu adam fp16 case
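The headline change here, PR #4869, adds gradient accumulation support to the Gemini plugin; per the commit notes it is applied in Gemini's backward hook rather than through a DDP-style no_sync() context (the same PR fixes a no_sync bug in the low level zero plugin), and it is exercised by test_grad_accum.py below. What follows is a minimal sketch of how the feature might be used, not the repo's own test code: the enable_gradient_accumulation keyword is an assumption inferred from the commit line "add argument for grad accum", so verify it against the GeminiPlugin signature in your ColossalAI version.

```python
# A sketch of gradient accumulation with GeminiPlugin, assuming the
# flag added by #4869 is named `enable_gradient_accumulation`.
# Launch with e.g. `torchrun --nproc_per_node=1 train.py` (Gemini needs CUDA).
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch(config={})  # initialize the distributed environment

model = torch.nn.Linear(32, 32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

plugin = GeminiPlugin(enable_gradient_accumulation=True)  # assumed flag name
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)

accum_steps = 4
for step in range(4 * accum_steps):
    batch = torch.rand(8, 32).cuda()             # synthetic micro-batch
    loss = model(batch).sum() / accum_steps      # scale loss by the accumulation factor
    booster.backward(loss, optimizer)            # grads accumulate inside Gemini chunks
    if (step + 1) % accum_steps == 0:            # update only every accum_steps micro-batches
        optimizer.step()
        optimizer.zero_grad()
```

The scale-then-step pattern is the standard one: each micro-batch contributes loss / accum_steps, and the optimizer steps once per accum_steps backward passes, so the update matches a single large batch.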
File                          Last commit                                                                      Date
test_chunk_mgrv2.py           [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_chunkv2.py               [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_fwd_bwd.py               [gemini] support amp o3 for gemini (#4872)                                       2023-10-12 10:39:08 +08:00
test_gemini_use_rmt.py        [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_grad_accum.py            [gemini] support gradient accumulation (#4869)                                   2023-10-17 14:07:21 +08:00
test_grad_clip.py             [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)    2023-10-16 21:56:53 +08:00
test_inference.py             [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_optim.py                 [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)    2023-10-16 21:56:53 +08:00
test_runtime_mem_tracer.py    [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_search.py                [misc] update pre-commit and run all files (#4752)                               2023-09-19 14:20:26 +08:00
test_zeroddp_state_dict.py    [gemini] support amp o3 for gemini (#4872)                                       2023-10-12 10:39:08 +08:00
test_zerooptim_state_dict.py  [hotfix] fix lr scheduler bug in torch 2.0 (#4864)                               2023-10-12 14:04:24 +08:00