ColossalAI

Commit Graph

Author	SHA1	Message	Date
ver217	ae71036cd2	[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) * refactor parallel layer * broadcast rank0 model after load ckpt	2022-09-06 20:18:35 +08:00
ver217	2bed096848	[utils] optimize partition_tensor_parallel_state_dict (#1546)	2022-09-06 17:45:31 +08:00
ver217	f5d3a9c2b0	polish checkpoint docstring (#637)	2022-04-02 13:34:33 +08:00
HELSON	055fbf5be6	[zero] adapt zero for unsharded paramters (Optimizer part) (#601)	2022-04-01 20:10:47 +08:00
アマデウス	acae68eb04	[model checkpoint] updated checkpoint save/load utils (#592)	2022-04-01 16:49:21 +08:00
ver217	369a288bf3	polish utils docstring (#620)	2022-04-01 16:36:47 +08:00
Liang Bowen	ec5086c49c	Refactored docstring to google style	2022-03-29 17:17:47 +08:00
Frank Lee	3a1a9820b0	fixed mkdir conflict and align yapf config with flake (#220)	2022-02-15 11:31:13 +08:00
HELSON	0f8c7f9804	Fixed docstring in colossalai (#171)	2022-01-21 10:44:30 +08:00
Frank Lee	3defa32aee	Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com>	2021-11-18 19:45:06 +08:00

10 Commits (fee2af861078181e18a059800ba09a4edfb311f0)