c5d39215f6 | Jiarui Fang | 2 years ago
    Revert "[feature] new zero implementation (#1623)" (#1643)
    This reverts commit 5be118f405.

5be118f405 | HELSON | 2 years ago
    [feature] new zero implementation (#1623)

f7f2248771 | HELSON | 2 years ago
    [moe] fix MoE bugs (#1628)
    * remove forced FP32 modules
    * correct no_shard-contexts' positions

8dced41ad0 | ver217 | 2 years ago
    [zero] zero optim state_dict takes only_rank_0 (#1384)
    * zero optim state_dict takes only_rank_0
    * fix unit test

828b9e5e0d | ver217 | 2 years ago
    [hotfix] fix zero optim save/load state dict (#1381)

7a8702c06d | HELSON | 2 years ago
    [colotensor] add Tensor.view op and its unit test (#1343)
    [colotensor] add megatron initialization for gpt2

0c51ff2c13 | ver217 | 2 years ago
    [hotfix] ZeroDDP use new process group (#1333)
    * process group supports getting ranks in group
    * chunk mgr receives a process group
    * update unit test
    * fix unit tests

7a05367101 | ver217 | 2 years ago
    [hotfix] shared model returns cpu state_dict (#1328)

060b917daf | Jiarui Fang | 2 years ago
    [refactor] remove gpc dependency in colotensor's _ops (#1189)

372f791444 | Jiarui Fang | 2 years ago
    [refactor] move chunk and chunkmgr to directory gemini (#1182)

9e1daa63d2 | ver217 | 2 years ago
    [zero] sharded optim supports loading local state dict (#1170)
    * sharded optim supports loading local state dict
    * polish code
    * add unit test

561e90493f | ver217 | 2 years ago
    [zero] zero optim supports loading local state dict (#1171)
    * zero optim supports loading local state dict
    * polish code
    * add unit test

65ee6dcc20 | Frank Lee | 2 years ago
    [test] ignore 8 gpu test (#1080)
    * [test] ignore 8 gpu test
    * polish code
    * polish workflow
    * polish workflow

e5ea3fdeef | HELSON | 3 years ago
    [gemini] add GeminiMemoryManger (#832)
    * refactor StatefulTensor, tensor utilities
    * add unitest for GeminiMemoryManager

e761ad2cd7 | Jiarui Fang | 3 years ago
    Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806)

88759e289e | HELSON | 3 years ago
    [zero] add ZeroTensorShardStrategy (#793)

4d9332b4c5 | Jiarui Fang | 3 years ago
    [refactor] moving memtracer to gemini (#801)

4c4388c46e | HELSON | 3 years ago
    [hotfix] fix memory leak in zero (#781)

5a1a095b92 | Frank Lee | 3 years ago
    [test] refactored with the new rerun decorator (#763)
    * [test] refactored with the new rerun decorator
    * polish test case

10ef8afdd2 | Jiarui Fang | 3 years ago
    [gemini] init genimi individual directory (#754)

dcca614eee | ver217 | 3 years ago
    [hotfix] fix test_stateful_tensor_mgr (#762)

a93a7d7364 | ver217 | 3 years ago
    [hotfix] fix reuse_fp16_shard of sharded model (#756)
    * fix reuse_fp16_shard
    * disable test stm
    * polish code

84c6700b2a | HELSON | 3 years ago
    [zero] refactor memstats_collector (#746)

e396bb71f2 | ver217 | 3 years ago
    [zero] add tensor placement policies (#743)
    * add tensor placement policies
    * polish comments
    * polish comments
    * update moe unit tests

22c4b88d56 | HELSON | 3 years ago
    [zero] refactor ShardedParamV2 for convenience (#742)

f4f42d4c3c | Frank Lee | 3 years ago
    [bug] fixed DDP compatibility with torch 1.8 (#739)

53cb584808 | Jiarui Fang | 3 years ago
    [utils] correct cpu memory used and capacity in the context of multi-process (#726)