Jiarui Fang
909211453b
[Tensor] Add some attributes to ColoTensor (#877)
...
* [Tensor] add some functions to ColoTensor
* torch.allclose
* rm torch.add
3 years ago
Jiarui Fang
e43f83aa5c
[Tensor] get named parameters for model using ColoTensors (#874)
3 years ago
Jiarui Fang
96211c2cc8
[tensor] customized op returns ColoTensor (#875)
...
* [tensor] customized op returns ColoTensor
* polish
* polish code
3 years ago
Ziyue Jiang
26d4ab8b03
[Tensor] Add function to spec and update linear 1Drow and unit tests (#869)
3 years ago
Jiarui Fang
1190b2c4a4
[tensor] add cross_entropy_loss (#868)
3 years ago
HELSON
3107817172
[gemini] add stateful tensor container (#867)
3 years ago
Jiarui Fang
d01d3b8cb0
colo init context: add device attr. (#866)
3 years ago
Jiarui Fang
126ba573a8
[Tensor] add layer norm Op (#852)
3 years ago
Frank Lee
1258af71cc
[ci] cache cuda extension (#860)
3 years ago
Ziyue Jiang
bcc8655021
[Tensor] Add 1Drow weight reshard by spec (#854)
3 years ago
Jiarui Fang
62f059251b
[Tensor] init a tp network training unittest (#849)
3 years ago
Ziyue Jiang
2a0a427e04
[tensor] add assert for colo_tensor 1Drow (#846)
3 years ago
Ziyue Jiang
05023ecfee
[Tensor] TP Linear 1D row (#843)
3 years ago
HELSON
e5ea3fdeef
[gemini] add GeminiMemoryManager (#832)
...
* refactor StatefulTensor, tensor utilities
* add unit test for GeminiMemoryManager
3 years ago
YuliangLiu0306
35ea6e1023
[pipelinable] use pipelinable context to initialize non-pipeline model (#816)
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [pipeline] add module lazy init feature to support large model initialization.
* [pipeline] add to_layer_list and partition method to support arbitrary non-pp model
* refactor the module structure
* polish
* [pipelinable] add unit test for pipelinable
* polish
* polish
* Fix CodeFactor issues.
3 years ago
Jiarui Fang
ea0a2ed25f
[hotfix] fix the bug of numel() in ColoTensor (#845)
3 years ago
Jiarui Fang
8789850eea
Init Context supports lazy allocation of model memory (#842)
3 years ago
Frank Lee
943982d29a
[unittest] refactored unit tests for change in dependency (#838)
3 years ago
Frank Lee
01e9f834f5
[dependency] removed torchvision (#833)
...
* [dependency] removed torchvision
* fixed transforms
3 years ago
Jiarui Fang
cb5a4778e1
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. ( #831 )" ( #835 )
...
This reverts commit ac88de6dfc
.
3 years ago
Jiarui Fang
ac88de6dfc
[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)
...
* revert zero tensors back
* [tensor] init row 1d linear
3 years ago
Jiarui Fang
294a6060d0
[tensor] ZeRO uses ColoTensor as the base class. (#828)
...
* [refactor] moving InsertPostInitMethodToModuleSubClasses to utils.
* [tensor] ZeRO uses ColoTensor as the base class.
* polish
3 years ago
Ziyue Jiang
8e6fdb4f29
[tensor] fix test_linear (#826)
3 years ago
Ziyue Jiang
1a9e2c2dff
[tensor] fix kwargs in colo_tensor torch_function (#825)
3 years ago
Jiarui Fang
2ecc3d7a55
[tensor] lazy init (#823)
3 years ago
Jiarui Fang
660d2d1f1b
[Tensor] apply ColoTensor on Torch functions (#821)
...
* Revert "[zero] add ZeroTensorShardStrategy (#793)"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
* polish code
* add a new tensor structure and override linear for it
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* [tensor] rename and reorganize directory structure.
* rm useless dir
* polish
* polish
* [tensor] handle functions that are not wrapped
3 years ago
Jiarui Fang
0ce8924ceb
[tensor] reorganize files (#820)
3 years ago
Jiarui Fang
ab962b9735
[gemini] a new tensor structure (#818)
...
* Revert "[zero] add ZeroTensorShardStrategy (#793)"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
* polish code
* add a new tensor structure and override linear for it
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
3 years ago
Jiarui Fang
e761ad2cd7
Revert "[zero] add ZeroTensorShardStrategy ( #793 )" ( #806 )
3 years ago
HELSON
88759e289e
[zero] add ZeroTensorShardStrategy (#793)
3 years ago
Jiarui Fang
681addb512
[refactor] moving grad acc logic to engine (#804)
3 years ago
Jiarui Fang
4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago
HELSON
4c4388c46e
[hotfix] fix memory leak in zero (#781)
3 years ago
Frank Lee
5a1a095b92
[test] refactored with the new rerun decorator (#763)
...
* [test] refactored with the new rerun decorator
* polish test case
3 years ago
Jiarui Fang
10ef8afdd2
[gemini] init gemini individual directory (#754)
3 years ago
ver217
dcca614eee
[hotfix] fix test_stateful_tensor_mgr (#762)
3 years ago
ver217
a93a7d7364
[hotfix] fix reuse_fp16_shard of sharded model (#756)
...
* fix reuse_fp16_shard
* disable test stm
* polish code
3 years ago
HELSON
84c6700b2a
[zero] refactor memstats_collector (#746)
3 years ago
ver217
e396bb71f2
[zero] add tensor placement policies (#743)
...
* add tensor placement policies
* polish comments
* polish comments
* update moe unit tests
3 years ago
HELSON
22c4b88d56
[zero] refactor ShardedParamV2 for convenience (#742)
3 years ago
Frank Lee
f4f42d4c3c
[bug] fixed DDP compatibility with torch 1.8 (#739)
3 years ago
Jiarui Fang
53cb584808
[utils] correct CPU memory used and capacity in a multi-process context (#726)
3 years ago
HELSON
b9b469ea50
[moe] add checkpoint for moe zero test (#729)
3 years ago
FrankLeeeee
e88a498c9c
[test] removed trivial outdated test
3 years ago
FrankLeeeee
62b4ce7326
[test] added missing decorators to model checkpointing tests
3 years ago
Jiarui Fang
4d90a7b513
[refactor] zero directory (#724)
3 years ago
Frank Lee
20ab1f5520
[bug] fixed broken test_found_inf (#725)
3 years ago
Jiarui Fang
193dc8dacb
[refactor] refactor the memory utils (#715)
3 years ago
HELSON
dbd96fe90a
[zero] check whether gradients have inf and nan on GPU (#712)
3 years ago
HELSON
a9b8300d54
[zero] improve adaptability for non-sharded parameters (#708)
...
* adapt post grad hooks for non-sharded parameters
* adapt optimizer for non-sharded parameters
* offload gradients for non-replicated parameters
3 years ago