Frank Lee
15aab1476e
[zero] avoid zero hook spam by changing log to debug level ( #1137 )
2 years ago
Frank Lee
73ad05fc8c
[zero] added error message to handle on-the-fly import of torch Module class ( #1135 )
* [zero] added error message to handle on-the-fly import of torch Module class
* polish code
2 years ago
ver217
e4f555f29a
[optim] refactor fused sgd ( #1134 )
2 years ago
ver217
d26902645e
[ddp] add save/load state dict for ColoDDP ( #1127 )
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
2 years ago
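The ColoDDP save/load commit above reflects a common DDP design question: checkpoints saved from a wrapped model should load into the unwrapped module and vice versa. The sketch below illustrates that idea with toy classes; all names here are hypothetical stand-ins, not the real ColoDDP API.

```python
# Hypothetical sketch: a DDP-style wrapper delegates state_dict() /
# load_state_dict() to the wrapped module, so checkpoint keys are not
# prefixed with "module." and remain interchangeable with the plain model.

class ToyModule:
    """Stand-in for an nn.Module with a flat parameter dict."""
    def __init__(self):
        self.params = {"weight": 1.0, "bias": 0.0}

    def state_dict(self):
        return dict(self.params)

    def load_state_dict(self, sd):
        self.params.update(sd)


class ToyDDP:
    """Stand-in for a DDP wrapper that forwards checkpoint calls."""
    def __init__(self, module):
        self.module = module

    def state_dict(self):
        # Delegate so the checkpoint has the same key layout as the
        # unwrapped module.
        return self.module.state_dict()

    def load_state_dict(self, sd):
        self.module.load_state_dict(sd)


plain = ToyModule()
wrapped = ToyDDP(ToyModule())
wrapped.module.params["weight"] = 2.0
ckpt = wrapped.state_dict()   # same keys as plain.state_dict()
plain.load_state_dict(ckpt)
```

The design choice this sketches is that the wrapper is transparent at checkpoint time, which is what makes "save from DDP, load into plain model" round-trips work.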
YuliangLiu0306
946dbd629d
[hotfix]fix bugs caused by refactored pipeline ( #1133 )
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [hotfix]fix bugs caused by refactored pipeline
3 years ago
ver217
789cad301b
[hotfix] fix param op hook ( #1131 )
* fix param op hook
* update zero tp test
* fix bugs
3 years ago
ver217
a1a7899cae
[hotfix] fix zero init ctx numel ( #1128 )
3 years ago
ver217
f0a954f16d
[ddp] add set_params_to_ignore for ColoDDP ( #1122 )
* add set_params_to_ignore for ColoDDP
* polish code
* fix zero hook v2
* add unit test
* polish docstr
3 years ago
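The `set_params_to_ignore` commit above names a mechanism for excluding chosen parameters from gradient synchronization. A minimal sketch of that idea, assuming a tag attribute checked during grad sync (the attribute name and sync logic are illustrative assumptions, not the real implementation):

```python
# Hypothetical sketch: marked parameters are tagged and then skipped
# when collecting gradients for the cross-rank all-reduce.

class Param:
    def __init__(self, name, grad):
        self.name, self.grad = name, grad

def set_params_to_ignore(params_to_ignore):
    for p in params_to_ignore:
        p._ddp_to_ignore = True   # tag consulted at sync time

def synced_grads(params):
    """Return the grads that would be all-reduced across ranks."""
    return [p.grad for p in params
            if not getattr(p, "_ddp_to_ignore", False)]

params = [Param("a", 1.0), Param("b", 2.0), Param("c", 3.0)]
set_params_to_ignore([params[1]])   # "b" stays rank-local
```

Tagging the parameter object itself (rather than keeping a side list) means the ignore decision survives however the parameter is later iterated.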
YuliangLiu0306
3175bcb4d8
[pipeline]support List of Dict data ( #1125 )
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [pipeline]support List of Dict data
* polish
3 years ago
Frank Lee
91a5999825
[ddp] supported customized torch ddp configuration ( #1123 )
3 years ago
YuliangLiu0306
fcf55777dd
[fx]add autoparallel passes ( #1121 )
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* feature/add autoparallel passes
3 years ago
ver217
e127b4375b
cast colo ddp v2 inputs/outputs ( #1120 )
3 years ago
Frank Lee
16302a5359
[fx] added unit test for coloproxy ( #1119 )
* [fx] added unit test for coloproxy
* polish code
* polish code
3 years ago
ver217
7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy ( #1118 )
* update gemini mgr
* update chunk
* add docstr
* polish placement policy
* update test chunk
* update test zero
* polish unit test
* remove useless unit test
3 years ago
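The gemini commit above adds a `"cpu"` placement policy to the memory manager. The general shape of such a feature is a small policy registry keyed by name; the sketch below shows that pattern with invented class names, purely to illustrate what "supports a placement policy" means structurally.

```python
# Illustrative sketch of a placement-policy registry: a string selects
# a policy object that decides where payloads live. All names here are
# hypothetical, not the gemini manager's real classes.

class CPUPolicy:
    def evict_to(self):
        return "cpu"     # always keep payloads in host memory

class CUDAPolicy:
    def evict_to(self):
        return "cuda"    # keep payloads on the device

POLICIES = {"cpu": CPUPolicy, "cuda": CUDAPolicy}

def create_placement_policy(name):
    try:
        return POLICIES[name]()
    except KeyError:
        raise ValueError(f"unknown placement policy: {name!r}") from None
```

Adding a new policy (like the `"cpu"` one in the commit) then reduces to registering one more class, without touching callers.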
ver217
f99f56dff4
fix colo parameter torch function ( #1117 )
3 years ago
Frank Lee
e1620ddac2
[fx] added coloproxy ( #1115 )
3 years ago
Frank Lee
6f82ac9bcb
[pipeline] supported more flexible dataflow control for pipeline parallel training ( #1108 )
* [pipeline] supported more flexible dataflow control for pipeline parallel training
* polish code
* polish code
* polish code
3 years ago
Frank Lee
53297330c0
[test] fixed hybrid parallel test case on 8 GPUs ( #1106 )
3 years ago
github-actions[bot]
85b58093d2
Automated submodule synchronization ( #1105 )
Co-authored-by: github-actions <github-actions@github.com>
3 years ago
Frank Lee
74948b095c
[release] update version.txt ( #1103 )
3 years ago
ver217
895c1c5ee7
[tensor] refactor param op hook ( #1097 )
* refactor param op hook
* add docstr
* fix bug
3 years ago
YuliangLiu0306
1e9f9c227f
[hotfix]change to fit latest p2p ( #1100 )
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [hotfix]change to fit latest p2p
* polish
* polish
3 years ago
Frank Lee
72bd7c696b
[amp] included dict for type casting of model output ( #1102 )
3 years ago
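The AMP commit above extends output casting to cover dicts. The underlying issue is that model outputs can be arbitrarily nested containers, so the cast must recurse. A self-contained sketch, using plain floats as stand-ins for tensors and `float` as a stand-in for the dtype conversion:

```python
# Sketch of recursive output casting for AMP: dicts, lists, and tuples
# are traversed; leaves are converted with cast_fn. In the real case,
# cast_fn would convert half-precision tensors back to fp32.

def cast_output(value, cast_fn):
    if isinstance(value, dict):
        return {k: cast_output(v, cast_fn) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(cast_output(v, cast_fn) for v in value)
    return cast_fn(value)

out = {"loss": 1, "logits": [2, 3]}
cast = cast_output(out, float)
```

Without the dict branch, models that return `{"loss": ..., "logits": ...}`-style outputs would slip through the cast untouched, which is exactly the gap the commit closes.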
Frank Lee
5a9d8ef4d5
[workflow] fixed 8-gpu test workflow ( #1101 )
3 years ago
Frank Lee
03e52ecba3
[workflow] added regular 8 GPU testing ( #1099 )
* [workflow] added regular 8 GPU testing
* polish workflow
3 years ago
Frank Lee
7f2d2b2b5b
[engine] fixed empty op hook check ( #1096 )
* [engine] fixed empty op hook check
* polish code
3 years ago
Frank Lee
14e5b11d7f
[zero] fixed api consistency ( #1098 )
3 years ago
Frank Lee
cb18922c47
[doc] added documentation to chunk and chunk manager ( #1094 )
* [doc] added documentation to chunk and chunk manager
* polish code
* polish code
* polish code
3 years ago
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
3 years ago
Frank Lee
2b2dc1c86b
[pipeline] refactor the pipeline module ( #1087 )
* [pipeline] refactor the pipeline module
* polish code
3 years ago
Frank Lee
bad5d4c0a1
[context] support lazy init of module ( #1088 )
* [context] support lazy init of module
* polish code
3 years ago
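The lazy-init commit above defers module construction inside a context. The general technique is to record the constructor call instead of executing it, and materialize later (for example after placement decisions). The sketch below is a hypothetical, minimal version of that pattern; none of these class names are the real API.

```python
# Hypothetical sketch of lazy module init: construction inside the
# context is recorded as a handle; materialize() performs the deferred
# allocation on demand.

class LazyHandle:
    def __init__(self, cls, *args, **kwargs):
        self.cls, self.args, self.kwargs = cls, args, kwargs
        self.instance = None

    def materialize(self):
        if self.instance is None:
            self.instance = self.cls(*self.args, **self.kwargs)
        return self.instance

class LazyInitContext:
    def __init__(self):
        self.handles = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def build(self, cls, *args, **kwargs):
        handle = LazyHandle(cls, *args, **kwargs)
        self.handles.append(handle)
        return handle

class Linear:
    """Stand-in for nn.Linear; records only its shape."""
    def __init__(self, n_in, n_out):
        self.shape = (n_in, n_out)

with LazyInitContext() as ctx:
    h = ctx.build(Linear, 2, 3)   # nothing allocated yet

layer = h.materialize()           # deferred construction happens here
```

Repeated `materialize()` calls return the same instance, so the handle behaves like a one-shot factory.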
ver217
be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 ( #1077 )
* polish chunk manager
* polish unit test
* impl add_extern_static_tensor for chunk mgr
* add mem stats collector v2
* polish code
* polish unit test
* polish code
* polish get chunks
3 years ago
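The chunk-manager commits above revolve around one idea: pack many small parameter tensors into fixed-size chunks so that communication and bookkeeping happen per chunk rather than per tensor. A minimal sketch of that packing, using element counts in place of tensors (names and the greedy strategy are illustrative assumptions):

```python
# Sketch of greedy chunk packing: tensors are appended to the current
# chunk until adding the next one would exceed the chunk capacity.

def pack_into_chunks(tensor_sizes, chunk_size):
    chunks, current, used = [], [], 0
    for size in tensor_sizes:
        if used + size > chunk_size and current:
            chunks.append(current)   # seal the full chunk
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

For example, sizes `[4, 4, 4, 2]` with capacity 8 pack into two chunks, so an all-reduce would fire twice instead of four times.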
Ziyue Jiang
b3a03e4bfd
[Tensor] fix equal assert ( #1091 )
* fix equal assert
* polish
3 years ago
Frank Lee
50ec3a7e06
[test] skip tests when not enough GPUs are detected ( #1090 )
* [test] skip tests when not enough GPUs are detected
* polish code
* polish code
3 years ago
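The commit above skips tests when the machine has fewer GPUs than the test needs, in the spirit of pytest's `skipif`. The sketch below shows the shape of such a guard; the device count is injected so the example runs anywhere (with torch it would come from `torch.cuda.device_count()`), and the decorator name is invented for illustration.

```python
# Sketch of a GPU-count test guard: the wrapped test only runs when
# enough devices are available; otherwise it reports a skip.

import functools

def require_gpus(required, available):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if available < required:
                return "skipped"   # pytest would raise Skipped here
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_gpus(required=8, available=4)
def test_hybrid_parallel():
    return "ran"
```

In real pytest one would instead use `pytest.mark.skipif(condition, reason=...)` at collection time; the sketch only shows the decision logic.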
github-actions[bot]
3a7571b1d7
Automated submodule synchronization ( #1081 )
Co-authored-by: github-actions <github-actions@github.com>
3 years ago
Frank Lee
1bd8a72fc9
[workflow] disable p2p via shared memory on non-nvlink machine ( #1086 )
3 years ago
Frank Lee
65ee6dcc20
[test] ignore 8 gpu test ( #1080 )
* [test] ignore 8 gpu test
* polish code
* polish workflow
* polish workflow
3 years ago
Ziyue Jiang
0653c63eaa
[Tensor] 1d row embedding ( #1075 )
* Add CPU 1d row embedding
* polish
3 years ago
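The 1D row embedding commit above shards an embedding table along the vocabulary dimension: each rank owns a contiguous slice of rows, looks up only the ids that fall in its slice, and the partial results are summed across ranks. The arithmetic behind that partitioning can be sketched as follows (all names are illustrative, and the loop stands in for an all-reduce):

```python
# Sketch of row-wise (vocab-dimension) embedding sharding.

def vocab_range(num_embeddings, world_size, rank):
    """Contiguous slice of vocabulary rows owned by this rank."""
    per_rank = num_embeddings // world_size
    start = rank * per_rank
    end = num_embeddings if rank == world_size - 1 else start + per_rank
    return start, end

def sharded_lookup(token_id, rank_tables, num_embeddings, world_size):
    """Each rank contributes its row or 0; summing mimics all-reduce."""
    total = 0.0
    for rank in range(world_size):
        start, end = vocab_range(num_embeddings, world_size, rank)
        if start <= token_id < end:
            total += rank_tables[rank][token_id - start]
    return total
```

Because exactly one rank owns any given id, the summed result equals the unsharded lookup.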
junxu
d66ffb4df4
Remove duplication registry ( #1078 )
3 years ago
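The "Remove duplication registry" commit above touches the registry pattern such frameworks use to map names to implementations. One common way duplication bugs arise, and get fixed, is by making the registry reject a second registration under the same name instead of silently overwriting. A minimal sketch of that guard (the decorator and registry names are invented for illustration):

```python
# Sketch of a decorator-based registry that rejects duplicate names.

REGISTRY = {}

def register(name):
    def decorator(fn):
        if name in REGISTRY:
            raise KeyError(f"{name!r} is already registered")
        REGISTRY[name] = fn
        return fn
    return decorator

@register("gelu")
def gelu(x):
    return x  # placeholder body

try:
    @register("gelu")          # second registration is rejected
    def gelu_again(x):
        return x
    duplicate_allowed = True
except KeyError:
    duplicate_allowed = False
```

Failing loudly at import time makes duplicate registrations a one-line traceback rather than a silent behavior change.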
Jiarui Fang
bcab249565
fix issue #1080 ( #1071 )
3 years ago
ver217
1b17859328
[tensor] chunk manager monitor mem usage ( #1076 )
3 years ago
ver217
98cdbf49c6
[hotfix] fix chunk comm src rank ( #1072 )
3 years ago
Frank Lee
bfdc5ccb7b
[context] maintain the context object in with statement ( #1073 )
3 years ago
ver217
c5cd3b0f35
[zero] zero optim copy chunk rather than copy tensor ( #1070 )
3 years ago
Ziyue Jiang
4fc748f69b
[Tensor] fix optimizer for CPU parallel ( #1069 )
3 years ago
Jiarui Fang
49832b2344
[refactory] add nn.parallel module ( #1068 )
3 years ago
Ziyue Jiang
6754f1b77f
fix module utils bug ( #1066 )
3 years ago
Jiarui Fang
a00644079e
reorgnize colotensor directory ( #1062 )
* reorgnize colotensor directory
* polish code
3 years ago
Frank Lee
3d10be33bd
[cudnn] set False to cudnn benchmark by default ( #1063 )
3 years ago
Ziyue Jiang
df9dcbbff6
[Tensor] add hybrid device demo and fix bugs ( #1059 )
3 years ago