Jiarui Fang
af32022f74
[Gemini] fix the convert_to_torch_module bug ( #2269 )
2 years ago
Super Daniel
b0d21d0c4f
[autockpt] linearize / merge shape-consistency nodes. ( #2271 )
* [autockpt] make it work.
* [autockpt] linearize / merge shape-consistency nodes.
2 years ago
YuliangLiu0306
4b29112ab2
[autoparallel] gpt2 autoparallel examples ( #2267 )
* [autoparallel] gpt2 autoparallel examples
* polish code
* polish code
2 years ago
Ziyue Jiang
8b045b3c1f
[Pipeline Middleware] Reduce comm redundancy by getting accurate output ( #2232 )
* move to cpu to avoid deadlock
* get output by offsets
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Boyuan Yao
5c2ef9fc76
[autoparallel] modify comm nodes' memory cost in construct chain ( #2263 )
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline
* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop
* [autoparallel] specify comm nodes' memory cost in construct chain
2 years ago
Boyuan Yao
1ea99b869e
[autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline ( #2261 )
2 years ago
Super Daniel
3ccf58aa76
[autockpt] make it work. ( #2257 )
2 years ago
Boyuan Yao
ac3739930d
[autoparallel] modify construct chain in rotor solver ( #2254 )
2 years ago
Boyuan Yao
ab38aebace
[autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint ( #2248 )
* [autoparallel] hook node meta on graph nodes for checkpoint solver
* [autoparallel] polish code
* [autoparallel] restore some node handlers
* colossalai/auto_parallel/passes/meta_info_prop.py
* [autoparallel] remove some unused import
* [autoparallel] hook bwd_mem_out
2 years ago
Boyuan Yao
c8c79102f0
[autoparallel] patch torch.flatten metainfo for autoparallel ( #2247 )
* [autoparallel] patch torch.flatten
2 years ago
YuliangLiu0306
8897b8f753
[autoparallel] autoparallel initialize ( #2238 )
2 years ago
xcnick
85178a397a
[hotfix] fix error for torch 2.0 ( #2243 )
2 years ago
Super Daniel
b7d0990c61
[autoparallel] fix construct meta info. ( #2245 )
2 years ago
Ziyue Jiang
57929a6210
fix type of num_worker_threads ( #2237 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Jiarui Fang
db4cbdc7fb
[builder] builder for scaled_upper_triang_masked_softmax ( #2234 )
2 years ago
Super Daniel
78483a9fdd
[logger] hotfix, missing _FORMAT ( #2231 )
2 years ago
Jiarui Fang
54de05da5d
[builder] polish builder with better base class ( #2216 )
* [builder] polish builder
* remove print
2 years ago
YuliangLiu0306
3b1b91eaf4
[autoparallel] record parameter attribute in colotracer ( #2217 )
* [autoparallel] record parameter attribute in colotracer
* [autoparallel] fix construct_meta_info bug
2 years ago
Jiarui Fang
7675792100
[builder] raise Error when CUDA_HOME is not set ( #2213 )
2 years ago
Jiarui Fang
d5e3e3ec01
[example] update gpt example for larger model scale ( #2211 )
2 years ago
Boyuan Yao
24246f7aa5
[autoparallel] Attach input, buffer and output tensor to MetaInfo class ( #2162 )
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
* [autoparallel] add F.linear metainfo generator
* [autoparallel] add binary elementwise metainfo
* [fx] recover profiler
* [autoparallel] fix forward memory calculation
* [autoparallel] modify constants.py
* [autoparallel] remove redundant print
* [autoparallel] add F.conv metainfo
* [autoparallel] linear fix
* [autoparallel] memory estimation for communication actions
* [autoparallel] fix docstring
* [autoparallel] fix variables name
* [autoparallel] attach tensor to metainfo class
* [autoparallel] fix dangerous try except
* [autoparallel] attach memory cost to shape consistency node
* [autoparallel] attach shape consistency node's metainfo to the node
* [autoparallel] remove todo in shape consistency memory estimation
* [autoparallel] fix the annotation
2 years ago
Boyuan Yao
d0bc5a1b34
[autoparallel] new metainfoprop based on metainfo class ( #2179 )
* [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver
* [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver
* [autoparallel] modify placeholder handler
* [autoparallel] modify metainfoprop
* [autoparallel] fix function typo
* [autoparallel] fix placeholder handler
2 years ago
YuliangLiu0306
78509124d3
[autoparallel] update getitem handler ( #2207 )
2 years ago
Jiarui Fang
1cb532ffec
[builder] multihead attn runtime building ( #2203 )
* [hotfix] correct cpu_optim runtime compilation
* [builder] multihead attn
* fix bug
* fix a bug
2 years ago
Tongping Liu
8e22c38b89
[hotfix] Fixing the bug related to IPv6 support
Co-authored-by: ByteDance <tongping.liu@bytedance.com>
2 years ago
YuliangLiu0306
4851f2d607
[autoparallel] update_getattr_handler ( #2193 )
2 years ago
Jiarui Fang
5682e6d346
[hotfix] correct cpu_optim runtime compilation ( #2197 )
2 years ago
HELSON
2458659919
[zero] fix error for BEiT models ( #2169 )
* [zero] fix error for BEiT models
* [ColoParameter] add unpack operation for tuple arguments
* fix bugs
* fix chunkv2 unit testing
* add assertion for gradient state
2 years ago
Jiarui Fang
355ffb386e
[builder] unified cpu_optim fused_optim interface ( #2190 )
2 years ago
Jiarui Fang
9587b080ba
[builder] use runtime builder for fused_optim ( #2189 )
2 years ago
Jiarui Fang
bc0e271e71
[builder] use builder() for cpu adam and fused optim in setup.py ( #2187 )
2 years ago
Jiarui Fang
d42afd30f8
[builder] runtime adam and fused_optim builder ( #2184 )
2 years ago
YuliangLiu0306
550f8f8905
[autoparallel] integrate_gpt_related_tests ( #2134 )
* [autoparallel] integrate_gpt_related_tests
* polish code
* polish code
* add GPT2Model into runtime test
2 years ago
Ziyue Jiang
59e343328d
[Pipeline Middleware] Fix deadlock when num_microbatch=num_stage ( #2156 )
* add splitter
* polish code
* remove comment
* fix async NaN by moving to cpu first
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Tongping Liu
ab54fed292
[hotfix] add kwargs for colo_addmm ( #2171 )
2 years ago
アマデウス
622f863291
[hotfix] Jit type hint #2161 ( #2164 )
2 years ago
Zihao
12e7bcd720
register meta func for rnn ( #2159 )
2 years ago
Boyuan Yao
cfe2a9bd90
[autoparallel] memory estimation for shape consistency ( #2144 )
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
* [autoparallel] add F.linear metainfo generator
* [autoparallel] add binary elementwise metainfo
* [fx] recover profiler
* [autoparallel] fix forward memory calculation
* [autoparallel] modify constants.py
* [autoparallel] remove redundant print
* [autoparallel] add F.conv metainfo
* [autoparallel] linear fix
* [autoparallel] memory estimation for communication actions
* [autoparallel] fix docstring
* [autoparallel] fix variables name
2 years ago
Jiarui Fang
b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 ( #2157 )
2 years ago
YuliangLiu0306
16335cb537
[hotfix] fix aten default bug ( #2158 )
2 years ago
HELSON
a7d95b7024
[example] add zero1, zero2 example in GPT examples ( #2146 )
* [example] add zero1 and zero2 for GPT
* update readme in gpt example
* polish code
* change init value
* update readme
2 years ago
YuliangLiu0306
1cce6e36ca
[autoparallel] use metainfo in handler ( #2149 )
2 years ago
Jiarui Fang
2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. ( #2151 )
2 years ago
Jiarui Fang
bdef9dfdbe
[NFC] remove useless graph node code ( #2150 )
2 years ago
BlueRum
b3f73ce1c8
[Gemini] Update coloinit_ctx to support meta_tensor ( #2147 )
2 years ago
Zihao
a128eec9d5
register aten._convolution.default ( #2137 )
2 years ago
Jiarui Fang
ee287620f0
[Gemini] revert ZeROInitCtx related tracer ( #2138 )
2 years ago
アマデウス
077a66dd81
updated attention kernel ( #2133 )
2 years ago
YuliangLiu0306
a3c6924deb
[autoparallel] process size nodes in runtime pass ( #2130 )
* [autoparallel] process size nodes in runtime pass
* polish code
2 years ago
YuliangLiu0306
536560ccc0
[autoparallel] implement softmax handler ( #2132 )
2 years ago