ColossalAI

Commit Graph

Author	SHA1	Message	Date
Liu Ziming	6427c406cf	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/strategy_generator.py code style (#2695 ) Co-authored-by: shenggan <csg19971016@gmail.com>	2 years ago
アマデウス	534f68c83c	[NFC] polish pipeline process group code style (#2694 )	2 years ago
LuGY	56ff1921e9	[NFC] polish colossalai/context/moe_context.py code style (#2693 )	2 years ago
Shawn-Kong	1712da2800	[NFC] polish colossalai/gemini/gemini_context.py code style (#2690 )	2 years ago
HELSON	df4f020ee3	[zero1&2] only append parameters with gradients (#2681 )	2 years ago
ver217	f0aa191f51	[gemini] fix colo_init_context (#2683 )	2 years ago
Boyuan Yao	40c916b192	[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674 ) * [autoparallel] softmax metainfo * [autoparallel] softmax metainfo	2 years ago
HELSON	8213f89fd2	[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671 )	2 years ago
binmakeswell	9ab14b20b5	[doc] add CVPR tutorial (#2666 )	2 years ago
Boyuan Yao	0385b26ebf	[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647 ) * [autoparallel] layernorm metainfo patch * [autoparallel] polish test	2 years ago
YuliangLiu0306	37df666f38	[autoparallel] refactor handlers which reshape input tensors (#2615 ) * [autoparallel] refactor handlers which reshape input tensors * polish	2 years ago
YuliangLiu0306	28398f1c70	add overlap option (#2613 )	2 years ago
YuliangLiu0306	cb3d1bef62	[autoparallel] adapt autoparallel tests with latest api (#2626 )	2 years ago
Boyuan Yao	90a9fdd91d	[autoparallel] Patch meta information of `torch.matmul` (#2584 ) * [autoparallel] matmul metainfo * [auto_parallel] remove unused print * [tests] skip test_matmul_handler when torch version is lower than 1.12.0	2 years ago
oahzxl	6ba8364881	[autochunk] support diffusion for autochunk (#2621 ) * add alphafold benchmark * renae alphafold test * rename tests * rename diffuser * renme * rename * update transformer * update benchmark * update benchmark * update bench memory * update transformer benchmark * rename * support diffuser * support unet metainfo prop * fix bug and simplify code * update linear and support some op * optimize max region search, support conv * update unet test * support some op * support groupnorm and interpolate * update flow search * add fix dim in node flow * fix utils * rename * support diffusion * update diffuser * update chunk search * optimize imports * import * finish autochunk	2 years ago
Frank Lee	8518263b80	[test] fixed the triton version for testing (#2608 )	2 years ago
HELSON	552183bb74	[polish] polish ColoTensor and its submodules (#2537 )	2 years ago
Frank Lee	dd14783f75	[kernel] fixed repeated loading of kernels (#2549 ) * [kernel] fixed repeated loading of kernels * polish code * polish code	2 years ago
ver217	5b1854309a	[hotfix] fix zero ddp warmup check (#2545 )	2 years ago
oahzxl	fa3d66feb9	support unet metainfo prop (#2544 )	2 years ago
oahzxl	05671fcb42	[autochunk] support multi outputs chunk search (#2538 ) Support multi outputs chunk search. Previously we only support single output chunk search. It is more flexible and improve performance by a large margin. For transformer, we reduce memory by 40% than previous search strategy. 1. rewrite search strategy to support multi outputs chunk search 2. fix many, many bugs 3. update tests	2 years ago
oahzxl	63199c6687	[autochunk] support transformer (#2526 )	2 years ago
HELSON	a4ed9125ac	[hotfix] fix lightning error (#2529 )	2 years ago
HELSON	66dfcf5281	[gemini] update the gpt example (#2527 )	2 years ago
HELSON	b528eea0f0	[zero] add zero wrappers (#2523 ) * [zero] add zero wrappers * change names * add wrapper functions to init	2 years ago
Super Daniel	c198c7c0b0	[hotfix] meta tensor default device. (#2510 )	2 years ago
HELSON	077a5cdde4	[zero] fix gradient clipping in hybrid parallelism (#2521 ) * [zero] fix gradient clipping in hybrid parallelism * [testing] change model name to avoid pytest warning * [hotfix] fix unit testing	2 years ago
YuliangLiu0306	aa0f6686f9	[autoparallel] accelerate gpt2 training (#2495 )	2 years ago
HELSON	707b11d4a0	[gemini] update ddp strict mode (#2518 ) * [zero] add strict ddp mode for chunk init * [gemini] update gpt example	2 years ago
HELSON	2d1a7dfe5f	[zero] add strict ddp mode (#2508 ) * [zero] add strict ddp mode * [polish] add comments for strict ddp mode * [zero] fix test error	2 years ago
oahzxl	c04f183237	[autochunk] support parsing blocks (#2506 )	2 years ago
Super Daniel	35c0c0006e	[utils] lazy init. (#2148 ) * [utils] lazy init. * [utils] remove description. * [utils] complete. * [utils] finalize. * [utils] fix names.	2 years ago
oahzxl	72341e65f4	[auto-chunk] support extramsa (#3 ) (#2504 )	2 years ago
Ziyue Jiang	0f02b8c6e6	add avg partition (#2483 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
アマデウス	99d9713b02	Revert "Update parallel_context.py (#2408 )" This reverts commit `7d5640b9db`.	2 years ago
oahzxl	ecccc91f21	[autochunk] support autochunk on evoformer (#2497 )	2 years ago
oahzxl	5db3a5bf42	[fx] allow control of ckpt_codegen init (#2498 ) * [fx] allow control of ckpt_codegen init Currently in ColoGraphModule, ActivationCheckpointCodeGen will be set automatically in __init__. But other codegen can't be set if so. So I add an arg to control whether to set ActivationCheckpointCodeGen in __init__. * code style	2 years ago
HELSON	d565a24849	[zero] add unit testings for hybrid parallelism (#2486 )	2 years ago
oahzxl	4953b4ace1	[autochunk] support evoformer tracer (#2485 ) support full evoformer tracer, which is a main module of alphafold. previously we just support a simplifed version of it. 1. support some evoformer's op in fx 2. support evoformer test 3. add repos for test code	2 years ago
YuliangLiu0306	67e1912b59	[autoparallel] support origin activation ckpt on autoprallel system (#2468 )	2 years ago
Ziyue Jiang	fef5c949c3	polish pp middleware (#2476 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
HELSON	a5dc4253c6	[zero] polish low level optimizer (#2473 )	2 years ago
Frank Lee	8b7495dd54	[example] integrate seq-parallel tutorial with CI (#2463 )	2 years ago
Jiarui Fang	867c8c2d3a	[zero] low level optim supports ProcessGroup (#2464 )	2 years ago
Frank Lee	14d9299360	[cli] fixed hostname mismatch error (#2465 )	2 years ago
Haofan Wang	9358262992	Fix False warning in initialize.py (#2456 ) * Update initialize.py * pre-commit run check	2 years ago
YuliangLiu0306	8221fd7485	[autoparallel] update binary elementwise handler (#2451 ) * [autoparallel] update binary elementwise handler * polish	2 years ago
HELSON	2bfeb24308	[zero] add warning for ignored parameters (#2446 )	2 years ago
Frank Lee	39163417a1	[example] updated the hybrid parallel tutorial (#2444 ) * [example] updated the hybrid parallel tutorial * polish code	2 years ago
HELSON	5521af7877	[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443 ) * [ddp] add is_ddp_ignored [ddp] rename to is_ddp_ignored * [zero] fix state_dict and load_state_dict * fix bugs * [zero] update unit test for ZeroDDP	2 years ago
YuliangLiu0306	2731531bc2	[autoparallel] integrate device mesh initialization into autoparallelize (#2393 ) * [autoparallel] integrate device mesh initialization into autoparallelize * add megatron solution * update gpt autoparallel examples with latest api * adapt beta value to fit the current computation cost	2 years ago
Frank Lee	c72c827e95	[cli] provided more details if colossalai run fail (#2442 )	2 years ago
Super Daniel	c41e59e5ad	[fx] allow native ckpt trace and codegen. (#2438 )	2 years ago
YuliangLiu0306	41429b9b28	[autoparallel] add shard option (#2423 )	2 years ago
HELSON	7829aa094e	[ddp] add is_ddp_ignored (#2434 ) [ddp] rename to is_ddp_ignored	2 years ago
HELSON	bb4e9a311a	[zero] add inference mode and its unit test (#2418 )	2 years ago
Jiarui Fang	93f62dd152	[autochunk] add autochunk feature	2 years ago
HELSON	dddacd2d2c	[hotfix] add norm clearing for the overflow step (#2416 )	2 years ago
oahzxl	7ab2db206f	adapt new fx	2 years ago
oahzxl	e532679c95	Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk	2 years ago
Haofan Wang	7d5640b9db	Update parallel_context.py (#2408 )	2 years ago
oahzxl	fd818cf144	change imports	2 years ago
oahzxl	a591d45b29	add available	2 years ago
oahzxl	615e7e68d9	update doc	2 years ago
oahzxl	7d4abaa525	add doc	2 years ago
oahzxl	1be0ac3cbf	add doc for trace indice	2 years ago
oahzxl	0b6af554df	remove useless function	2 years ago
oahzxl	d914a21d64	rename	2 years ago
oahzxl	865f2e0196	rename	2 years ago
HELSON	ea13a201bb	[polish] polish code for get_static_torch_model (#2405 ) * [gemini] polish code * [testing] remove code * [gemini] make more robust	2 years ago
oahzxl	a4ed5b0d0d	rename in doc	2 years ago
oahzxl	1bb1f2ad89	rename	2 years ago
oahzxl	cb9817f75d	rename function from index to indice	2 years ago
oahzxl	0ea903b94e	rename trace_index to trace_indice	2 years ago
Frank Lee	551cafec14	[doc] updated kernel-related optimisers' docstring (#2385 ) * [doc] updated kernel-related optimisers' docstring * polish doc	2 years ago
oahzxl	065f0b4c27	add doc for search	2 years ago
oahzxl	a68d240ed5	add doc for search chunk	2 years ago
oahzxl	1951f7fa87	code style	2 years ago
oahzxl	212b5b1b5f	add comments	2 years ago
oahzxl	19cc64b1d3	remove autochunk_available	2 years ago
eric8607242	9880fd2cd8	Fix state_dict key missing issue of the ZeroDDP (#2363 ) * Fix state_dict output for ZeroDDP duplicated parameters * Rewrite state_dict based on get_static_torch_model * Modify get_static_torch_model to be compatible with the lower version (ZeroDDP)	2 years ago
oahzxl	4d223e18a2	fix typo	2 years ago
Frank Lee	ce08661eb1	[cli] updated installation check cli for aot/jit build (#2395 )	2 years ago
jiaruifang	69d9180c4b	[hotfix] issue #2388	2 years ago
Jiarui Fang	4e96039649	[device] find best logical mesh	2 years ago
Jiarui Fang	8f72b6f8fb	[hotfix] fix implement error in diffusers	2 years ago
Frank Lee	40d376c566	[setup] support pre-build and jit-build of cuda kernels (#2374 ) * [setup] support pre-build and jit-build of cuda kernels * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
1SAA	33f3023e19	[hotfix] fix implement error in diffusers	2 years ago
Jiarui Fang	12c8bf38d7	[Pipeline] Refine GPT PP Example	2 years ago
oahzxl	8a989a0d89	code style	2 years ago
oahzxl	c3a2bf48b4	code style	2 years ago
oahzxl	a6cdbf9161	seperate trace flow	2 years ago
oahzxl	4748967fb1	ad reorder graph	2 years ago
oahzxl	da4076846d	rename	2 years ago
oahzxl	c3d72f7db9	seperate reorder	2 years ago
binmakeswell	a881d6d000	Revert "[NFC] polish code format" (#2372 )	2 years ago
Ziyue Jiang	9ae9e74017	fix diff device in some partition	2 years ago
Jiarui Fang	0dcc410f57	[NFC] polish code format	2 years ago
oahzxl	6685a9d022	seperate non chunk input	2 years ago
binmakeswell	d634eae05b	Revert "[NFC] polish code format (#2367 )" (#2371 ) This reverts commit `1f8ab6f1f5`.	2 years ago
oahzxl	f856611d21	seperate prepose_nodes	2 years ago
Shawn-Kong	d42aecdda1	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2368 )	2 years ago
Jiarui Fang	1aaeb596c6	[example] gpt, shard init on all processes (#2366 )	2 years ago
oahzxl	f4a1607e56	seperate input node dim search	2 years ago
binmakeswell	1f8ab6f1f5	[NFC] polish code format (#2367 )	2 years ago
oahzxl	ae27a8b26d	seperate flow tracer	2 years ago
oahzxl	fd87d78a28	rename ambiguous variable	2 years ago
oahzxl	2bde9d2b7f	code format	2 years ago
oahzxl	8a634af2f5	close mem and code print	2 years ago
oahzxl	1a6d2a740b	take apart chunk code gen	2 years ago
ExtremeViscent	ac0d30fe2e	[NFC] polish batch_norm_handler.py code style (#2359 )	2 years ago
HELSON	48d33b1b17	[gemini] add get static torch model (#2356 )	2 years ago
oahzxl	efb1c64c30	restruct dir	2 years ago
ziyuhuang123	7080a8edb0	[workflow]New version: Create workflow files for examples' auto check (#2298 ) * [workflows]bug_repair * [workflow]new_pr_fixing_bugs Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2 years ago
LuGY	e11a005c02	[NFC] polish colossalai/auto_parallel/tensor_shard/utils/factory.py code style (#2349 )	2 years ago
YuliangLiu0306	b5a3a4a65f	[device] find best logical mesh	2 years ago
yuxuan-lou	28e2d16794	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2340 )	2 years ago
YuliangLiu0306	9c9246c0d9	[device] alpha beta profiler (#2311 ) * [device] alpha beta profiler * add usage * fix variable name	2 years ago
Maruyama_Aya	bd12a49e2a	[NFC] polish <colossalai/auto_parallel/tensor_shard/deprecated/constants.py> code style (#2339 )	2 years ago
Zihao	35427bcab4	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/unary_elementwise_handler.py code style (#2326 )	2 years ago
Jiarui Fang	db6eea3583	[builder] reconfig op_builder for pypi install (#2314 )	2 years ago
Junming Wu	4a79c10750	[NFC] polish colossalai/cli/benchmark/__init__.py code style (#2308 )	2 years ago
Ofey Chan	87d2defda6	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/layer_norm_handler.py code style (#2305 )	2 years ago
ver217	116e3d0b8f	[NFC] polish communication/p2p_v2.py code style (#2303 )	2 years ago
xyupeng	b965585d05	[NFC] polish colossalai/amp/torch_amp/torch_amp.py code style (#2290 )	2 years ago
Zangwei Zheng	d1e5bafcd4	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/__init__.py code style (#2291 )	2 years ago
shenggan	950685873f	[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/reshape_handler.py code style (#2292 )	2 years ago
Ziheng Qin	3041014089	[NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299 ) Co-authored-by: henryqin1997 <henryqin1997@gamil.com>	2 years ago
アマデウス	49715a78f0	[NFC] polish colossalai/cli/benchmark/benchmark.py code style (#2287 )	2 years ago
Zirui Zhu	1c29b173c9	[NFC] polish colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py code style (#2289 )	2 years ago
Zihao	3a02b46447	[auto-parallel] refactoring ColoTracer (#2118 ) * add meta_data_computing * add checkpoint_annotation * rename proxy.data to proxy.meta_data and add bias addition pass * polish code * delete meta_prop_pass invoke and rename ori_node to orig_node * add TracerType * unify meta data computing * delete TracerType * handle setitem operation * operator.setitem	2 years ago
HELSON	5d3a2be3af	[amp] add gradient clipping for unit tests (#2283 ) * [amp] add gradient clipping in unit tests * fix bugs	2 years ago
Boyuan Yao	d45695d94e	Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel [autockpt] provide option for activation checkpoint search in SPMD solver	2 years ago
Jiarui Fang	16cc8e6aa7	[builder] MOE builder (#2277 )	2 years ago
Boyuan Yao	b904748210	[autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293 ) * [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline * [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop * [autoparallel] specifycomm nodes' memory cost in construct chain * [autoparallel] fix wrong runtime apply calculation * [autoparallel] fix wrong runtime apply calculation * [autoparallel] fix wrong runtime apply calculation * [autoparallel] bypass metainfo when available and modify BCAST_FUNC_OP	2 years ago
Super Daniel	8ea50d999e	[hotfix] pass a parameter. (#2288 ) * [autockpt] make it work. * [autockpt] linearize / merge shape-consistency nodes. * [autockpt] considering parameter and optimizer weights. * [hotfix] pass a parameter.	2 years ago
zbian	e94c79f15b	improved allgather & reducescatter for 3d	2 years ago
HELSON	62c38e3330	[zero] polish low level zero optimizer (#2275 )	2 years ago
Ziyue Jiang	ac863a01d6	[example] add benchmark (#2276 ) * add benchmark * merge common func * add total and avg tflops Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Boyuan Yao	22e947f982	[autoparallel] fix runtime apply memory estimation (#2281 ) * [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline * [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop * [autoparallel] specifycomm nodes' memory cost in construct chain * [autoparallel] fix wrong runtime apply calculation * [autoparallel] fix wrong runtime apply calculation * [autoparallel] fix wrong runtime apply calculation	2 years ago
Super Daniel	8e8900ff3f	[autockpt] considering parameter and optimizer weights. (#2279 ) * [autockpt] make it work. * [autockpt] linearize / merge shape-consistency nodes. * [autockpt] considering parameter and optimizer weights.	2 years ago
YuliangLiu0306	f027ef7913	[hotfix] fix fp16 optimzier bug (#2273 )	2 years ago
YuliangLiu0306	fb87322773	[autoparallel] fix spelling error (#2270 )	2 years ago
Jiarui Fang	af32022f74	[Gemini] fix the convert_to_torch_module bug (#2269 )	2 years ago
Super Daniel	b0d21d0c4f	[autockpt] linearize / merge shape-consistency nodes. (#2271 ) * [autockpt] make it work. * [autockpt] linearize / merge shape-consistency nodes.	2 years ago
YuliangLiu0306	4b29112ab2	[autoparallel] gpt2 autoparallel examples (#2267 ) * [autoparallel] gpt2 autoparallel examples * polish code * polish code	2 years ago
Ziyue Jiang	8b045b3c1f	[Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232 ) * move to cpu to avoid dead lock * get output by offsets Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Boyuan Yao	5c2ef9fc76	[autoparallel] modify comm nodes' memory cost in construct chain (#2263 ) * [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline * [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop * [autoparallel] specifycomm nodes' memory cost in construct chain	2 years ago
Boyuan Yao	1ea99b869e	[autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline (#2261 )	2 years ago
Super Daniel	3ccf58aa76	[autockpt] make it work. (#2257 )	2 years ago
Boyuan Yao	ac3739930d	[autoparallel] modify construct chain in rotor solver (#2254 )	2 years ago
Boyuan Yao	ab38aebace	[autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint (#2248 ) * [autoparallel] hook node meta on graph nodes for checkpoint solver * [autoparallel] polish code * [autoparallel] restore some node handlers * colossalai/auto_parallel/passes/meta_info_prop.py * [autoparallel] remove some unused import * [autoparallel] hook bwd_mem_out	2 years ago
Boyuan Yao	c8c79102f0	[autoparallel] patch torch.flatten metainfo for autoparallel (#2247 ) * [autoparallel] patch torch.flatten	2 years ago
YuliangLiu0306	8897b8f753	[autoparallel] autoparallel initialize (#2238 )	2 years ago
xcnick	85178a397a	[hotfix] fix error for torch 2.0 (#2243 )	2 years ago
Super Daniel	b7d0990c61	[autoparallel] fix construct meta info. (#2245 )	2 years ago
Ziyue Jiang	57929a6210	fix type of num_worker_threads (#2237 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Jiarui Fang	db4cbdc7fb	[builder] builder for scaled_upper_triang_masked_softmax (#2234 )	2 years ago
Super Daniel	78483a9fdd	[logger] hotfix, missing _FORMAT (#2231 )	2 years ago
Jiarui Fang	54de05da5d	[builder] polish builder with better base class (#2216 ) * [builder] polish builder * remove print	2 years ago
YuliangLiu0306	3b1b91eaf4	[autoparallel] record parameter attribute in colotracer (#2217 ) * [autoparallel] record parameter attribute in collotracer * [autoparallel] fix construct_meta_info bug	2 years ago
Jiarui Fang	7675792100	[builder] raise Error when CUDA_HOME is not set (#2213 )	2 years ago
Jiarui Fang	d5e3e3ec01	[example] update gpt example for larger model scale (#2211 )	2 years ago
Boyuan Yao	24246f7aa5	[autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162 ) * [fx] metainfo class for auto parallel * [fx] add unit test for linear metainfo * [fx] fix bwd param for linear * [fx] modify unit test * [fx] modify unit test * [fx] modify import * [fx] modify import * [fx] modify import * [fx] move meta profiler to auto parallel * [fx] add conv metainfo class * [fx] restore profiler * [fx] restore meta profiler * [autoparallel] modify unit test * [fx] modify unit test * [autoparallel] add batchnorm metainfo class * [autoparallel] fix batchnorm unit test function declaration * [fx] restore profiler * [fx] add relu metainfo class * [fx] restore profiler * [autoparallel] modify metainfo input * [autoparallel] add pooling metainfo * [autoparallel] add F.linear metainfo generator * [autoparallel] add binary elementwise metainfo * [fx] recover profiler * [autoparallel] fix forward memory calculation * [autoparallel] modify constants.py * [autoparallel] remove redundant print * [autoparallel] add F.conv metainfo * [autoparallel] linear fix * [autoparallel] memory estimation for communication actions * [autoparallel] fix docstring * [autoparallel] fix variables name * [autoparallel] attach tensor to metainfo class * [autoparallel] fix dangerous try except * [autoparallel] attach memory cost to shape consistency node * [autoparallel] attach shape consistency node's metainfo to the node * [autoparallel] remove todo in shape consistency memory estimation * [autoparallel] fix the annotation	2 years ago
Boyuan Yao	d0bc5a1b34	[autoparallel] new metainfoprop based on metainfo class (#2179 ) * [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver * [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver * [autoparallel] modify placeholder handler * [autoparallel] modify metainfoprop * [autoparallel] fix function typo * [autoparallel] fix placeholder handler	2 years ago
YuliangLiu0306	78509124d3	[autoparallel] update getitem handler (#2207 )	2 years ago
Jiarui Fang	1cb532ffec	[builder] multihead attn runtime building (#2203 ) * [hotfix] correcnt cpu_optim runtime compilation * [builder] multihead attn * fix bug * fix a bug	2 years ago
Tongping Liu	8e22c38b89	[hotfix] Fixing the bug related to ipv6 support Co-authored-by: ByteDance <tongping.liu@bytedance.com>	2 years ago
YuliangLiu0306	4851f2d607	[autoparallel] update_getattr_handler (#2193 )	2 years ago
Jiarui Fang	5682e6d346	[hotfix] correcnt cpu_optim runtime compilation (#2197 )	2 years ago
HELSON	2458659919	[zero] fix error for BEiT models (#2169 ) * [zero] fix error for BEiT models * [ColoParameter] add unpack operation for tuple arguments * fix bugs * fix chunkv2 unit testing * add assertion for gradient state	2 years ago
Jiarui Fang	355ffb386e	[builder] unified cpu_optim fused_optim inferface (#2190 )	2 years ago
Jiarui Fang	9587b080ba	[builder] use runtime builder for fused_optim (#2189 )	2 years ago
Jiarui Fang	bc0e271e71	[buider] use builder() for cpu adam and fused optim in setup.py (#2187 )	2 years ago
Jiarui Fang	d42afd30f8	[builder] runtime adam and fused_optim builder (#2184 )	2 years ago
YuliangLiu0306	550f8f8905	[autoparallel] integrate_gpt_related_tests (#2134 ) * [autoparallel] integrate_gpt_related_tests * polish code * polish code * add GPT2Model into runtime test	2 years ago
Ziyue Jiang	59e343328d	[Pipeline Middleware ] Fix deadlock when num_microbatch=num_stage (#2156 ) * add splitter * polish code * remove comment * fix async nan by moving to cpu first Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Tongping Liu	ab54fed292	[hotfix] add kwargs for colo_addmm (#2171 )	2 years ago
アマデウス	622f863291	[hotfix] Jit type hint #2161 (#2164 )	2 years ago
Zihao	12e7bcd720	register meta func for rnn (#2159 )	2 years ago
Boyuan Yao	cfe2a9bd90	[autoparallel] memory estimation for shape consistency (#2144 ) * [fx] metainfo class for auto parallel * [fx] add unit test for linear metainfo * [fx] fix bwd param for linear * [fx] modify unit test * [fx] modify unit test * [fx] modify import * [fx] modify import * [fx] modify import * [fx] move meta profiler to auto parallel * [fx] add conv metainfo class * [fx] restore profiler * [fx] restore meta profiler * [autoparallel] modify unit test * [fx] modify unit test * [autoparallel] add batchnorm metainfo class * [autoparallel] fix batchnorm unit test function declaration * [fx] restore profiler * [fx] add relu metainfo class * [fx] restore profiler * [autoparallel] modify metainfo input * [autoparallel] add pooling metainfo * [autoparallel] add F.linear metainfo generator * [autoparallel] add binary elementwise metainfo * [fx] recover profiler * [autoparallel] fix forward memory calculation * [autoparallel] modify constants.py * [autoparallel] remove redundant print * [autoparallel] add F.conv metainfo * [autoparallel] linear fix * [autoparallel] memory estimation for communication actions * [autoparallel] fix docstring * [autoparallel] fix variables name	2 years ago
Jiarui Fang	b87496a66b	[hotfix] fix auto policy of test_sharded_optim_v2 (#2157 )	2 years ago
YuliangLiu0306	16335cb537	[hotfix] fix aten default bug (#2158 )	2 years ago
HELSON	a7d95b7024	[example] add zero1, zero2 example in GPT examples (#2146 ) * [example] add zero1 and zero2 for GPT * update readme in gpt example * polish code * change init value * update readme	2 years ago
YuliangLiu0306	1cce6e36ca	[autoparallel] use metainfo in handler (#2149 )	2 years ago
Jiarui Fang	2827f41898	[Gemini] GeminiDPP convert to PyTorch Module. (#2151 )	2 years ago
Jiarui Fang	bdef9dfdbe	[NFC] remove useless graph node code (#2150 )	2 years ago
BlueRum	b3f73ce1c8	[Gemini] Update coloinit_ctx to support meta_tensor (#2147 )	2 years ago
Zihao	a128eec9d5	register aten._convolution.default (#2137 )	2 years ago
Jiarui Fang	ee287620f0	[Gemini] revert ZeROInitCtx related tracer (#2138 )	2 years ago
アマデウス	077a66dd81	updated attention kernel (#2133 )	2 years ago
YuliangLiu0306	a3c6924deb	[autoparallel] process size nodes in runtime pass (#2130 ) * [autoparallel] process size nodes in runtime pass * polish code	2 years ago
YuliangLiu0306	536560ccc0	[autoparallel] implement softmax handler (#2132 )	2 years ago
Jiarui Fang	c89c66a858	[Gemini] update API of the chunkmemstatscollector. (#2129 )	2 years ago
Jiarui Fang	2938edf446	[Gemini] update the non model data record method in runtime memory tracer (#2128 )	2 years ago
Jiarui Fang	8fac837679	[Gemini] update non model data calculation method (#2126 )	2 years ago
Jiarui Fang	5efda69735	[Gemini] hotfix the unittest bugs (#2125 )	2 years ago
Jiarui Fang	05bb28aacf	[Gemini] mapping of preop timestep and param (#2124 )	2 years ago
YuliangLiu0306	cd0af9f7f6	[autoparallel] gpt2lp runtimee test (#2113 )	2 years ago
Jiarui Fang	9214d1fe28	[Gemini] chunk init using runtime visited param order (#2115 )	2 years ago

1395 Commits (feature/elixir)