Commit Graph

1396 Commits (187874975325c4768b0850a818092de5bef1b071)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Xuanlei Zhao | 2ca9728cbb | [autochunk] refactor chunk memory estimation (#2762) | 2 years ago |
| YuliangLiu0306 | 29386a54e6 | [DTensor] refactor CommSpec (#3034) | 2 years ago |
| YuliangLiu0306 | cd2b0eaa8d | [DTensor] refactor sharding spec (#2987) | 2 years ago |
| Ziyue Jiang | 400f63012e | [pipeline] Add Simplified Alpa DP Partition (#2507) | 2 years ago |
| Super Daniel | b42d3d28ed | [fx] remove depreciated algorithms. (#2312) (#2313) | 2 years ago |
| github-actions[bot] | 82503a96f2 | [format] applied code formatting on changed files in pull request 2997 (#3008) | 2 years ago |
| binmakeswell | 52a5078988 | [doc] add ISC tutorial (#2997) | 2 years ago |
| ver217 | 823f3b9cf4 | [doc] add deepspeed citation and copyright (#2996) | 2 years ago |
| YuliangLiu0306 | e414e4092b | [DTensor] implementation of dtensor (#2946) | 2 years ago |
| YuliangLiu0306 | 47fb214b3b | [hotfix] add shard dim to aviod backward communication error (#2954) | 2 years ago |
| ver217 | 090f14fd6b | [misc] add reference (#2930) | 2 years ago |
| YuliangLiu0306 | 197d0bf4ed | [autoparallel] apply repeat block to reduce solving time (#2912) | 2 years ago |
| YH | a848091141 | Fix port exception type (#2925) | 2 years ago |
| zbian | 61e687831d | fixed using zero with tp cannot access weight correctly | 2 years ago |
| YH | 7b13f7db18 | [zero] trivial zero optimizer refactoring (#2869) | 2 years ago |
| Jiatong (Julius) Han | 8c8a39be95 | [hotfix]: Remove math.prod dependency (#2837) | 2 years ago |
| YuliangLiu0306 | 819e25d8b1 | [hotfix] fix autoparallel compatibility test issues (#2754) | 2 years ago |
| YuliangLiu0306 | 0f392d7403 | [autoparallel] find repeat blocks (#2854) | 2 years ago |
| junxu | c52edcf0eb | Rename class method of ZeroDDP (#2692) | 2 years ago |
| HELSON | 6e4ac08172 | [hotfix] fix chunk size can not be divided (#2867) | 2 years ago |
| Boyuan Yao | eae77c831d | [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) | 2 years ago |
| Boyuan Yao | c7764d3f22 | [autoparallel] Patch meta information of `torch.where` (#2822) | 2 years ago |
| Boyuan Yao | fcc4097efa | [autoparallel] Patch meta information of `torch.tanh()` and `torch.nn.Dropout` (#2773) | 2 years ago |
| Frank Lee | 935346430f | [cli] handled version check exceptions (#2848) | 2 years ago |
| Frank Lee | 918bc94b6b | [triton] added copyright information for flash attention (#2835) | 2 years ago |
| Boyuan Yao | 7ea6bc7f69 | [autoparallel] Patch tensor related operations meta information (#2789) | 2 years ago |
| Michelle | c008d4ad0c | [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) | 2 years ago |
| YuliangLiu0306 | 2059fdd6b0 | [hotfix] add copyright for solver and device mesh (#2803) | 2 years ago |
| Boyuan Yao | 8593ae1a3f | [autoparallel] rotor solver refactor (#2813) | 2 years ago |
| HELSON | 56ddc9ca7a | [hotfix] add correct device for fake_param (#2796) | 2 years ago |
| Boyuan Yao | a2b43e393d | [autoparallel] Patch meta information of `torch.nn.Embedding` (#2760) | 2 years ago |
| Boyuan Yao | 8e3f66a0d1 | [zero] fix wrong import (#2777) | 2 years ago |
| Nikita Shulga | 01066152f1 | Don't use `torch._six` (#2775) | 2 years ago |
| binmakeswell | 93b788b95a | Merge branch 'main' into fix/format | 2 years ago |
| xyupeng | 2fd528b9f4 | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) | 2 years ago |
| YuliangLiu0306 | 1dc003c169 | [autoparallel] distinguish different parallel strategies (#2699) | 2 years ago |
| YH | ae86a29e23 | Refact method of grad store (#2687) | 2 years ago |
| Zirui Zhu | c9e3ee389e | [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) | 2 years ago |
| Zangwei Zheng | 1819373e5c | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) | 2 years ago |
| Wangbo Zhao(黑色枷锁) | 8331420520 | [NFC] polish colossalai/cli/cli.py code style (#2734) | 2 years ago |
| ziyuhuang123 | d344313533 | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2725) | 2 years ago |
| Xue Fuzhao | e81caeb4bc | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/cost_graph.py code style (#2720) | 2 years ago |
| yuxuan-lou | 51c45c2460 | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/where_handler.py code style (#2723) | 2 years ago |
| YuliangLiu0306 | 21d6a48f4d | [autoparallel] add shard option (#2696) | 2 years ago |
| YuliangLiu0306 | 5b24987fa7 | [autoparallel] fix parameters sharding bug (#2716) | 2 years ago |
| Ziyue Jiang | 4603538ddd | [NFC] posh colossalai/context/process_group_initializer/initializer_sequence.py code style (#2712) | 2 years ago |
| YuliangLiu0306 | cb2c6a2415 | [autoparallel] refactor runtime pass (#2644) | 2 years ago |
| Zihao | b3d10db5f1 | [NFC] polish colossalai/cli/launcher/__init__.py code style (#2709) | 2 years ago |
| YuliangLiu0306 | 0b2a738393 | [autoparallel] remove deprecated codes (#2664) | 2 years ago |
| YuliangLiu0306 | 7fa6be49d2 | [autoparallel] test compatibility for gemini and auto parallel (#2700) | 2 years ago |
| CZYCW | 4ac8bfb072 | [NFC] polish colossalai/engine/gradient_handler/utils.py code style (#2708) | 2 years ago |
| Liu Ziming | 6427c406cf | [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/strategy_generator.py code style (#2695) | 2 years ago |
| アマデウス | 534f68c83c | [NFC] polish pipeline process group code style (#2694) | 2 years ago |
| LuGY | 56ff1921e9 | [NFC] polish colossalai/context/moe_context.py code style (#2693) | 2 years ago |
| Shawn-Kong | 1712da2800 | [NFC] polish colossalai/gemini/gemini_context.py code style (#2690) | 2 years ago |
| HELSON | df4f020ee3 | [zero1&2] only append parameters with gradients (#2681) | 2 years ago |
| ver217 | f0aa191f51 | [gemini] fix colo_init_context (#2683) | 2 years ago |
| Boyuan Yao | 40c916b192 | [autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674) | 2 years ago |
| HELSON | 8213f89fd2 | [gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671) | 2 years ago |
| binmakeswell | 9ab14b20b5 | [doc] add CVPR tutorial (#2666) | 2 years ago |
| Boyuan Yao | 0385b26ebf | [autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647) | 2 years ago |
| YuliangLiu0306 | 37df666f38 | [autoparallel] refactor handlers which reshape input tensors (#2615) | 2 years ago |
| YuliangLiu0306 | 28398f1c70 | add overlap option (#2613) | 2 years ago |
| YuliangLiu0306 | cb3d1bef62 | [autoparallel] adapt autoparallel tests with latest api (#2626) | 2 years ago |
| Boyuan Yao | 90a9fdd91d | [autoparallel] Patch meta information of `torch.matmul` (#2584) | 2 years ago |
| oahzxl | 6ba8364881 | [autochunk] support diffusion for autochunk (#2621) | 2 years ago |
| Frank Lee | 8518263b80 | [test] fixed the triton version for testing (#2608) | 2 years ago |
| HELSON | 552183bb74 | [polish] polish ColoTensor and its submodules (#2537) | 2 years ago |
| Frank Lee | dd14783f75 | [kernel] fixed repeated loading of kernels (#2549) | 2 years ago |
| ver217 | 5b1854309a | [hotfix] fix zero ddp warmup check (#2545) | 2 years ago |
| oahzxl | fa3d66feb9 | support unet metainfo prop (#2544) | 2 years ago |
| oahzxl | 05671fcb42 | [autochunk] support multi outputs chunk search (#2538) | 2 years ago |
| oahzxl | 63199c6687 | [autochunk] support transformer (#2526) | 2 years ago |
| HELSON | a4ed9125ac | [hotfix] fix lightning error (#2529) | 2 years ago |
| HELSON | 66dfcf5281 | [gemini] update the gpt example (#2527) | 2 years ago |
| HELSON | b528eea0f0 | [zero] add zero wrappers (#2523) | 2 years ago |
| Super Daniel | c198c7c0b0 | [hotfix] meta tensor default device. (#2510) | 2 years ago |
| HELSON | 077a5cdde4 | [zero] fix gradient clipping in hybrid parallelism (#2521) | 2 years ago |
| YuliangLiu0306 | aa0f6686f9 | [autoparallel] accelerate gpt2 training (#2495) | 2 years ago |
| HELSON | 707b11d4a0 | [gemini] update ddp strict mode (#2518) | 2 years ago |
| HELSON | 2d1a7dfe5f | [zero] add strict ddp mode (#2508) | 2 years ago |
| oahzxl | c04f183237 | [autochunk] support parsing blocks (#2506) | 2 years ago |
| Super Daniel | 35c0c0006e | [utils] lazy init. (#2148) | 2 years ago |
| oahzxl | 72341e65f4 | [auto-chunk] support extramsa (#3) (#2504) | 2 years ago |
| Ziyue Jiang | 0f02b8c6e6 | add avg partition (#2483) | 2 years ago |
| アマデウス | 99d9713b02 | Revert "Update parallel_context.py (#2408)" | 2 years ago |
| oahzxl | ecccc91f21 | [autochunk] support autochunk on evoformer (#2497) | 2 years ago |
| oahzxl | 5db3a5bf42 | [fx] allow control of ckpt_codegen init (#2498) | 2 years ago |
| HELSON | d565a24849 | [zero] add unit testings for hybrid parallelism (#2486) | 2 years ago |
| oahzxl | 4953b4ace1 | [autochunk] support evoformer tracer (#2485) | 2 years ago |
| YuliangLiu0306 | 67e1912b59 | [autoparallel] support origin activation ckpt on autoprallel system (#2468) | 2 years ago |
| Ziyue Jiang | fef5c949c3 | polish pp middleware (#2476) | 2 years ago |
| HELSON | a5dc4253c6 | [zero] polish low level optimizer (#2473) | 2 years ago |
| Frank Lee | 8b7495dd54 | [example] integrate seq-parallel tutorial with CI (#2463) | 2 years ago |
| Jiarui Fang | 867c8c2d3a | [zero] low level optim supports ProcessGroup (#2464) | 2 years ago |
| Frank Lee | 14d9299360 | [cli] fixed hostname mismatch error (#2465) | 2 years ago |
| Haofan Wang | 9358262992 | Fix False warning in initialize.py (#2456) | 2 years ago |
| YuliangLiu0306 | 8221fd7485 | [autoparallel] update binary elementwise handler (#2451) | 2 years ago |
| HELSON | 2bfeb24308 | [zero] add warning for ignored parameters (#2446) | 2 years ago |
| Frank Lee | 39163417a1 | [example] updated the hybrid parallel tutorial (#2444) | 2 years ago |
| HELSON | 5521af7877 | [zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443) | 2 years ago |
| YuliangLiu0306 | 2731531bc2 | [autoparallel] integrate device mesh initialization into autoparallelize (#2393) | 2 years ago |
| Frank Lee | c72c827e95 | [cli] provided more details if colossalai run fail (#2442) | 2 years ago |
| Super Daniel | c41e59e5ad | [fx] allow native ckpt trace and codegen. (#2438) | 2 years ago |
| YuliangLiu0306 | 41429b9b28 | [autoparallel] add shard option (#2423) | 2 years ago |
| HELSON | 7829aa094e | [ddp] add is_ddp_ignored (#2434) | 2 years ago |
| HELSON | bb4e9a311a | [zero] add inference mode and its unit test (#2418) | 2 years ago |
| Jiarui Fang | 93f62dd152 | [autochunk] add autochunk feature | 2 years ago |
| HELSON | dddacd2d2c | [hotfix] add norm clearing for the overflow step (#2416) | 2 years ago |
| oahzxl | 7ab2db206f | adapt new fx | 2 years ago |
| oahzxl | e532679c95 | Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk | 2 years ago |
| Haofan Wang | 7d5640b9db | Update parallel_context.py (#2408) | 2 years ago |
| oahzxl | fd818cf144 | change imports | 2 years ago |
| oahzxl | a591d45b29 | add available | 2 years ago |
| oahzxl | 615e7e68d9 | update doc | 2 years ago |
| oahzxl | 7d4abaa525 | add doc | 2 years ago |
| oahzxl | 1be0ac3cbf | add doc for trace indice | 2 years ago |
| oahzxl | 0b6af554df | remove useless function | 2 years ago |
| oahzxl | d914a21d64 | rename | 2 years ago |
| oahzxl | 865f2e0196 | rename | 2 years ago |
| HELSON | ea13a201bb | [polish] polish code for get_static_torch_model (#2405) | 2 years ago |
| oahzxl | a4ed5b0d0d | rename in doc | 2 years ago |
| oahzxl | 1bb1f2ad89 | rename | 2 years ago |
| oahzxl | cb9817f75d | rename function from index to indice | 2 years ago |
| oahzxl | 0ea903b94e | rename trace_index to trace_indice | 2 years ago |
| Frank Lee | 551cafec14 | [doc] updated kernel-related optimisers' docstring (#2385) | 2 years ago |
| oahzxl | 065f0b4c27 | add doc for search | 2 years ago |
| oahzxl | a68d240ed5 | add doc for search chunk | 2 years ago |
| oahzxl | 1951f7fa87 | code style | 2 years ago |
| oahzxl | 212b5b1b5f | add comments | 2 years ago |
| oahzxl | 19cc64b1d3 | remove autochunk_available | 2 years ago |
| eric8607242 | 9880fd2cd8 | Fix state_dict key missing issue of the ZeroDDP (#2363) | 2 years ago |
| oahzxl | 4d223e18a2 | fix typo | 2 years ago |
| Frank Lee | ce08661eb1 | [cli] updated installation check cli for aot/jit build (#2395) | 2 years ago |
| jiaruifang | 69d9180c4b | [hotfix] issue #2388 | 2 years ago |
| Jiarui Fang | 4e96039649 | [device] find best logical mesh | 2 years ago |
| Jiarui Fang | 8f72b6f8fb | [hotfix] fix implement error in diffusers | 2 years ago |
| Frank Lee | 40d376c566 | [setup] support pre-build and jit-build of cuda kernels (#2374) | 2 years ago |
| 1SAA | 33f3023e19 | [hotfix] fix implement error in diffusers | 2 years ago |
| Jiarui Fang | 12c8bf38d7 | [Pipeline] Refine GPT PP Example | 2 years ago |
| oahzxl | 8a989a0d89 | code style | 2 years ago |
| oahzxl | c3a2bf48b4 | code style | 2 years ago |
| oahzxl | a6cdbf9161 | seperate trace flow | 2 years ago |
| oahzxl | 4748967fb1 | ad reorder graph | 2 years ago |
| oahzxl | da4076846d | rename | 2 years ago |
| oahzxl | c3d72f7db9 | seperate reorder | 2 years ago |
| binmakeswell | a881d6d000 | Revert "[NFC] polish code format" (#2372) | 2 years ago |
| Ziyue Jiang | 9ae9e74017 | fix diff device in some partition | 2 years ago |
| Jiarui Fang | 0dcc410f57 | [NFC] polish code format | 2 years ago |
| oahzxl | 6685a9d022 | seperate non chunk input | 2 years ago |