ColossalAI

Commit Graph

Author	SHA1	Message	Date
Frank Lee	73d3e4d309	[booster] implemented the torch ddd + resnet example (#3232 ) * [booster] implemented the torch ddd + resnet example * polish code	2023-03-27 10:24:14 +08:00
YuliangLiu0306	4d5d8f98a4	[API] implement device mesh manager (#3221 ) * [API] implement device mesh manager * polish	2023-03-24 13:39:12 +08:00
YuliangLiu0306	045afa3ea2	[hotfix] skip torchaudio tracing test (#3211 ) * [hotfix] skip torchaudio tracing test * fix lazy init test issue	2023-03-24 12:15:33 +08:00
Frank Lee	cd142fbefa	[api] implemented the checkpoint io module (#3205 ) * [api] implemented the checkpoint io module * polish code * polish code	2023-03-23 10:53:17 +08:00
ver217	f8289d4221	[lazyinit] combine lazy tensor with dtensor (#3204 ) * [lazyinit] lazy tensor add distribute * [lazyinit] refactor distribute * [lazyinit] add test dist lazy init * [lazyinit] add verbose info for dist lazy init * [lazyinit] fix rnn flatten weight op * [lazyinit] polish test * [lazyinit] polish test * [lazyinit] fix lazy tensor data setter * [lazyinit] polish test * [lazyinit] fix clean * [lazyinit] make materialize inplace * [lazyinit] refactor materialize * [lazyinit] refactor test distribute * [lazyinit] fix requires_grad * [lazyinit] fix tolist after materialization * [lazyinit] refactor distribute module * [lazyinit] polish docstr * [lazyinit] polish lazy init context * [lazyinit] temporarily skip test * [lazyinit] polish test * [lazyinit] add docstr	2023-03-23 10:53:06 +08:00
YuliangLiu0306	019a847432	[Analyzer] fix analyzer tests (#3197 )	2023-03-22 13:38:11 +08:00
YuliangLiu0306	f57d34958b	[FX] refactor experimental tracer and adapt it with hf models (#3157 ) * pass gpt trace and meta_prop * pass t5 trace and meta_prop * [FX] refactor experimental tracer and adapt it with hf models * pass all mainstream model zoo * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * skip tests * fix CI * using packaging version * polish	2023-03-22 10:40:33 +08:00
Frank Lee	e7f3bed2d3	[booster] added the plugin base and torch ddp plugin (#3180 ) * [booster] added the plugin base and torch ddp plugin * polish code * polish code * polish code	2023-03-21 17:39:30 +08:00
Zihao	18dbe76cae	[auto-parallel] add auto-offload feature (#3154 ) * add auto-offload feature * polish code * fix syn offload runtime pass bug * add offload example * fix offload testing bug * fix example testing bug	2023-03-21 14:17:41 +08:00
zbian	7bc0afc901	updated flash attention usage	2023-03-20 17:57:04 +08:00
Frank Lee	085e7f4eff	[test] fixed torchrec registration in model zoo (#3177 ) * [test] fixed torchrec registration in model zoo * polish code * polish code * polish code	2023-03-20 16:19:06 +08:00
Frank Lee	a9b8402d93	[booster] added the accelerator implementation (#3159 )	2023-03-20 13:59:24 +08:00
Frank Lee	1ad3a636b1	[test] fixed torchrec model test (#3167 ) * [test] fixed torchrec model test * polish code * polish code * polish code * polish code * polish code * polish code	2023-03-20 11:40:25 +08:00
ver217	6ae8ed0407	[lazyinit] add correctness verification (#3147 ) * [lazyinit] fix shared module * [tests] add lazy init test utils * [tests] add torchvision for lazy init * [lazyinit] fix pre op fn * [lazyinit] handle legacy constructor * [tests] refactor lazy init test models * [tests] refactor lazy init test utils * [lazyinit] fix ops don't support meta * [tests] lazy init test timm models * [lazyinit] fix set data * [lazyinit] handle apex layers * [tests] lazy init test transformers models * [tests] lazy init test torchaudio models * [lazyinit] fix import path * [tests] lazy init test torchrec models * [tests] update torch version in CI * [tests] revert torch version in CI * [tests] skip lazy init test	2023-03-17 13:49:04 +08:00
Frank Lee	ed19290560	[booster] implemented mixed precision class (#3151 ) * [booster] implemented mixed precision class * polish code	2023-03-17 11:00:15 +08:00
YuliangLiu0306	ecd643f1e4	[test] add torchrec models to test model zoo (#3139 )	2023-03-15 05:46:04 +00:00
ver217	14a115000b	[tests] model zoo add torchaudio models (#3138 ) * [tests] model zoo add torchaudio models * [tests] refactor torchaudio wavernn * [tests] refactor fx torchaudio tests	2023-03-15 11:51:16 +08:00
Frank Lee	6d48eb0560	[test] added transformers models to test model zoo (#3135 )	2023-03-15 11:26:10 +08:00
Frank Lee	a674c63348	[test] added torchvision models to test model zoo (#3132 ) * [test] added torchvision models to test model zoo * polish code * polish code * polish code * polish code * polish code * polish code	2023-03-15 10:42:07 +08:00
HELSON	1216d1e7bd	[tests] diffuser models in model zoo (#3136 ) * [tests] diffuser models in model zoo * remove useless code * [tests] add diffusers to requirement-test	2023-03-14 17:20:28 +08:00
YuliangLiu0306	2eca4cd376	[DTensor] refactor dtensor with new components (#3089 ) * [DTensor] refactor dtensor with new components * polish	2023-03-14 16:25:47 +08:00
Frank Lee	86ac782d7c	[test] added timm models to test model zoo (#3129 ) * [test] added timm models to test model zoo * polish code * polish code * polish code * polish code * polish code	2023-03-14 14:29:18 +08:00
Xuanlei Zhao	30dd13c450	[autochunk] support complete benchmark (#3121 ) * refact memory code * dont log free var memory * add memory align * update chunk target * update setting for new memory * finish test * update tracer * update typo * update test * add unet test * add bench * update bench * update bench * init * support vit * move to cpu * add cpu benchmark	2023-03-13 17:42:37 +08:00
Super Daniel	fff98f06ed	[analyzer] a minimal implementation of static graph analyzer (#2852 ) * [hotfix] meta tensor default device. * [siu] add experimental submodules to main branch. * [siu] * [siu] * [analyzer] init. * [analyzer] readme. * [analyzer] readme. * [analyzer] readme. * [analyzer] readme. * [test] add test. * Update symbolic_trace.py * mark skip tests. * try except. * try except. * try except. * s * init * init * fix * skip * skip --------- Co-authored-by: Daniel Shao <superdainiu@MININT-PVARVID.fareast.corp.microsoft.com> Co-authored-by: Daniel Shao <superdainiu@Daniels-Mac.local>	2023-03-10 13:21:05 +08:00
Xuanlei Zhao	10c61de2f7	[autochunk] support vit (#3084 ) support vit for autochunk * support some new ops for vit * fix some bugs * add test for vit	2023-03-10 10:23:26 +08:00
YuliangLiu0306	8e4e8601b7	[DTensor] implement layout converter (#3055 ) * [DTensor] refactor LayoutConverter for DTensor * polish code * polish docstring	2023-03-10 09:53:52 +08:00
Xuanlei Zhao	2ca9728cbb	[autochunk] refactor chunk memory estimation (#2762 ) * refact memory code * dont log free var memory * add memory align * update chunk target * update setting for new memory * finish test * update tracer * update typo * update test	2023-03-08 16:22:30 +08:00
YuliangLiu0306	29386a54e6	[DTensor] refactor CommSpec (#3034 )	2023-03-08 10:45:31 +08:00
YuliangLiu0306	4269196c79	[hotfix] skip auto checkpointing tests (#3029 ) * [hotfix] skip auto checkpointing tests * fix test name issue	2023-03-07 15:50:00 +08:00
YuliangLiu0306	cd2b0eaa8d	[DTensor] refactor sharding spec (#2987 ) * [autoparallel] refactor sharding spec * rename function name	2023-03-07 11:08:11 +08:00
YuliangLiu0306	e414e4092b	[DTensor] implementation of dtensor (#2946 ) * [DTensor] implementation of dtensor * test layout convert * polish	2023-03-01 16:34:58 +08:00
YuliangLiu0306	197d0bf4ed	[autoparallel] apply repeat block to reduce solving time (#2912 )	2023-02-28 11:03:30 +08:00
YuliangLiu0306	819e25d8b1	[hotfix] fix autoparallel compatibility test issues (#2754 )	2023-02-23 17:28:36 +08:00
YuliangLiu0306	0f392d7403	[autoparallel] find repeat blocks (#2854 ) * [autoparallel] find repeat blocks * polish * polish * polish	2023-02-23 17:28:19 +08:00
Boyuan Yao	c7764d3f22	[autoparallel] Patch meta information of `torch.where` (#2822 ) * [autoparallel] patch meta information of torch.where * [autoparallel] pre-commit modified	2023-02-22 10:28:21 +08:00
Boyuan Yao	fcc4097efa	[autoparallel] Patch meta information of `torch.tanh()` and `torch.nn.Dropout` (#2773 ) * [autoparallel] tanh meta information * [autoparallel] remove redundant code * [autoparallel] patch meta information of torch.nn.Dropout	2023-02-22 10:27:59 +08:00
Boyuan Yao	7ea6bc7f69	[autoparallel] Patch tensor related operations meta information (#2789 ) * [autoparallel] tensor related meta information prototype * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information	2023-02-20 17:38:55 +08:00
HELSON	56ddc9ca7a	[hotfix] add correct device for fake_param (#2796 )	2023-02-17 15:29:07 +08:00
Boyuan Yao	a2b43e393d	[autoparallel] Patch meta information of `torch.nn.Embedding` (#2760 ) * [autoparallel] embedding metainfo * [autoparallel] fix function name in test_activation_metainfo * [autoparallel] undo changes in activation metainfo and related tests	2023-02-17 10:39:48 +08:00
YuliangLiu0306	1dc003c169	[autoparallel] distinguish different parallel strategies (#2699 )	2023-02-15 22:28:28 +08:00
YuliangLiu0306	21d6a48f4d	[autoparallel] add shard option (#2696 ) * [autoparallel] add shard option * polish	2023-02-15 13:48:28 +08:00
YuliangLiu0306	cb2c6a2415	[autoparallel] refactor runtime pass (#2644 ) * [autoparallel] refactor runtime pass * add unit test * polish	2023-02-15 10:36:19 +08:00
YuliangLiu0306	0b2a738393	[autoparallel] remove deprecated codes (#2664 )	2023-02-15 09:54:32 +08:00
YuliangLiu0306	7fa6be49d2	[autoparallel] test compatibility for gemini and auto parallel (#2700 )	2023-02-15 09:43:29 +08:00
Boyuan Yao	40c916b192	[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674 ) * [autoparallel] softmax metainfo * [autoparallel] softmax metainfo	2023-02-13 16:09:22 +08:00
HELSON	8213f89fd2	[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671 )	2023-02-13 14:35:32 +08:00
Boyuan Yao	0385b26ebf	[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647 ) * [autoparallel] layernorm metainfo patch * [autoparallel] polish test	2023-02-10 14:29:24 +08:00
YuliangLiu0306	37df666f38	[autoparallel] refactor handlers which reshape input tensors (#2615 ) * [autoparallel] refactor handlers which reshape input tensors * polish	2023-02-08 15:02:49 +08:00
YuliangLiu0306	cb3d1bef62	[autoparallel] adapt autoparallel tests with latest api (#2626 )	2023-02-08 15:02:12 +08:00
Boyuan Yao	90a9fdd91d	[autoparallel] Patch meta information of `torch.matmul` (#2584 ) * [autoparallel] matmul metainfo * [auto_parallel] remove unused print * [tests] skip test_matmul_handler when torch version is lower than 1.12.0	2023-02-08 11:05:31 +08:00

741 Commits (5134ad5d1abf95fe63a72452953f894b9630ea93)