Commit Graph

949 Commits (07ed155e8633e7aaaea83a0dee3da04119287a9a)

Author SHA1 Message Date
YuliangLiu0306 0b2a738393
[autoparallel] remove deprecated codes (#2664) 2023-02-15 09:54:32 +08:00
YuliangLiu0306 7fa6be49d2
[autoparallel] test compatibility for gemini and auto parallel (#2700) 2023-02-15 09:43:29 +08:00
Boyuan Yao 40c916b192
[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674)
* [autoparallel] softmax metainfo

* [autoparallel] softmax metainfo
2023-02-13 16:09:22 +08:00
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671) 2023-02-13 14:35:32 +08:00
Boyuan Yao 0385b26ebf
[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647)
* [autoparallel] layernorm metainfo patch

* [autoparallel] polish test
2023-02-10 14:29:24 +08:00
YuliangLiu0306 37df666f38
[autoparallel] refactor handlers which reshape input tensors (#2615)
* [autoparallel] refactor handlers which reshape input tensors

* polish
2023-02-08 15:02:49 +08:00
YuliangLiu0306 cb3d1bef62
[autoparallel] adapt autoparallel tests with latest api (#2626) 2023-02-08 15:02:12 +08:00
Boyuan Yao 90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` (#2584)
* [autoparallel] matmul metainfo

* [auto_parallel] remove unused print

* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2023-02-08 11:05:31 +08:00
oahzxl 6ba8364881
[autochunk] support diffusion for autochunk (#2621)
* add alphafold benchmark

* renae alphafold test

* rename tests

* rename diffuser

* renme

* rename

* update transformer

* update benchmark

* update benchmark

* update bench memory

* update transformer benchmark

* rename

* support diffuser

* support unet metainfo prop

* fix bug and simplify code

* update linear and support some op

* optimize max region search, support conv

* update unet test

* support some op

* support groupnorm and interpolate

* update flow search

* add fix dim in node flow

* fix utils

* rename

* support diffusion

* update diffuser

* update chunk search

* optimize imports

* import

* finish autochunk
2023-02-07 16:32:45 +08:00
oahzxl c4b15661d7
[autochunk] add benchmark for transformer and alphafold (#2543) 2023-02-02 15:06:43 +08:00
oahzxl 05671fcb42
[autochunk] support multi outputs chunk search (#2538)
Support multi outputs chunk search. Previously we only support single output chunk search. It is more flexible and improve performance by a large margin. For transformer, we reduce memory by 40% than previous search strategy.

1. rewrite search strategy to support multi outputs chunk search
2. fix many, many bugs
3. update tests
2023-02-01 13:18:51 +08:00
oahzxl 63199c6687
[autochunk] support transformer (#2526) 2023-01-31 16:00:06 +08:00
Frank Lee b55deb0662
[workflow] only report coverage for changed files (#2524)
* [workflow] only report coverage for changed files

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file
2023-01-30 21:28:27 +08:00
HELSON b528eea0f0
[zero] add zero wrappers (#2523)
* [zero] add zero wrappers

* change names

* add wrapper functions to init
2023-01-29 17:52:58 +08:00
HELSON 077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism

* [testing] change model name to avoid pytest warning

* [hotfix] fix unit testing
2023-01-29 15:09:57 +08:00
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
oahzxl c04f183237
[autochunk] support parsing blocks (#2506) 2023-01-20 11:18:17 +08:00
oahzxl 72341e65f4
[auto-chunk] support extramsa (#3) (#2504) 2023-01-20 10:13:03 +08:00
oahzxl ecccc91f21
[autochunk] support autochunk on evoformer (#2497) 2023-01-19 11:41:00 +08:00
HELSON d565a24849
[zero] add unit testings for hybrid parallelism (#2486) 2023-01-18 10:36:10 +08:00
oahzxl 4953b4ace1
[autochunk] support evoformer tracer (#2485)
support full evoformer tracer, which is a main module of alphafold. previously we just support a simplifed version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2023-01-16 19:25:05 +08:00
YuliangLiu0306 67e1912b59
[autoparallel] support origin activation ckpt on autoprallel system (#2468) 2023-01-16 16:25:13 +08:00
HELSON 21c88220ce
[zero] add unit test for low-level zero init (#2474) 2023-01-15 10:42:01 +08:00
HELSON a5dc4253c6
[zero] polish low level optimizer (#2473) 2023-01-13 14:56:17 +08:00
Jiarui Fang 867c8c2d3a
[zero] low level optim supports ProcessGroup (#2464) 2023-01-13 10:05:58 +08:00
YuliangLiu0306 8221fd7485
[autoparallel] update binary elementwise handler (#2451)
* [autoparallel] update binary elementwise handler

* polish
2023-01-12 09:35:10 +08:00
HELSON 5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
* [ddp] add is_ddp_ignored

[ddp] rename to is_ddp_ignored

* [zero] fix state_dict and load_state_dict

* fix bugs

* [zero] update unit test for ZeroDDP
2023-01-11 14:55:41 +08:00
YuliangLiu0306 41429b9b28
[autoparallel] add shard option (#2423) 2023-01-11 13:40:33 +08:00
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418) 2023-01-11 10:07:37 +08:00
oahzxl 61fdd3464a update doc 2023-01-10 12:29:09 +08:00
oahzxl 36ab2cb783 change import 2023-01-10 12:20:40 +08:00
oahzxl 7ab2db206f adapt new fx 2023-01-10 11:56:00 +08:00
oahzxl e532679c95 Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk 2023-01-10 11:29:01 +08:00
oahzxl c1492e5013 add test in import 2023-01-10 11:20:28 +08:00
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
* [gemini] polish code

* [testing] remove code

* [gemini] make more robust
2023-01-09 17:41:38 +08:00
oahzxl 212b5b1b5f add comments 2023-01-09 16:29:33 +08:00
oahzxl aafc3516a5 add available 2023-01-09 15:32:19 +08:00
oahzxl d5c4f0bf95 code style 2023-01-09 15:22:09 +08:00
oahzxl d106b271f8 add chunk search test 2023-01-09 15:19:08 +08:00
oahzxl a005965d2d update codegen test 2023-01-09 14:57:47 +08:00
oahzxl 3abbaf8bc6 update codegen test 2023-01-09 14:53:04 +08:00
oahzxl 74b81395a2 update codegen test 2023-01-09 14:26:22 +08:00
oahzxl 18a51c87fe rename test 2023-01-09 14:20:54 +08:00
oahzxl cb68ee864a set benchmark 2023-01-09 14:20:41 +08:00
Jiarui Fang 4e96039649
[device] find best logical mesh 2023-01-07 14:04:30 +08:00
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-06 20:50:26 +08:00
oahzxl a6cdbf9161 seperate trace flow 2023-01-06 17:24:23 +08:00
oahzxl da4076846d rename 2023-01-06 17:09:37 +08:00
oahzxl fd87d78a28 rename ambiguous variable 2023-01-06 14:28:04 +08:00
oahzxl 8a634af2f5 close mem and code print 2023-01-06 14:19:45 +08:00
oahzxl 1a6d2a740b take apart chunk code gen 2023-01-06 14:14:45 +08:00
HELSON 48d33b1b17
[gemini] add get static torch model (#2356) 2023-01-06 13:41:19 +08:00
oahzxl d1f0773182 rename 2023-01-06 11:48:33 +08:00
oahzxl 06a5355d98 update test 2023-01-06 11:44:01 +08:00
oahzxl efb1c64c30 restruct dir 2023-01-06 11:39:26 +08:00
YuliangLiu0306 b5a3a4a65f [device] find best logical mesh 2023-01-05 17:21:29 +08:00
YuliangLiu0306 9c9246c0d9
[device] alpha beta profiler (#2311)
* [device] alpha beta profiler

* add usage

* fix variable name
2023-01-05 16:39:55 +08:00
Jiarui Fang db6eea3583
[builder] reconfig op_builder for pypi install (#2314) 2023-01-04 16:32:32 +08:00
HELSON 5d3a2be3af
[amp] add gradient clipping for unit tests (#2283)
* [amp] add gradient clipping in unit tests

* fix bugs
2023-01-04 11:59:56 +08:00
zbian e94c79f15b improved allgather & reducescatter for 3d 2023-01-03 17:46:08 +08:00
YuliangLiu0306 fb87322773
[autoparallel] fix spelling error (#2270) 2023-01-03 16:13:00 +08:00
YuliangLiu0306 8897b8f753
[autoparallel] autoparallel initialize (#2238) 2022-12-31 01:02:14 +08:00
YuliangLiu0306 3b1b91eaf4
[autoparallel] record parameter attribute in colotracer (#2217)
* [autoparallel] record parameter attribute in collotracer

* [autoparallel] fix construct_meta_info bug
2022-12-28 19:29:08 +08:00
Boyuan Yao 24246f7aa5
[autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix

* [autoparallel] memory estimation for communication actions

* [autoparallel] fix docstring

* [autoparallel] fix variables name

* [autoparallel] attach tensor to metainfo class

* [autoparallel] fix dangerous try except

* [autoparallel] attach memory cost to shape consistency node

* [autoparallel] attach shape consistency node's metainfo to the node

* [autoparallel] remove todo in shape consistency memory estimation

* [autoparallel] fix the annotation
2022-12-28 13:37:40 +08:00
YuliangLiu0306 78509124d3
[autoparallel] update getitem handler (#2207) 2022-12-27 19:58:32 +08:00
YuliangLiu0306 4851f2d607
[autoparallel] update_getattr_handler (#2193) 2022-12-26 21:57:39 +08:00
YuliangLiu0306 f10ce01e31
[autoparallel] add gpt2 performance test code (#2194) 2022-12-26 21:56:58 +08:00
HELSON a3100bd50d
[testing] add beit model for unit testings (#2196)
* [testing] add beit model

* [beit] fix bugs

* [beit] fix bugs

* [testing] fix bugs
2022-12-26 17:35:36 +08:00
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
* [zero] fix error for BEiT models

* [ColoParameter] add unpack operation for tuple arguments

* fix bugs

* fix chunkv2 unit testing

* add assertion for gradient state
2022-12-26 15:03:54 +08:00
Jiarui Fang 355ffb386e
[builder] unified cpu_optim fused_optim inferface (#2190) 2022-12-23 20:57:41 +08:00
Jiarui Fang 9587b080ba
[builder] use runtime builder for fused_optim (#2189) 2022-12-23 17:07:03 +08:00
Jiarui Fang bc0e271e71
[buider] use builder() for cpu adam and fused optim in setup.py (#2187) 2022-12-23 16:05:13 +08:00
Jiarui Fang d42afd30f8
[builder] runtime adam and fused_optim builder (#2184) 2022-12-23 14:14:21 +08:00
YuliangLiu0306 550f8f8905
[autoparallel] integrate_gpt_related_tests (#2134)
* [autoparallel] integrate_gpt_related_tests

* polish code

* polish code

* add GPT2Model into runtime test
2022-12-23 12:36:59 +08:00
Jiarui Fang 27327a4c90
[example] add palm pytorch version (#2172) 2022-12-22 10:15:34 +08:00
Jiarui Fang b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 (#2157) 2022-12-20 23:03:18 +08:00
YuliangLiu0306 16335cb537
[hotfix] fix aten default bug (#2158) 2022-12-20 22:40:46 +08:00
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151) 2022-12-20 10:19:36 +08:00
アマデウス 077a66dd81
updated attention kernel (#2133) 2022-12-16 10:54:03 +08:00
YuliangLiu0306 536560ccc0
[autoparallel] implement softmax handler (#2132) 2022-12-14 16:09:53 +08:00
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129) 2022-12-14 00:47:06 +08:00
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128) 2022-12-13 17:11:31 +08:00
Jiarui Fang deee317b0f
[Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127) 2022-12-13 16:34:10 +08:00
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126) 2022-12-13 15:44:07 +08:00
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125) 2022-12-13 14:14:55 +08:00
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124) 2022-12-13 12:50:24 +08:00
YuliangLiu0306 cd0af9f7f6
[autoparallel] gpt2lp runtimee test (#2113) 2022-12-12 18:06:40 +08:00
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115) 2022-12-12 18:06:16 +08:00
HELSON e7d3afc9cc
[optimizer] add div_scale for optimizers (#2117)
* [optimizer] add div_scale for optimizers

* [zero] use div_scale in zero optimizer

* fix testing error
2022-12-12 17:58:57 +08:00
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119) 2022-12-12 16:57:22 +08:00
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116) 2022-12-12 15:39:31 +08:00
Ziyue Jiang 09d69e1c25
[PP Middleware] Add bwd and step for PP middleware (#2111)
* add bwd and step for PP middleware

* pre-commit

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-12 12:40:03 +08:00
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
* [zero] add L2 gradient clipping

* [testing] add MlpModel

* [zero] add unit test for grad clipping

* fix atol
2022-12-09 18:09:17 +08:00
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108) 2022-12-09 16:13:03 +08:00
YuliangLiu0306 d87baa85d9
[autoparallel] support linear function bias addition (#2104) 2022-12-09 10:31:36 +08:00
YuliangLiu0306 0fecbb9e20
[autoparallel] support addbmm computation (#2102) 2022-12-08 21:15:11 +08:00
YuliangLiu0306 d3d4630495
[autoparallel] add sum handler (#2101) 2022-12-08 17:02:54 +08:00
Ziyue Jiang e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087)
* add DAG test case

* fix datarace by adjusting theposition of lock

* polish code

* fix pytest for middleware

* remove test

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-08 13:32:27 +08:00
YuliangLiu0306 b175e6d58e
[autoparallel] add bias addtion function class (#2098)
* [autoparallel] add bias addtion function class

* polish code

* polish
2022-12-08 11:31:51 +08:00
YuliangLiu0306 3af7e65dea
[autoparallel] complete gpt related module search (#2097) 2022-12-08 10:04:09 +08:00
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099) 2022-12-07 23:04:02 +08:00
Jiarui Fang 978242326a
[Gemini] remove eval in gemini unittests! (#2092) 2022-12-07 11:58:37 +08:00
YuliangLiu0306 7f72eb0510
[autoparallel]add embedding handler (#2089)
* [autoparallel] add embedding handler

* fix bugs
2022-12-07 09:41:46 +08:00
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091) 2022-12-06 22:30:16 +08:00
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088) 2022-12-06 19:48:20 +08:00
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. (#2084) 2022-12-06 16:43:06 +08:00
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083) 2022-12-06 12:53:58 +08:00
YuliangLiu0306 0e9db368ef
[autoparallel] add tensor constructor handler (#2082) 2022-12-06 10:20:10 +08:00
YuliangLiu0306 cdf537a648
[autoparallel] add non_split linear strategy (#2078)
* [autoparallel] add non_split linear stategy

* polish
2022-12-06 10:19:33 +08:00
Boyuan Yao cf0268da93
[autoparallel] Add F.conv metainfo (#2069)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix
2022-12-06 10:17:57 +08:00
YuliangLiu0306 f123476666
[autoparallel] complete gpt block searching (#2065)
* [autoparallel] complete gpt block searching

* fix test
2022-12-06 10:17:10 +08:00
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066)
* adapt scheduler for Topo

* remoove comment

* fix set input

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-05 20:23:41 +08:00
Jiarui Fang 4f21c9e8d9
[Gemini] polish runtime tracer tests (#2077) 2022-12-05 16:22:49 +08:00
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076) 2022-12-05 15:00:03 +08:00
Jiarui Fang 40b7d55bf3
[Gemini] add albert in test models. (#2075) 2022-12-05 14:09:34 +08:00
Jiarui Fang 616ed91ecd
[test] bert test in non-distributed way (#2074) 2022-12-05 13:32:16 +08:00
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) 2022-12-05 12:45:11 +08:00
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072) 2022-12-05 11:57:59 +08:00
Boyuan Yao 616da17fab
[autoparallel] add binary elementwise metainfo for auto parallel (#2058)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print
2022-12-04 15:18:51 +08:00
Ziyue Jiang 44ea461890
[Pipeline] Add Topo Class (#2059)
* use Topo class to rewrite DAG

* polish code

* polish code

* polish code

* add comment

* add else to unended if

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-02 18:13:20 +08:00
YuliangLiu0306 e4293e5077
[hotfix] update test for latest version (#2060) 2022-12-02 18:12:30 +08:00
YuliangLiu0306 19438ea0ef
[hotfix] skip gpt tracing test (#2064) 2022-12-02 16:48:28 +08:00
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052) 2022-12-02 16:04:19 +08:00
YuliangLiu0306 1c1fe44305
[autoparallel] adapt solver with self attention (#2037)
* [autoparallel] adapt solver with self attention

* polish code
2022-12-01 17:53:15 +08:00
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
* [gemini] fix init bugs for modules

* fix bugs
2022-11-30 17:06:10 +08:00
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) 2022-11-30 15:57:45 +08:00
Jiarui Fang 1e885329f4
[test] align model name with the file name. (#2045) 2022-11-30 15:45:26 +08:00
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043) 2022-11-30 14:53:41 +08:00
HELSON 384cd26314
[zero] fix testing parameters (#2042) 2022-11-30 12:09:32 +08:00
HELSON 17a3c685b0
[zero] fix unit-tests (#2039) 2022-11-30 10:40:31 +08:00
Jiarui Fang eb7742a4bb
[Gemini] more tests for Gemini (#2038)
* [Gemini] more tests for Gemini

* polish code
2022-11-29 17:13:10 +08:00
HELSON 537e181705
[testing] fix testing models (#2036)
* [testing] fix testing models

* roll back
2022-11-29 13:42:06 +08:00
HELSON a1ce02d740
[zero] test gradient accumulation (#1964)
* [zero] fix memory leak for zero2

* [zero] test gradient accumulation

* [zero] remove grad clip test
2022-11-29 13:00:30 +08:00
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028)
* add DAG to split_module

* add comment

* add test case for DAG

* remove print

* add DAG middleware in scheduler

* add test case for scheduler

* remove break

* recover old lifecycle

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-29 11:36:28 +08:00
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035) 2022-11-29 11:19:52 +08:00
YuliangLiu0306 0dbcd4a6f5
[autoparallel] add split handler (#2032)
* [autoparallel] add split handler

* add numerical test and runtime passes
2022-11-29 11:03:51 +08:00
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034) 2022-11-29 09:26:06 +08:00
YuliangLiu0306 81330b0352
[autoparallel] add experimental permute handler (#2029) 2022-11-27 20:26:52 +08:00
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unitest (#2030) 2022-11-26 13:30:24 +08:00
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003) 2022-11-25 20:06:35 +08:00
Ziyue Jiang 632753abbc
[fx]Split partition with DAG information (#2025)
* add DAG to split_module

* add comment

* add test case for DAG

* remove print

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-25 17:42:48 +08:00
YuliangLiu0306 ea0f6b8df9
[autoparallel] add runtime pass and numerical test for view handler (#2018) 2022-11-25 15:50:16 +08:00
Jiarui Fang 2e9cbfca12
[Gemini] add unitests to check gemini correctness (#2015) 2022-11-24 16:51:45 +08:00
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016) 2022-11-24 15:28:58 +08:00
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009) 2022-11-24 14:40:33 +08:00
YuliangLiu0306 1438993113
[autoparallel] add experimental view handler (#2011)
* [autoparallel] add experimental view handler

* polish

* polish

* polish code

* rename variables
2022-11-24 11:34:41 +08:00
Genghan Zhang d655eea515
[autoparallel] mix gather (#1977)
* Add mix-gather

* Add comments

* Add comments

* Polish comments

* Change the global rank assumption

* Add tests

* Add two-step tests

* Fix 10 and 01

* Skip test becasue the number of GPUs
2022-11-23 21:49:17 +08:00
Jiarui Fang 3d907faede
[Gemini] add an inline_op_module to common test models and polish unitests. (#2004) 2022-11-23 16:55:54 +08:00
Boyuan Yao 6cd784ffee
[autoparallel] Add metainfo support for F.linear (#1987)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator
2022-11-23 14:12:34 +08:00
YuliangLiu0306 35e6b9ec82
[autoparallel] adapt handlers with attention block (#1990)
* [autoparallel] adapt handlers with attention block

* polish
2022-11-21 10:44:11 +08:00
Jiarui Fang 5bec3b2168
[Gemini] open grad checkpoint when model building (#1984) 2022-11-18 16:32:54 +08:00
Boyuan Yao c26f21d365
[autoparallel] add pooling metainfo (#1968)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo
2022-11-18 15:13:03 +08:00
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unintests (#1982) 2022-11-18 14:58:28 +08:00
Jiarui Fang e481489aa6
[Gemini] MemtracerWrapper unittests (#1981) 2022-11-18 14:19:40 +08:00
YuliangLiu0306 0da1d00399
[autoparallel] support distributed dataloader option (#1906)
* [autoparallel] support distributed dataloader option

* update output handler to support ddp dataloader

* poish code
2022-11-17 20:11:53 +08:00
Genghan Zhang 6630d45546
[autoparallel] Add alpha beta (#1973)
* Add alpha beta

* Fix test

* Fix test
2022-11-17 16:01:14 +08:00
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971) 2022-11-17 13:42:33 +08:00
Boyuan Yao 7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo (#1868)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input
2022-11-16 23:12:31 +08:00
YuliangLiu0306 fea3cb661c
[autoparallel] support addmm in tracer and solver (#1961)
* [fx] patch addmm

* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2022-11-16 14:44:28 +08:00
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955) 2022-11-16 11:43:24 +08:00
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) 2022-11-15 16:24:16 +08:00
zbian 6877121377 updated flash attention api 2022-11-15 15:25:39 +08:00
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) 2022-11-14 16:05:09 +08:00
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer

* rename test ditectory

* rename test files

* change tolerance in test
2022-11-11 09:26:40 +08:00
Jiarui Fang 51597f6a28
[hotfix] pass test_complete_workflow (#1877) 2022-11-10 17:53:39 +08:00
Jiarui Fang 986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) 2022-11-10 17:36:42 +08:00
YuliangLiu0306 1b494ad73c
[autoparallel] fix linear logical convert issue (#1857) 2022-11-10 17:19:22 +08:00
Jiarui Fang c2947dadf1
[inference] streaming Linear 1D Row inference (#1874) 2022-11-10 17:03:21 +08:00
xcnick a141681260
[amp] add torch amp test (#1860) 2022-11-10 16:40:26 +08:00
Frank Lee e6ec99d389
[utils] fixed lazy init context (#1867) 2022-11-10 15:17:20 +08:00
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844) 2022-11-09 11:50:33 +08:00
YuliangLiu0306 f6032ddb17
[autoparallel] fix bias addition module (#1800) 2022-11-08 16:21:25 +08:00
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689) 2022-11-08 15:15:13 +08:00
Boyuan Yao 629172b319
[autoparallel] add batch norm metainfo (#1815)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel 441d584e4a
[fx] add a symbolic_trace api. (#1812)
* [fx] add a symbolic_trace api.

* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
Jiarui Fang 6fa71d65d3
[fx] skip diffusers unitest if it is not installed (#1799) 2022-11-08 11:45:23 +08:00
oahzxl 9639ea88fc
[kernel] more flexible flashatt interface (#1804) 2022-11-07 17:02:09 +08:00
Boyuan Yao 327d07c44a
[autoparallel] add conv metainfo class for auto parallel (#1796)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl 501a9e9cd2
[hotfix] polish flash attention (#1802) 2022-11-07 14:30:22 +08:00
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798) 2022-11-07 13:41:13 +08:00
YuliangLiu0306 e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands (#1793) 2022-11-04 18:36:42 +08:00
Boyuan Yao 05ce3d369f
[fx] Add linear metainfo class for auto parallel (#1783)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
YuliangLiu0306 2c4c7b3618
[autoparallel] add getattr handler (#1767)
* [autoparallel] add getattr haandler

* polish code

* add extra processes for Parameters

* add unit test for param resharding cost

* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
Jiarui Fang 32c1b843a9
skip torchrec unittests if not installed (#1790) 2022-11-02 14:44:32 +08:00
kurisusnowdeng 0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
YuliangLiu0306 e859380bf7
[fx] support module with bias addition (#1780)
* [autoparallel] refactor tracer to fix bias addition issue

* [fx] support module with bias addition

* create bias_addition_module

* refactor file structure

* polish code

* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee f3f19a5c47
[autoparallel] added matmul handler (#1763)
* [autoparallel] added matmul handler

* polish code
2022-11-01 15:14:53 +08:00
YuliangLiu0306 27de252334
[autoparallel] fix conv handler numerical test (#1771) 2022-11-01 10:43:44 +08:00
Super Daniel 1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
* [autoparallel] first move.

* [autoparallel] add solver rotor.

* [autoparallel] add ckpt solvers.

* [autoparallel] modify codegen.

* [fx] fix annotation in test.

* [fx] remove check.

* [autoparallel] polish docstring.

* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
YuliangLiu0306 a4d1f59c78
[autoparallel] add numerical test for handlers (#1769) 2022-10-28 10:59:59 +08:00
YuliangLiu0306 b0f7c8bde8
[autoparallel] update CommSpec to CommActions (#1768)
* [autoparallel] update CommSpec to CommActions

* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306 b4cc59b61e
[autoparallel] add numerical test for node strategies (#1760)
* [autoparallel] add numerical test for node strategies

* polish code

* polish code
2022-10-27 10:42:54 +08:00
oahzxl 25952b67d7
[feat] add flash attention (#1762) 2022-10-26 16:15:52 +08:00
Super Daniel 0584654c79
[fx] refactor memory utils and extend shard utils. (#1754)
* [fx] change memory.py to memory_utils.py.

* [fx] add shard utils.

* [fx] fix import.

* [fx] check code style.

* [fx] add comment.

* [autoparallel] first move.

* [fx] add time computations.
2022-10-26 14:24:41 +08:00
YuliangLiu0306 314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757)
* [autoparallel] refactor the runtime apply pass and add doc string to passes

* fix unit test

* polish
2022-10-25 14:32:22 +08:00
Frank Lee f9a613d660
[autoparallel] added binary elementwise node handler (#1758)
* [autoparallel] added binary elementwise node handler

* polish code
2022-10-25 14:32:01 +08:00
YuliangLiu0306 d2fc067231
[autoparallel] fix param hook issue in transform pass (#1755) 2022-10-24 13:13:38 +08:00
Frank Lee 262652c8bc
[autoparallel] added addbmm handler (#1751) 2022-10-21 18:55:48 +08:00
YuliangLiu0306 980ed21723
[autoparallel] shard param and buffer as expected (#1753)
* [autoparallel] shard param and buffer as expected

* fix unit test issue
2022-10-21 15:45:13 +08:00
YuliangLiu0306 cdb7d5e7d2
[hotfix] autoparallel unit test (#1752) 2022-10-20 19:51:38 +08:00
YuliangLiu0306 a4ce180e85
[autoparallel] add sequential order to communication actions (#1735) 2022-10-20 18:48:18 +08:00
Super Daniel b893342f95
[fx] test tracer on diffuser modules. (#1750)
* [fx] test tracer on diffuser modules.

* [fx] shorter seq_len.

* Update requirements-test.txt
2022-10-20 18:25:05 +08:00
Frank Lee b80b6eaa88
[autoparallel] recovered skipped test cases (#1748) 2022-10-20 16:37:33 +08:00
Frank Lee 474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler (#1747)
* [autoparallel] fixed wrong sharding strategy in conv handler

* polish code
2022-10-20 16:12:39 +08:00
Frank Lee 8b8937d901
[autoparallel] fixed wrong generated strategy for dot op (#1746)
* [autoparallel] fixed wrong generated strategy for dot op

* polish code
2022-10-20 15:18:16 +08:00
Frank Lee 88a79814fb
[autoparallel] handled illegal strategy in node handler (#1743)
* [autoparallel] handled illegal strategy in node handler

* polish code
2022-10-19 17:08:52 +08:00
Super Daniel 30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730)
* [fx/profiler] add test.

* [fx] fix file names.

* [fx] add docstring and comment.

* [fx] polish profiler.py.

* [fx] fix import errors.

* [fx] fix profiler.

* [fx] fix names.
2022-10-19 14:24:51 +08:00
Frank Lee eee84908d4
[autoparallel] handled illegal sharding strategy (#1728)
* [autoparallel] handled illegal sharding strategy

* polish code
2022-10-19 12:53:06 +08:00
Ziheng Qin cbe9a4cb45 [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) 2022-10-19 12:20:51 +08:00
lucasliunju 912eb58ea0 [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) 2022-10-19 12:20:51 +08:00
Xue Fuzhao 754aa7c81f [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) 2022-10-19 12:20:51 +08:00
xyupeng ff373a11eb [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) 2022-10-19 12:20:51 +08:00
Kai Wang (Victor Kai) b38efe4e8a [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) 2022-10-19 12:20:51 +08:00
binmakeswell f6389d0813 [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) 2022-10-19 12:20:51 +08:00
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
* add chunk manager init function

* fix unit tests

* add comment

* add flush=True
2022-10-18 16:31:22 +08:00
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
* [fx] move meta registration

* [fx] fix tests.

* [fx] fix test.

* [fx] fix.

* [meta] refactor meta registration.py.

* [fx] add compatibility descriptions.

* [fx] polish import.

* [fx] add a decorator.

* [fx] fix tests.

* [fx] remove print.

* [fx] edit raise error.

* [fx] edit raise error.

* [fx] add type hint.

* [fx] fix import in experimental.

* [rpc] remove color debug.

* [meta] fix naming.
2022-10-18 10:44:23 +08:00
Frank Lee e8d8eda5e7
[autoparallel] moved tests to test_tensor_shard (#1713) 2022-10-17 13:54:20 +08:00
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
* [autoparallel] resnet block runtime apply

* seperate buffer and parameter in MemoryCost

* polish code

* add comments and todos

* fix test issue
2022-10-17 13:37:38 +08:00
Frank Lee 22a115406b
[autoparallel] fixed broken node handler tests (#1708) 2022-10-14 18:25:59 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes memory leak when paramter is in fp16 in ZeroDDP init.
* bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
Frank Lee 6c331a5a09
[autoparallel] refactored the autoparallel module for organization (#1706)
* [autoparallel] refactored the autoparallel module for organization

* polish code
2022-10-14 13:27:00 +08:00
Frank Lee 91cd34e6e0
[unittest] added doc for the pytest wrapper (#1704) 2022-10-14 10:56:17 +08:00
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
* [autoparallel] adapt runtime passes v2

* polish code
2022-10-14 10:14:07 +08:00
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Frank Lee 0e52f3d3d5
[unittest] supported condititonal testing based on env var (#1701)
polish code
2022-10-13 19:38:45 +08:00
Frank Lee 8283e95db3
[autoparallel] collated all deprecated files (#1700)
* [autoparallel] collated all deprecated files

* polish code
2022-10-13 18:24:11 +08:00
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
* [autoparallel] adapt solver and CostGraph with new handler

* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
* [autoparallel] add output handler and placeholder handler

* Delete test_solver_with_resnet.py

* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
* [autoparallel] add pooling handler

* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
* where generator

* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691) 2022-10-12 15:21:58 +08:00
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687) 2022-10-12 11:16:18 +08:00
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) 2022-10-11 18:12:59 +08:00
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
* [fx/profiler] modify data_ptr into uuid for all tensors.

* [fx] modify uuid.

* [fx/profiler] tune performance on GPT-2.

* [fx] updates.

* [fx] debug.

* [fx] debug.

* [fx] cuda.
2022-10-11 11:03:35 +08:00
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674) 2022-10-09 17:30:42 +08:00
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
* [autoparallel] add following node generator

* polish code

* polish code

* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
* [autoparallel] add layer norm handler v2

* polish code

* polish code
2022-10-09 14:23:22 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
* [autoparallel] add rotor c version

* [fx] remove metainfoprop in rotor solver

* [autoparallel] modify C
 code format

* [autoparallel] remove build.py

* [autoparallel] fix C extension build

* [autoparallel] add C solver consistency test

* [autoparallel] remove some unused imports

* [autoparallel] refactor rotor solver code

* [autoparallel] replace print with colossalai logger

* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306 11ec070e53
[hotfix]unit test (#1670) 2022-09-29 12:49:28 +08:00
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
* [autoparallel] added utils for broadcast operation

* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667) 2022-09-29 11:20:59 +08:00
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666) 2022-09-29 11:02:49 +08:00
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide  to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655) 2022-09-28 11:32:16 +08:00
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663) 2022-09-28 11:24:59 +08:00
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653) 2022-09-28 11:17:26 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
YuliangLiu0306 9eae855408
[hotfix] add recompile after graph manipulatation (#1621) 2022-09-23 11:00:33 +08:00
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
* [fx] tuned the meta info and rotor solver.

* [fx] remove import.

* [fx] remove import.

* [fx] remove import.

* [fx] tune the meta calculations.

* [fx] polish comments.

* [fx] remove assertions.

* [fx] modify test cases.

* [fx] modify test cases.

* [fx] optimize import.

* [fx
2022-09-23 10:59:47 +08:00
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616) 2022-09-21 12:23:21 +08:00
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
* [fx] PoC of runtime shape consistency application

* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605) 2022-09-20 11:26:21 +08:00
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
* [fx] add pofo algorithm

* [fx] Add pofo solver

* [fx] code refactor

* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
* [autoparallel] add bcast op handler

* polish code

* add more BCAST FUNC OP

* polish code

* add exception handler

* polish
2022-09-16 11:33:01 +08:00
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
* [fx] add input activation offload to codegen

* [fx] modify unit test

* [fx] remove two skips in torch11

* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.

* [fx] fix variable names in ckpt_solvers and unskip tests.

* [fx] commit my changes.

* [fx] restore skips.

* [fx] restore skips.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306 faa23b9d9a
[autoparallel] add reshape handler (#1594)
* [autoparallel] add reshape handler

* polish code
2022-09-14 10:25:45 +08:00
Frank Lee 27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy (#1591)
* [autoparallel] refactored shape consistency to remove redundancy

* polish code

* polish code

* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306 d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) 2022-09-13 18:05:05 +08:00
Frank Lee 219f66c571
[autoparallel] added solver option dataclass (#1588) 2022-09-13 14:47:09 +08:00
YuliangLiu0306 82d4376c23
[autoparallel] adapt solver with resnet (#1583)
* [autoparallel]adapt solver with resnet

* polish code

* polish code
2022-09-13 12:07:09 +08:00
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
アマデウス e615cfc3a8
[NFC] polish test component gpt code style (#1567) 2022-09-08 16:34:09 +08:00
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544) 2022-09-07 19:01:06 +08:00
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
* [fx] compute memory stat and flop count for MetaInfoProp.

* [fx] modify node attribute.

* [fx] modify ckpt_chen.

* [fx] fix compatibility.

* [fx] fix import error.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip if torch 1.11.0.

* [fx] recover MetaInfoProp support for PyTorch 1.11.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306 0908d0fc61
[autoparallel]add backward cost info into strategies (#1524) 2022-09-07 11:19:00 +08:00
YuliangLiu0306 44c866a3e3
[autoparallel] change the merge node logic (#1533) 2022-09-07 11:18:19 +08:00
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Boyuan Yao 56159049e8
[fx] Modify solver linearize and add corresponding test (#1531)
* [fx] modify solver linearize and add test

* [fx] add torch11 test of linearize but skip it

* [fx] remove some unused imports
2022-09-02 10:24:41 +08:00
Super Daniel 7dc53237c3
[fx] add test for meta tensor. (#1527)
* [fx] add test for meta tensor.

* [fx] add test for meta tensor.

* [fx] add test for meta tensor.

* [fx] add test for meta tensor.

* [fx] fix error.
2022-09-01 19:30:05 +08:00
YuliangLiu0306 4b3d6caeb3
[fx]patch nn.functional convolution (#1528) 2022-09-01 19:05:07 +08:00
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526) 2022-09-01 17:55:41 +08:00
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to recontruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] skip process group test

* [pipeline/pipleline_process_group] remove test named function
2022-09-01 17:45:47 +08:00
Boyuan Yao b231430bcb
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521)
* [fx] fix wrong variable name in solver rotor

* [fx] fix wrong variable name in solver rotor

* [fx] fix the discretize bug

* [fx] fix the first op in activation checkpoint codegen

* [fx] fix some bugs of ckpt solver

* [fx] modify test_ckpt_torchvision

* [fx] set sequence to __sequence__ attr of GraphModule

* [fx] docstring modification

* [fx] remove performance test
2022-08-31 18:10:48 +08:00
YuliangLiu0306 3345c6d352
[autoparellel]add strategies constructor (#1505)
* [autoparellel]add strategies constructor

* remove duplicated strategies

* polish code

* adapt cost graph with StrategiesConstructor

* polish
2022-08-30 16:32:09 +08:00