Commit Graph

770 Commits (b29e1f07224298aea35aab7ee83284beac28e0d8)

Author SHA1 Message Date
アマデウス 077a66dd81
updated attention kernel (#2133) 2022-12-16 10:54:03 +08:00
YuliangLiu0306 536560ccc0
[autoparallel] implement softmax handler (#2132) 2022-12-14 16:09:53 +08:00
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129) 2022-12-14 00:47:06 +08:00
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128) 2022-12-13 17:11:31 +08:00
Jiarui Fang deee317b0f
[Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127) 2022-12-13 16:34:10 +08:00
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126) 2022-12-13 15:44:07 +08:00
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125) 2022-12-13 14:14:55 +08:00
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124) 2022-12-13 12:50:24 +08:00
YuliangLiu0306 cd0af9f7f6
[autoparallel] gpt2lp runtime test (#2113) 2022-12-12 18:06:40 +08:00
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115) 2022-12-12 18:06:16 +08:00
HELSON e7d3afc9cc
[optimizer] add div_scale for optimizers (#2117)
* [optimizer] add div_scale for optimizers

* [zero] use div_scale in zero optimizer

* fix testing error
2022-12-12 17:58:57 +08:00
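
Note: a minimal, hypothetical sketch of what a `div_scale` argument is presumably for (an assumption, not taken from the commit): when the backward pass runs under loss scaling, gradients must be divided by the scale before the update. A fused optimizer can fold that division into its kernel; the plain-PyTorch equivalent below does it manually.

```python
import torch

model = torch.nn.Linear(16, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_scale = 2.0 ** 16  # hypothetical static loss scale

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
(loss * loss_scale).backward()        # gradients are now scaled by loss_scale

with torch.no_grad():
    for p in model.parameters():      # manual equivalent of passing div_scale
        if p.grad is not None:
            p.grad.div_(loss_scale)
opt.step()
opt.zero_grad()
```
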
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119) 2022-12-12 16:57:22 +08:00
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116) 2022-12-12 15:39:31 +08:00
Ziyue Jiang 09d69e1c25
[PP Middleware] Add bwd and step for PP middleware (#2111)
* add bwd and step for PP middleware

* pre-commit

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-12 12:40:03 +08:00
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
* [zero] add L2 gradient clipping

* [testing] add MlpModel

* [zero] add unit test for grad clipping

* fix atol
2022-12-09 18:09:17 +08:00
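
Note: a minimal sketch of L2 (norm-based) gradient clipping, the behaviour #2112 adds for ZeRO, shown here with the stock PyTorch utility rather than the sharded implementation from the commit.

```python
import torch

model = torch.nn.Linear(32, 32)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 32)).sum()
loss.backward()
# Rescale all gradients so their global L2 norm does not exceed max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```
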
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108) 2022-12-09 16:13:03 +08:00
YuliangLiu0306 d87baa85d9
[autoparallel] support linear function bias addition (#2104) 2022-12-09 10:31:36 +08:00
YuliangLiu0306 0fecbb9e20
[autoparallel] support addbmm computation (#2102) 2022-12-08 21:15:11 +08:00
YuliangLiu0306 d3d4630495
[autoparallel] add sum handler (#2101) 2022-12-08 17:02:54 +08:00
Ziyue Jiang e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087)
* add DAG test case

* fix data race by adjusting the position of lock

* polish code

* fix pytest for middleware

* remove test

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-08 13:32:27 +08:00
YuliangLiu0306 b175e6d58e
[autoparallel] add bias addition function class (#2098)
* [autoparallel] add bias addition function class

* polish code

* polish
2022-12-08 11:31:51 +08:00
YuliangLiu0306 3af7e65dea
[autoparallel] complete gpt related module search (#2097) 2022-12-08 10:04:09 +08:00
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099) 2022-12-07 23:04:02 +08:00
Jiarui Fang 978242326a
[Gemini] remove eval in gemini unittests! (#2092) 2022-12-07 11:58:37 +08:00
YuliangLiu0306 7f72eb0510
[autoparallel]add embedding handler (#2089)
* [autoparallel] add embedding handler

* fix bugs
2022-12-07 09:41:46 +08:00
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091) 2022-12-06 22:30:16 +08:00
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088) 2022-12-06 19:48:20 +08:00
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Separate it from Collector. (#2084) 2022-12-06 16:43:06 +08:00
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083) 2022-12-06 12:53:58 +08:00
YuliangLiu0306 0e9db368ef
[autoparallel] add tensor constructor handler (#2082) 2022-12-06 10:20:10 +08:00
YuliangLiu0306 cdf537a648
[autoparallel] add non_split linear strategy (#2078)
* [autoparallel] add non_split linear strategy

* polish
2022-12-06 10:19:33 +08:00
Boyuan Yao cf0268da93
[autoparallel] Add F.conv metainfo (#2069)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix
2022-12-06 10:17:57 +08:00
YuliangLiu0306 f123476666
[autoparallel] complete gpt block searching (#2065)
* [autoparallel] complete gpt block searching

* fix test
2022-12-06 10:17:10 +08:00
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066)
* adapt scheduler for Topo

* remove comment

* fix set input

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-05 20:23:41 +08:00
Jiarui Fang 4f21c9e8d9
[Gemini] polish runtime tracer tests (#2077) 2022-12-05 16:22:49 +08:00
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076) 2022-12-05 15:00:03 +08:00
Jiarui Fang 40b7d55bf3
[Gemini] add albert in test models. (#2075) 2022-12-05 14:09:34 +08:00
Jiarui Fang 616ed91ecd
[test] bert test in non-distributed way (#2074) 2022-12-05 13:32:16 +08:00
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) 2022-12-05 12:45:11 +08:00
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072) 2022-12-05 11:57:59 +08:00
Boyuan Yao 616da17fab
[autoparallel] add binary elementwise metainfo for auto parallel (#2058)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print
2022-12-04 15:18:51 +08:00
Ziyue Jiang 44ea461890
[Pipeline] Add Topo Class (#2059)
* use Topo class to rewrite DAG

* polish code

* polish code

* polish code

* add comment

* add else to unended if

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-02 18:13:20 +08:00
YuliangLiu0306 e4293e5077
[hotfix] update test for latest version (#2060) 2022-12-02 18:12:30 +08:00
YuliangLiu0306 19438ea0ef
[hotfix] skip gpt tracing test (#2064) 2022-12-02 16:48:28 +08:00
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052) 2022-12-02 16:04:19 +08:00
YuliangLiu0306 1c1fe44305
[autoparallel] adapt solver with self attention (#2037)
* [autoparallel] adapt solver with self attention

* polish code
2022-12-01 17:53:15 +08:00
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
* [gemini] fix init bugs for modules

* fix bugs
2022-11-30 17:06:10 +08:00
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) 2022-11-30 15:57:45 +08:00
Jiarui Fang 1e885329f4
[test] align model name with the file name. (#2045) 2022-11-30 15:45:26 +08:00
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043) 2022-11-30 14:53:41 +08:00
HELSON 384cd26314
[zero] fix testing parameters (#2042) 2022-11-30 12:09:32 +08:00
HELSON 17a3c685b0
[zero] fix unit-tests (#2039) 2022-11-30 10:40:31 +08:00
Jiarui Fang eb7742a4bb
[Gemini] more tests for Gemini (#2038)
* [Gemini] more tests for Gemini

* polish code
2022-11-29 17:13:10 +08:00
HELSON 537e181705
[testing] fix testing models (#2036)
* [testing] fix testing models

* roll back
2022-11-29 13:42:06 +08:00
HELSON a1ce02d740
[zero] test gradient accumulation (#1964)
* [zero] fix memory leak for zero2

* [zero] test gradient accumulation

* [zero] remove grad clip test
2022-11-29 13:00:30 +08:00
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028)
* add DAG to split_module

* add comment

* add test case for DAG

* remove print

* add DAG middleware in scheduler

* add test case for scheduler

* remove break

* recover old lifecycle

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-29 11:36:28 +08:00
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035) 2022-11-29 11:19:52 +08:00
YuliangLiu0306 0dbcd4a6f5
[autoparallel] add split handler (#2032)
* [autoparallel] add split handler

* add numerical test and runtime passes
2022-11-29 11:03:51 +08:00
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034) 2022-11-29 09:26:06 +08:00
YuliangLiu0306 81330b0352
[autoparallel] add experimental permute handler (#2029) 2022-11-27 20:26:52 +08:00
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unittest (#2030) 2022-11-26 13:30:24 +08:00
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor (#2003) 2022-11-25 20:06:35 +08:00
Ziyue Jiang 632753abbc
[fx]Split partition with DAG information (#2025)
* add DAG to split_module

* add comment

* add test case for DAG

* remove print

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-25 17:42:48 +08:00
YuliangLiu0306 ea0f6b8df9
[autoparallel] add runtime pass and numerical test for view handler (#2018) 2022-11-25 15:50:16 +08:00
Jiarui Fang 2e9cbfca12
[Gemini] add unitests to check gemini correctness (#2015) 2022-11-24 16:51:45 +08:00
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016) 2022-11-24 15:28:58 +08:00
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009) 2022-11-24 14:40:33 +08:00
YuliangLiu0306 1438993113
[autoparallel] add experimental view handler (#2011)
* [autoparallel] add experimental view handler

* polish

* polish

* polish code

* rename variables
2022-11-24 11:34:41 +08:00
Genghan Zhang d655eea515
[autoparallel] mix gather (#1977)
* Add mix-gather

* Add comments

* Add comments

* Polish comments

* Change the global rank assumption

* Add tests

* Add two-step tests

* Fix 10 and 01

* Skip test because of the number of GPUs
2022-11-23 21:49:17 +08:00
Jiarui Fang 3d907faede
[Gemini] add an inline_op_module to common test models and polish unitests. (#2004) 2022-11-23 16:55:54 +08:00
Boyuan Yao 6cd784ffee
[autoparallel] Add metainfo support for F.linear (#1987)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator
2022-11-23 14:12:34 +08:00
YuliangLiu0306 35e6b9ec82
[autoparallel] adapt handlers with attention block (#1990)
* [autoparallel] adapt handlers with attention block

* polish
2022-11-21 10:44:11 +08:00
Jiarui Fang 5bec3b2168
[Gemini] open grad checkpoint when model building (#1984) 2022-11-18 16:32:54 +08:00
Boyuan Yao c26f21d365
[autoparallel] add pooling metainfo (#1968)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo
2022-11-18 15:13:03 +08:00
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unittests (#1982) 2022-11-18 14:58:28 +08:00
Jiarui Fang e481489aa6
[Gemini] MemtracerWrapper unittests (#1981) 2022-11-18 14:19:40 +08:00
YuliangLiu0306 0da1d00399
[autoparallel] support distributed dataloader option (#1906)
* [autoparallel] support distributed dataloader option

* update output handler to support ddp dataloader

* polish code
2022-11-17 20:11:53 +08:00
Genghan Zhang 6630d45546
[autoparallel] Add alpha beta (#1973)
* Add alpha beta

* Fix test

* Fix test
2022-11-17 16:01:14 +08:00
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971) 2022-11-17 13:42:33 +08:00
Boyuan Yao 7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo (#1868)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input
2022-11-16 23:12:31 +08:00
YuliangLiu0306 fea3cb661c
[autoparallel] support addmm in tracer and solver (#1961)
* [fx] patch addmm

* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2022-11-16 14:44:28 +08:00
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955) 2022-11-16 11:43:24 +08:00
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) 2022-11-15 16:24:16 +08:00
zbian 6877121377 updated flash attention api 2022-11-15 15:25:39 +08:00
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) 2022-11-14 16:05:09 +08:00
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer

* rename test directory

* rename test files

* change tolerance in test
2022-11-11 09:26:40 +08:00
Jiarui Fang 51597f6a28
[hotfix] pass test_complete_workflow (#1877) 2022-11-10 17:53:39 +08:00
Jiarui Fang 986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) 2022-11-10 17:36:42 +08:00
YuliangLiu0306 1b494ad73c
[autoparallel] fix linear logical convert issue (#1857) 2022-11-10 17:19:22 +08:00
Jiarui Fang c2947dadf1
[inference] streaming Linear 1D Row inference (#1874) 2022-11-10 17:03:21 +08:00
xcnick a141681260
[amp] add torch amp test (#1860) 2022-11-10 16:40:26 +08:00
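
Note: a minimal sketch of the standard torch AMP loop that #1860 adds a test for (autocast forward, GradScaler-managed backward and step); this is the stock PyTorch pattern, not the test code itself.

```python
import torch

model = torch.nn.Linear(16, 16).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 16, device="cuda")
with torch.cuda.amp.autocast():       # run the forward pass in mixed precision
    loss = model(x).sum()
scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
scaler.step(opt)                      # unscale gradients, skip step on inf/nan
scaler.update()
```
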
Frank Lee e6ec99d389
[utils] fixed lazy init context (#1867) 2022-11-10 15:17:20 +08:00
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844) 2022-11-09 11:50:33 +08:00
YuliangLiu0306 f6032ddb17
[autoparallel] fix bias addition module (#1800) 2022-11-08 16:21:25 +08:00
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689) 2022-11-08 15:15:13 +08:00
Boyuan Yao 629172b319
[autoparallel] add batch norm metainfo (#1815)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel 441d584e4a
[fx] add a symbolic_trace api. (#1812)
* [fx] add a symbolic_trace api.

* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
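
Note: the colossalai-side `symbolic_trace` added in #1812 presumably mirrors `torch.fx.symbolic_trace` (an assumption, not verified against the commit); the stock torch.fx call it parallels looks like this.

```python
import torch
from torch.fx import symbolic_trace

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
gm = symbolic_trace(model)   # capture the module as a GraphModule
print(gm.graph)              # inspect the traced ops
```
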
Jiarui Fang 6fa71d65d3
[fx] skip diffusers unitest if it is not installed (#1799) 2022-11-08 11:45:23 +08:00
oahzxl 9639ea88fc
[kernel] more flexible flashatt interface (#1804) 2022-11-07 17:02:09 +08:00
Boyuan Yao 327d07c44a
[autoparallel] add conv metainfo class for auto parallel (#1796)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl 501a9e9cd2
[hotfix] polish flash attention (#1802) 2022-11-07 14:30:22 +08:00
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798) 2022-11-07 13:41:13 +08:00
YuliangLiu0306 e34e850a4c
[autoparallel] add essential CommActions for broadcast operands (#1793) 2022-11-04 18:36:42 +08:00
Boyuan Yao 05ce3d369f
[fx] Add linear metainfo class for auto parallel (#1783)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
YuliangLiu0306 2c4c7b3618
[autoparallel] add getattr handler (#1767)
* [autoparallel] add getattr handler

* polish code

* add extra processes for Parameters

* add unit test for param resharding cost

* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
Jiarui Fang 32c1b843a9
skip torchrec unittests if not installed (#1790) 2022-11-02 14:44:32 +08:00
kurisusnowdeng 0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
YuliangLiu0306 e859380bf7
[fx] support module with bias addition (#1780)
* [autoparallel] refactor tracer to fix bias addition issue

* [fx] support module with bias addition

* create bias_addition_module

* refactor file structure

* polish code

* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee f3f19a5c47
[autoparallel] added matmul handler (#1763)
* [autoparallel] added matmul handler

* polish code
2022-11-01 15:14:53 +08:00
YuliangLiu0306 27de252334
[autoparallel] fix conv handler numerical test (#1771) 2022-11-01 10:43:44 +08:00
Super Daniel 1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
* [autoparallel] first move.

* [autoparallel] add solver rotor.

* [autoparallel] add ckpt solvers.

* [autoparallel] modify codegen.

* [fx] fix annotation in test.

* [fx] remove check.

* [autoparallel] polish docstring.

* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
YuliangLiu0306 a4d1f59c78
[autoparallel] add numerical test for handlers (#1769) 2022-10-28 10:59:59 +08:00
YuliangLiu0306 b0f7c8bde8
[autoparallel] update CommSpec to CommActions (#1768)
* [autoparallel] update CommSpec to CommActions

* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306 b4cc59b61e
[autoparallel] add numerical test for node strategies (#1760)
* [autoparallel] add numerical test for node strategies

* polish code

* polish code
2022-10-27 10:42:54 +08:00
oahzxl 25952b67d7
[feat] add flash attention (#1762) 2022-10-26 16:15:52 +08:00
Super Daniel 0584654c79
[fx] refactor memory utils and extend shard utils. (#1754)
* [fx] change memory.py to memory_utils.py.

* [fx] add shard utils.

* [fx] fix import.

* [fx] check code style.

* [fx] add comment.

* [autoparallel] first move.

* [fx] add time computations.
2022-10-26 14:24:41 +08:00
YuliangLiu0306 314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757)
* [autoparallel] refactor the runtime apply pass and add doc string to passes

* fix unit test

* polish
2022-10-25 14:32:22 +08:00
Frank Lee f9a613d660
[autoparallel] added binary elementwise node handler (#1758)
* [autoparallel] added binary elementwise node handler

* polish code
2022-10-25 14:32:01 +08:00
YuliangLiu0306 d2fc067231
[autoparallel] fix param hook issue in transform pass (#1755) 2022-10-24 13:13:38 +08:00
Frank Lee 262652c8bc
[autoparallel] added addbmm handler (#1751) 2022-10-21 18:55:48 +08:00
YuliangLiu0306 980ed21723
[autoparallel] shard param and buffer as expected (#1753)
* [autoparallel] shard param and buffer as expected

* fix unit test issue
2022-10-21 15:45:13 +08:00
YuliangLiu0306 cdb7d5e7d2
[hotfix] autoparallel unit test (#1752) 2022-10-20 19:51:38 +08:00
YuliangLiu0306 a4ce180e85
[autoparallel] add sequential order to communication actions (#1735) 2022-10-20 18:48:18 +08:00
Super Daniel b893342f95
[fx] test tracer on diffuser modules. (#1750)
* [fx] test tracer on diffuser modules.

* [fx] shorter seq_len.

* Update requirements-test.txt
2022-10-20 18:25:05 +08:00
Frank Lee b80b6eaa88
[autoparallel] recovered skipped test cases (#1748) 2022-10-20 16:37:33 +08:00
Frank Lee 474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler (#1747)
* [autoparallel] fixed wrong sharding strategy in conv handler

* polish code
2022-10-20 16:12:39 +08:00
Frank Lee 8b8937d901
[autoparallel] fixed wrong generated strategy for dot op (#1746)
* [autoparallel] fixed wrong generated strategy for dot op

* polish code
2022-10-20 15:18:16 +08:00
Frank Lee 88a79814fb
[autoparallel] handled illegal strategy in node handler (#1743)
* [autoparallel] handled illegal strategy in node handler

* polish code
2022-10-19 17:08:52 +08:00
Super Daniel 30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730)
* [fx/profiler] add test.

* [fx] fix file names.

* [fx] add docstring and comment.

* [fx] polish profiler.py.

* [fx] fix import errors.

* [fx] fix profiler.

* [fx] fix names.
2022-10-19 14:24:51 +08:00
Frank Lee eee84908d4
[autoparallel] handled illegal sharding strategy (#1728)
* [autoparallel] handled illegal sharding strategy

* polish code
2022-10-19 12:53:06 +08:00
Ziheng Qin cbe9a4cb45 [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) 2022-10-19 12:20:51 +08:00
lucasliunju 912eb58ea0 [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) 2022-10-19 12:20:51 +08:00
Xue Fuzhao 754aa7c81f [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) 2022-10-19 12:20:51 +08:00
xyupeng ff373a11eb [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) 2022-10-19 12:20:51 +08:00
Kai Wang (Victor Kai) b38efe4e8a [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) 2022-10-19 12:20:51 +08:00
binmakeswell f6389d0813 [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) 2022-10-19 12:20:51 +08:00
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
* add chunk manager init function

* fix unit tests

* add comment

* add flush=True
2022-10-18 16:31:22 +08:00
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
* [fx] move meta registration

* [fx] fix tests.

* [fx] fix test.

* [fx] fix.

* [meta] refactor meta registration.py.

* [fx] add compatibility descriptions.

* [fx] polish import.

* [fx] add a decorator.

* [fx] fix tests.

* [fx] remove print.

* [fx] edit raise error.

* [fx] edit raise error.

* [fx] add type hint.

* [fx] fix import in experimental.

* [rpc] remove color debug.

* [meta] fix naming.
2022-10-18 10:44:23 +08:00
Frank Lee e8d8eda5e7
[autoparallel] moved tests to test_tensor_shard (#1713) 2022-10-17 13:54:20 +08:00
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
* [autoparallel] resnet block runtime apply

* seperate buffer and parameter in MemoryCost

* polish code

* add comments and todos

* fix test issue
2022-10-17 13:37:38 +08:00
Frank Lee 22a115406b
[autoparallel] fixed broken node handler tests (#1708) 2022-10-14 18:25:59 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes memory leak when parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA. Only when a chunk is about to offload, it is allowed to release.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
Frank Lee 6c331a5a09
[autoparallel] refactored the autoparallel module for organization (#1706)
* [autoparallel] refactored the autoparallel module for organization

* polish code
2022-10-14 13:27:00 +08:00
Frank Lee 91cd34e6e0
[unittest] added doc for the pytest wrapper (#1704) 2022-10-14 10:56:17 +08:00
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
* [autoparallel] adapt runtime passes v2

* polish code
2022-10-14 10:14:07 +08:00
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Frank Lee 0e52f3d3d5
[unittest] supported conditional testing based on env var (#1701)
polish code
2022-10-13 19:38:45 +08:00
Frank Lee 8283e95db3
[autoparallel] collated all deprecated files (#1700)
* [autoparallel] collated all deprecated files

* polish code
2022-10-13 18:24:11 +08:00
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
* [autoparallel] adapt solver and CostGraph with new handler

* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
* [autoparallel] add output handler and placeholder handler

* Delete test_solver_with_resnet.py

* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
* [autoparallel] add pooling handler

* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
* where generator

* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691) 2022-10-12 15:21:58 +08:00
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687) 2022-10-12 11:16:18 +08:00
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) 2022-10-11 18:12:59 +08:00
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
* [fx/profiler] modify data_ptr into uuid for all tensors.

* [fx] modify uuid.

* [fx/profiler] tune performance on GPT-2.

* [fx] updates.

* [fx] debug.

* [fx] debug.

* [fx] cuda.
2022-10-11 11:03:35 +08:00
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674) 2022-10-09 17:30:42 +08:00
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
* [autoparallel] add following node generator

* polish code

* polish code

* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
* [autoparallel] add layer norm handler v2

* polish code

* polish code
2022-10-09 14:23:22 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
* [autoparallel] add rotor c version

* [fx] remove metainfoprop in rotor solver

* [autoparallel] modify C code format

* [autoparallel] remove build.py

* [autoparallel] fix C extension build

* [autoparallel] add C solver consistency test

* [autoparallel] remove some unused imports

* [autoparallel] refactor rotor solver code

* [autoparallel] replace print with colossalai logger

* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306 11ec070e53
[hotfix] unit test (#1670) 2022-09-29 12:49:28 +08:00
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
* [autoparallel] added utils for broadcast operation

* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667) 2022-09-29 11:20:59 +08:00
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666) 2022-09-29 11:02:49 +08:00
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
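
Note: a sketch of the pytree idea behind #1642: nested args/kwargs are flattened to a flat list of leaves, processed uniformly, then rebuilt. The `torch.utils._pytree` helpers below illustrate the mechanism; the pipeline code in the commit may wrap them differently.

```python
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

args = (torch.ones(2), {"mask": torch.zeros(2), "scale": 3})
leaves, spec = tree_flatten(args)                              # leaves + tree structure
leaves = [x * 2 if torch.is_tensor(x) else x for x in leaves]  # process each leaf
rebuilt = tree_unflatten(leaves, spec)                         # restore original nesting
```
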
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655) 2022-09-28 11:32:16 +08:00
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663) 2022-09-28 11:24:59 +08:00
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653) 2022-09-28 11:17:26 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
YuliangLiu0306 9eae855408
[hotfix] add recompile after graph manipulation (#1621) 2022-09-23 11:00:33 +08:00
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
* [fx] tuned the meta info and rotor solver.

* [fx] remove import.

* [fx] remove import.

* [fx] remove import.

* [fx] tune the meta calculations.

* [fx] polish comments.

* [fx] remove assertions.

* [fx] modify test cases.

* [fx] modify test cases.

* [fx] optimize import.

* [fx
2022-09-23 10:59:47 +08:00
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616) 2022-09-21 12:23:21 +08:00
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
* [fx] PoC of runtime shape consistency application

* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605) 2022-09-20 11:26:21 +08:00
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
* [fx] add pofo algorithm

* [fx] Add pofo solver

* [fx] code refactor

* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
* [autoparallel] add bcast op handler

* polish code

* add more BCAST FUNC OP

* polish code

* add exception handler

* polish
2022-09-16 11:33:01 +08:00
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
* [fx] add input activation offload to codegen

* [fx] modify unit test

* [fx] remove two skips in torch11

* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.

* [fx] fix variable names in ckpt_solvers and unskip tests.

* [fx] commit my changes.

* [fx] restore skips.

* [fx] restore skips.

* [fx] change stage into phase.

* [fx] change stage into phase.

* [fx] change stage into phase.
2022-09-14 14:27:04 +08:00