アマデウス
077a66dd81
updated attention kernel ( #2133 )
2022-12-16 10:54:03 +08:00
YuliangLiu0306
536560ccc0
[autoparallel] implement softmax handler ( #2132 )
2022-12-14 16:09:53 +08:00
Jiarui Fang
c89c66a858
[Gemini] update API of the chunkmemstatscollector. ( #2129 )
2022-12-14 00:47:06 +08:00
Jiarui Fang
2938edf446
[Gemini] update the non model data record method in runtime memory tracer ( #2128 )
2022-12-13 17:11:31 +08:00
Jiarui Fang
deee317b0f
[Gemini] test step-tensor mapping using repeated_computed_layers.py ( #2127 )
2022-12-13 16:34:10 +08:00
Jiarui Fang
8fac837679
[Gemini] update non model data calculation method ( #2126 )
2022-12-13 15:44:07 +08:00
Jiarui Fang
5efda69735
[Gemini] hotfix the unittest bugs ( #2125 )
2022-12-13 14:14:55 +08:00
Jiarui Fang
05bb28aacf
[Gemini] mapping of preop timestep and param ( #2124 )
2022-12-13 12:50:24 +08:00
YuliangLiu0306
cd0af9f7f6
[autoparallel] gpt2lp runtime test ( #2113 )
2022-12-12 18:06:40 +08:00
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2022-12-12 18:06:16 +08:00
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
...
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2022-12-12 17:58:57 +08:00
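As context for the div_scale commit above: this style of scaling folds gradient unscaling into the parameter update itself, so loss-scale division and the step happen in one pass. A minimal sketch with a hypothetical `div_scale` argument; the actual ColossalAI optimizer signature may differ:

```python
import torch

@torch.no_grad()
def sgd_step_with_div_scale(params, lr: float, div_scale: float = 1.0):
    """Plain SGD update where each gradient is divided by `div_scale`
    before being applied, e.g. to undo AMP loss scaling in one pass.
    Illustrative only; not the library's actual optimizer code."""
    for p in params:
        if p.grad is None:
            continue
        grad = p.grad
        if div_scale != 1.0:
            grad = grad / div_scale  # unscale inside the step
        p.add_(grad, alpha=-lr)
```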
Jiarui Fang
e5aa8333e4
[NFC] update chunk manager API ( #2119 )
2022-12-12 16:57:22 +08:00
Jiarui Fang
e99edfcb51
[NFC] polish comments for Chunk class ( #2116 )
2022-12-12 15:39:31 +08:00
Ziyue Jiang
09d69e1c25
[PP Middleware] Add bwd and step for PP middleware ( #2111 )
...
* add bwd and step for PP middleware
* pre-commit
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-12 12:40:03 +08:00
HELSON
63fbba3c19
[zero] add L2 gradient clipping for ZeRO ( #2112 )
...
* [zero] add L2 gradient clipping
* [testing] add MlpModel
* [zero] add unit test for grad clipping
* fix atol
2022-12-09 18:09:17 +08:00
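L2 gradient clipping under ZeRO is subtler than the single-device case because each rank only holds a shard of the gradients, so the global norm must be assembled with a collective. A sketch of the idea (not the library's actual code):

```python
import torch
import torch.distributed as dist

def clip_sharded_grad_norm_(local_grads, max_norm: float) -> float:
    """L2 gradient clipping when each rank holds only a shard of the
    gradients (ZeRO-style): compute the global norm via all-reduce,
    then rescale the local shards in place."""
    local_sq = sum(g.pow(2).sum() for g in local_grads)
    if dist.is_initialized():
        dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    total_norm = local_sq.sqrt().item()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in local_grads:
            g.mul_(clip_coef)
    return total_norm
```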
Jiarui Fang
70a8556946
[gemini] get the param visited order during runtime ( #2108 )
2022-12-09 16:13:03 +08:00
YuliangLiu0306
d87baa85d9
[autoparallel] support linear function bias addition ( #2104 )
2022-12-09 10:31:36 +08:00
YuliangLiu0306
0fecbb9e20
[autoparallel] support addbmm computation ( #2102 )
2022-12-08 21:15:11 +08:00
YuliangLiu0306
d3d4630495
[autoparallel] add sum handler ( #2101 )
2022-12-08 17:02:54 +08:00
Ziyue Jiang
e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG ( #2087 )
...
* add DAG test case
* fix data race by adjusting the position of lock
* polish code
* fix pytest for middleware
* remove test
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-08 13:32:27 +08:00
YuliangLiu0306
b175e6d58e
[autoparallel] add bias addition function class ( #2098 )
...
* [autoparallel] add bias addition function class
* polish code
* polish
2022-12-08 11:31:51 +08:00
YuliangLiu0306
3af7e65dea
[autoparallel] complete gpt related module search ( #2097 )
2022-12-08 10:04:09 +08:00
Jiarui Fang
85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) ( #2099 )
2022-12-07 23:04:02 +08:00
Jiarui Fang
978242326a
[Gemini] remove eval in gemini unittests! ( #2092 )
2022-12-07 11:58:37 +08:00
YuliangLiu0306
7f72eb0510
[autoparallel] add embedding handler ( #2089 )
...
* [autoparallel] add embedding handler
* fix bugs
2022-12-07 09:41:46 +08:00
Jiarui Fang
1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER ( #2091 )
2022-12-06 22:30:16 +08:00
Jiarui Fang
25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer ( #2088 )
2022-12-06 19:48:20 +08:00
Jiarui Fang
33f4412102
[Gemini] use MemStats to store the tracing data. Separate it from Collector. ( #2084 )
2022-12-06 16:43:06 +08:00
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2022-12-06 12:53:58 +08:00
YuliangLiu0306
0e9db368ef
[autoparallel] add tensor constructor handler ( #2082 )
2022-12-06 10:20:10 +08:00
YuliangLiu0306
cdf537a648
[autoparallel] add non_split linear strategy ( #2078 )
...
* [autoparallel] add non_split linear strategy
* polish
2022-12-06 10:19:33 +08:00
Boyuan Yao
cf0268da93
[autoparallel] Add F.conv metainfo ( #2069 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
* [autoparallel] add F.linear metainfo generator
* [autoparallel] add binary elementwise metainfo
* [fx] recover profiler
* [autoparallel] fix forward memory calculation
* [autoparallel] modify constants.py
* [autoparallel] remove redundant print
* [autoparallel] add F.conv metainfo
* [autoparallel] linear fix
2022-12-06 10:17:57 +08:00
YuliangLiu0306
f123476666
[autoparallel] complete gpt block searching ( #2065 )
...
* [autoparallel] complete gpt block searching
* fix test
2022-12-06 10:17:10 +08:00
Ziyue Jiang
597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo ( #2066 )
...
* adapt scheduler for Topo
* remove comment
* fix set input
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-05 20:23:41 +08:00
Jiarui Fang
4f21c9e8d9
[Gemini] polish runtime tracer tests ( #2077 )
2022-12-05 16:22:49 +08:00
Jiarui Fang
a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer ( #2076 )
2022-12-05 15:00:03 +08:00
Jiarui Fang
40b7d55bf3
[Gemini] add albert in test models. ( #2075 )
2022-12-05 14:09:34 +08:00
Jiarui Fang
616ed91ecd
[test] bert test in non-distributed way ( #2074 )
2022-12-05 13:32:16 +08:00
Jiarui Fang
223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer ( #2073 )
2022-12-05 12:45:11 +08:00
Jiarui Fang
9f828ef36f
[Gemini] remove not used MemtracerWrapper ( #2072 )
2022-12-05 11:57:59 +08:00
Boyuan Yao
616da17fab
[autoparallel] add binary elementwise metainfo for auto parallel ( #2058 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
* [autoparallel] add F.linear metainfo generator
* [autoparallel] add binary elementwise metainfo
* [fx] recover profiler
* [autoparallel] fix forward memory calculation
* [autoparallel] modify constants.py
* [autoparallel] remove redundant print
2022-12-04 15:18:51 +08:00
Ziyue Jiang
44ea461890
[Pipeline] Add Topo Class ( #2059 )
...
* use Topo class to rewrite DAG
* polish code
* polish code
* polish code
* add comment
* add else branch to if without one
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-02 18:13:20 +08:00
YuliangLiu0306
e4293e5077
[hotfix] update test for latest version ( #2060 )
2022-12-02 18:12:30 +08:00
YuliangLiu0306
19438ea0ef
[hotfix] skip gpt tracing test ( #2064 )
2022-12-02 16:48:28 +08:00
Zihao
38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue ( #2052 )
2022-12-02 16:04:19 +08:00
YuliangLiu0306
1c1fe44305
[autoparallel] adapt solver with self attention ( #2037 )
...
* [autoparallel] adapt solver with self attention
* polish code
2022-12-01 17:53:15 +08:00
HELSON
f6178728a0
[gemini] fix init bugs for modules ( #2047 )
...
* [gemini] fix init bugs for modules
* fix bugs
2022-11-30 17:06:10 +08:00
Zihao
6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook ( #2040 )
2022-11-30 15:57:45 +08:00
Jiarui Fang
1e885329f4
[test] align model name with the file name. ( #2045 )
2022-11-30 15:45:26 +08:00
Jiarui Fang
31c644027b
[hotfix] hotfix Gemini for no leaf modules bug ( #2043 )
2022-11-30 14:53:41 +08:00
HELSON
384cd26314
[zero] fix testing parameters ( #2042 )
2022-11-30 12:09:32 +08:00
HELSON
17a3c685b0
[zero] fix unit-tests ( #2039 )
2022-11-30 10:40:31 +08:00
Jiarui Fang
eb7742a4bb
[Gemini] more tests for Gemini ( #2038 )
...
* [Gemini] more tests for Gemini
* polish code
2022-11-29 17:13:10 +08:00
HELSON
537e181705
[testing] fix testing models ( #2036 )
...
* [testing] fix testing models
* roll back
2022-11-29 13:42:06 +08:00
HELSON
a1ce02d740
[zero] test gradient accumulation ( #1964 )
...
* [zero] fix memory leak for zero2
* [zero] test gradient accumulation
* [zero] remove grad clip test
2022-11-29 13:00:30 +08:00
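For reference, the behavior a gradient-accumulation test like the one above verifies can be written as a plain PyTorch loop; accumulating N scaled micro-batch gradients should match one step on the full batch. Illustrative sketch, not the test's actual code:

```python
def train_with_accumulation(model, optimizer, data_loader, accum_steps: int):
    """Reference gradient-accumulation loop: `accum_steps` micro-batches
    should be equivalent to a single step on one large batch."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(data_loader):
        loss = model(x, y) / accum_steps  # scale so the sum equals the mean
        loss.backward()                   # grads accumulate in .grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()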
Ziyue Jiang
b0936e4a44
[rpc] split with dag ( #2028 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
* add DAG middleware in scheduler
* add test case for scheduler
* remove break
* recover old lifecycle
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-29 11:36:28 +08:00
Jiarui Fang
96134e7be3
[hotfix] add bert test for gemini fwd bwd ( #2035 )
2022-11-29 11:19:52 +08:00
YuliangLiu0306
0dbcd4a6f5
[autoparallel] add split handler ( #2032 )
...
* [autoparallel] add split handler
* add numerical test and runtime passes
2022-11-29 11:03:51 +08:00
Jiarui Fang
28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd ( #2034 )
2022-11-29 09:26:06 +08:00
YuliangLiu0306
81330b0352
[autoparallel] add experimental permute handler ( #2029 )
2022-11-27 20:26:52 +08:00
Zihao
95c4532fff
[Gemini] paramWrapper paramTracerHook unittest ( #2030 )
2022-11-26 13:30:24 +08:00
Jiarui Fang
8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor ( #2003 )
2022-11-25 20:06:35 +08:00
Ziyue Jiang
632753abbc
[fx] Split partition with DAG information ( #2025 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-25 17:42:48 +08:00
YuliangLiu0306
ea0f6b8df9
[autoparallel] add runtime pass and numerical test for view handler ( #2018 )
2022-11-25 15:50:16 +08:00
Jiarui Fang
2e9cbfca12
[Gemini] add unitests to check gemini correctness ( #2015 )
2022-11-24 16:51:45 +08:00
Jiarui Fang
0b0d8f9e17
[hotfix] revert bug PRs ( #2016 )
2022-11-24 15:28:58 +08:00
Zihao
0160a62a3c
[Gemini] param_tracer_wrapper and test case ( #2009 )
2022-11-24 14:40:33 +08:00
YuliangLiu0306
1438993113
[autoparallel] add experimental view handler ( #2011 )
...
* [autoparallel] add experimental view handler
* polish
* polish
* polish code
* rename variables
2022-11-24 11:34:41 +08:00
Genghan Zhang
d655eea515
[autoparallel] mix gather ( #1977 )
...
* Add mix-gather
* Add comments
* Add comments
* Polish comments
* Change the global rank assumption
* Add tests
* Add two-step tests
* Fix 10 and 01
* Skip test because of the number of GPUs
2022-11-23 21:49:17 +08:00
Jiarui Fang
3d907faede
[Gemini] add an inline_op_module to common test models and polish unittests. ( #2004 )
2022-11-23 16:55:54 +08:00
Boyuan Yao
6cd784ffee
[autoparallel] Add metainfo support for F.linear ( #1987 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
* [autoparallel] add F.linear metainfo generator
2022-11-23 14:12:34 +08:00
YuliangLiu0306
35e6b9ec82
[autoparallel] adapt handlers with attention block ( #1990 )
...
* [autoparallel] adapt handlers with attention block
* polish
2022-11-21 10:44:11 +08:00
Jiarui Fang
5bec3b2168
[Gemini] open grad checkpoint when model building ( #1984 )
2022-11-18 16:32:54 +08:00
Boyuan Yao
c26f21d365
[autoparallel] add pooling metainfo ( #1968 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
* [autoparallel] add pooling metainfo
2022-11-18 15:13:03 +08:00
Jiarui Fang
3712ac7f90
[Gemini] add bert for MemtracerWrapper unittests ( #1982 )
2022-11-18 14:58:28 +08:00
Jiarui Fang
e481489aa6
[Gemini] MemtracerWrapper unittests ( #1981 )
2022-11-18 14:19:40 +08:00
YuliangLiu0306
0da1d00399
[autoparallel] support distributed dataloader option ( #1906 )
...
* [autoparallel] support distributed dataloader option
* update output handler to support ddp dataloader
* polish code
2022-11-17 20:11:53 +08:00
Genghan Zhang
6630d45546
[autoparallel] Add alpha beta ( #1973 )
...
* Add alpha beta
* Fix test
* Fix test
2022-11-17 16:01:14 +08:00
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2022-11-17 13:42:33 +08:00
Boyuan Yao
7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo ( #1868 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
* [fx] add relu metainfo class
* [fx] restore profiler
* [autoparallel] modify metainfo input
2022-11-16 23:12:31 +08:00
YuliangLiu0306
fea3cb661c
[autoparallel] support addmm in tracer and solver ( #1961 )
...
* [fx] patch addmm
* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2022-11-16 14:44:28 +08:00
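GeminiAdamOptimizer pairs Gemini's chunk-based memory management with an Adam update. A hedged usage sketch; the import path, constructor arguments, and the `optimizer.backward(loss)` call are assumptions that may differ across ColossalAI versions:

```python
# Hypothetical usage; check the ColossalAI version you use for the
# actual import path and signature.
from colossalai.nn.optimizer import GeminiAdamOptimizer  # assumed path

optimizer = GeminiAdamOptimizer(model, lr=1e-3)  # wraps model + Adam state
for x, y in loader:
    optimizer.zero_grad()
    loss = model(x, y)
    optimizer.backward(loss)  # ZeRO-style optimizers often own backward
    optimizer.step()
```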
HELSON
7066dfbf82
[zero] fix memory leak for zero2 ( #1955 )
2022-11-16 11:43:24 +08:00
Jiarui Fang
52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. ( #1953 )
2022-11-15 16:24:16 +08:00
zbian
6877121377
updated flash attention api
2022-11-15 15:25:39 +08:00
Jiarui Fang
9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. ( #1937 )
2022-11-14 16:05:09 +08:00
HELSON
6e51d296f0
[zero] migrate zero1&2 ( #1878 )
...
* add zero1&2 optimizer
* rename test directory
* rename test files
* change tolerance in test
2022-11-11 09:26:40 +08:00
Jiarui Fang
51597f6a28
[hotfix] pass test_complete_workflow ( #1877 )
2022-11-10 17:53:39 +08:00
Jiarui Fang
986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 ( #1876 )
2022-11-10 17:36:42 +08:00
YuliangLiu0306
1b494ad73c
[autoparallel] fix linear logical convert issue ( #1857 )
2022-11-10 17:19:22 +08:00
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2022-11-10 17:03:21 +08:00
xcnick
a141681260
[amp] add torch amp test ( #1860 )
2022-11-10 16:40:26 +08:00
Frank Lee
e6ec99d389
[utils] fixed lazy init context ( #1867 )
2022-11-10 15:17:20 +08:00
Jiarui Fang
3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext ( #1844 )
2022-11-09 11:50:33 +08:00
YuliangLiu0306
f6032ddb17
[autoparallel] fix bias addition module ( #1800 )
2022-11-08 16:21:25 +08:00
ver217
99870726b1
[CheckpointIO] a uniform checkpoint I/O module ( #1689 )
2022-11-08 15:15:13 +08:00
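A uniform checkpoint I/O module typically abstracts save/load behind one interface so storage backends can vary without touching training code. The class and method names below are illustrative, not the module's actual API:

```python
from abc import ABC, abstractmethod
import torch

class CheckpointIO(ABC):
    """Sketch of a uniform checkpoint I/O interface: one entry point
    for saving/loading regardless of backend. Names are illustrative."""

    @abstractmethod
    def save(self, state_dict: dict, path: str) -> None: ...

    @abstractmethod
    def load(self, path: str) -> dict: ...

class DiskCheckpointIO(CheckpointIO):
    def save(self, state_dict: dict, path: str) -> None:
        torch.save(state_dict, path)

    def load(self, path: str) -> dict:
        return torch.load(path, map_location="cpu")
```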
Boyuan Yao
629172b319
[autoparallel] add batch norm metainfo ( #1815 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
* [autoparallel] add batchnorm metainfo class
* [autoparallel] fix batchnorm unit test function declaration
* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel
441d584e4a
[fx] add a symbolic_trace api. ( #1812 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
Jiarui Fang
6fa71d65d3
[fx] skip diffusers unittest if it is not installed ( #1799 )
2022-11-08 11:45:23 +08:00
oahzxl
9639ea88fc
[kernel] more flexible flashatt interface ( #1804 )
2022-11-07 17:02:09 +08:00
Boyuan Yao
327d07c44a
[autoparallel] add conv metainfo class for auto parallel ( #1796 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
* [fx] add conv metainfo class
* [fx] restore profiler
* [fx] restore meta profiler
* [autoparallel] modify unit test
* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl
501a9e9cd2
[hotfix] polish flash attention ( #1802 )
2022-11-07 14:30:22 +08:00
Jiarui Fang
c248800359
[kernel] skip tests of flash_attn and triton when they are not available ( #1798 )
2022-11-07 13:41:13 +08:00
YuliangLiu0306
e34e850a4c
[autoparallel] add essential CommActions for broadcast operands ( #1793 )
2022-11-04 18:36:42 +08:00
Boyuan Yao
05ce3d369f
[fx] Add linear metainfo class for auto parallel ( #1783 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
YuliangLiu0306
2c4c7b3618
[autoparallel] add getattr handler ( #1767 )
...
* [autoparallel] add getattr handler
* polish code
* add extra processes for Parameters
* add unit test for param resharding cost
* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
...
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
Jiarui Fang
32c1b843a9
skip torchrec unittests if not installed ( #1790 )
2022-11-02 14:44:32 +08:00
kurisusnowdeng
0b8161fab8
updated tp layers
2022-11-02 12:19:38 +08:00
YuliangLiu0306
e859380bf7
[fx] support module with bias addition ( #1780 )
...
* [autoparallel] refactor tracer to fix bias addition issue
* [fx] support module with bias addition
* create bias_addition_module
* refactor file structure
* polish code
* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee
f3f19a5c47
[autoparallel] added matmul handler ( #1763 )
...
* [autoparallel] added matmul handler
* polish code
2022-11-01 15:14:53 +08:00
YuliangLiu0306
27de252334
[autoparallel] fix conv handler numerical test ( #1771 )
2022-11-01 10:43:44 +08:00
Super Daniel
1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code ( #1764 )
...
* [autoparallel] first move.
* [autoparallel] add solver rotor.
* [autoparallel] add ckpt solvers.
* [autoparallel] modify codegen.
* [fx] fix annotation in test.
* [fx] remove check.
* [autoparallel] polish docstring.
* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
YuliangLiu0306
a4d1f59c78
[autoparallel] add numerical test for handlers ( #1769 )
2022-10-28 10:59:59 +08:00
YuliangLiu0306
b0f7c8bde8
[autoparallel] update CommSpec to CommActions ( #1768 )
...
* [autoparallel] update CommSpec to CommActions
* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306
b4cc59b61e
[autoparallel] add numerical test for node strategies ( #1760 )
...
* [autoparallel] add numerical test for node strategies
* polish code
* polish code
2022-10-27 10:42:54 +08:00
oahzxl
25952b67d7
[feat] add flash attention ( #1762 )
2022-10-26 16:15:52 +08:00
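For readers unfamiliar with the feature added above: flash attention is a fused, memory-efficient kernel for standard scaled dot-product attention. The naive reference computation it matches, written out for clarity (a sketch, not the fused kernel):

```python
import math
import torch

def attention_reference(q, k, v, causal: bool = False):
    """softmax(QK^T / sqrt(d)) V, the computation a flash-attention
    kernel implements. Shapes: (batch, heads, seq, head_dim)."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if causal:
        seq = scores.size(-1)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool,
                                     device=scores.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```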
Super Daniel
0584654c79
[fx] refactor memory utils and extend shard utils. ( #1754 )
...
* [fx] change memory.py to memory_utils.py.
* [fx] add shard utils.
* [fx] fix import.
* [fx] check code style.
* [fx] add comment.
* [autoparallel] first move.
* [fx] add time computations.
2022-10-26 14:24:41 +08:00
YuliangLiu0306
314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes ( #1757 )
...
* [autoparallel] refactor the runtime apply pass and add docstring to passes
* fix unit test
* polish
2022-10-25 14:32:22 +08:00
Frank Lee
f9a613d660
[autoparallel] added binary elementwise node handler ( #1758 )
...
* [autoparallel] added binary elementwise node handler
* polish code
2022-10-25 14:32:01 +08:00
YuliangLiu0306
d2fc067231
[autoparallel] fix param hook issue in transform pass ( #1755 )
2022-10-24 13:13:38 +08:00
Frank Lee
262652c8bc
[autoparallel] added addbmm handler ( #1751 )
2022-10-21 18:55:48 +08:00
YuliangLiu0306
980ed21723
[autoparallel] shard param and buffer as expected ( #1753 )
...
* [autoparallel] shard param and buffer as expected
* fix unit test issue
2022-10-21 15:45:13 +08:00
YuliangLiu0306
cdb7d5e7d2
[hotfix] autoparallel unit test ( #1752 )
2022-10-20 19:51:38 +08:00
YuliangLiu0306
a4ce180e85
[autoparallel] add sequential order to communication actions ( #1735 )
2022-10-20 18:48:18 +08:00
Super Daniel
b893342f95
[fx] test tracer on diffuser modules. ( #1750 )
...
* [fx] test tracer on diffuser modules.
* [fx] shorter seq_len.
* Update requirements-test.txt
2022-10-20 18:25:05 +08:00
Frank Lee
b80b6eaa88
[autoparallel] recovered skipped test cases ( #1748 )
2022-10-20 16:37:33 +08:00
Frank Lee
474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler ( #1747 )
...
* [autoparallel] fixed wrong sharding strategy in conv handler
* polish code
2022-10-20 16:12:39 +08:00
Frank Lee
8b8937d901
[autoparallel] fixed wrong generated strategy for dot op ( #1746 )
...
* [autoparallel] fixed wrong generated strategy for dot op
* polish code
2022-10-20 15:18:16 +08:00
Frank Lee
88a79814fb
[autoparallel] handled illegal strategy in node handler ( #1743 )
...
* [autoparallel] handled illegal strategy in node handler
* polish code
2022-10-19 17:08:52 +08:00
Super Daniel
30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler ( #1730 )
...
* [fx/profiler] add test.
* [fx] fix file names.
* [fx] add docstring and comment.
* [fx] polish profiler.py.
* [fx] fix import errors.
* [fx] fix profiler.
* [fx] fix names.
2022-10-19 14:24:51 +08:00
Frank Lee
eee84908d4
[autoparallel] handled illegal sharding strategy ( #1728 )
...
* [autoparallel] handled illegal sharding strategy
* polish code
2022-10-19 12:53:06 +08:00
Ziheng Qin
cbe9a4cb45
[NFC] polish tests/test_layers/test_3d/test_3d.py code style ( #1740 )
2022-10-19 12:20:51 +08:00
lucasliunju
912eb58ea0
[NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style ( #1733 )
2022-10-19 12:20:51 +08:00
Xue Fuzhao
754aa7c81f
[NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style ( #1731 )
2022-10-19 12:20:51 +08:00
xyupeng
ff373a11eb
[NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style ( #1723 )
2022-10-19 12:20:51 +08:00
Kai Wang (Victor Kai)
b38efe4e8a
[NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style ( #1718 )
2022-10-19 12:20:51 +08:00
binmakeswell
f6389d0813
[NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style ( #1715 )
2022-10-19 12:20:51 +08:00
HELSON
f69f9bf223
[zero] add chunk init function for users ( #1729 )
...
* add chunk manager init function
* fix unit tests
* add comment
* add flush=True
2022-10-18 16:31:22 +08:00
Super Daniel
393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug ( #1710 )
...
* [fx] move meta registration
* [fx] fix tests.
* [fx] fix test.
* [fx] fix.
* [meta] refactor meta registration.py.
* [fx] add compatibility descriptions.
* [fx] polish import.
* [fx] add a decorator.
* [fx] fix tests.
* [fx] remove print.
* [fx] edit raise error.
* [fx] edit raise error.
* [fx] add type hint.
* [fx] fix import in experimental.
* [rpc] remove color debug.
* [meta] fix naming.
2022-10-18 10:44:23 +08:00
Frank Lee
e8d8eda5e7
[autoparallel] moved tests to test_tensor_shard ( #1713 )
2022-10-17 13:54:20 +08:00
YuliangLiu0306
845ff4a47a
[autoparallel] resnet block runtime apply ( #1709 )
...
* [autoparallel] resnet block runtime apply
* separate buffer and parameter in MemoryCost
* polish code
* add comments and todos
* fix test issue
2022-10-17 13:37:38 +08:00
Frank Lee
22a115406b
[autoparallel] fixed broken node handler tests ( #1708 )
2022-10-14 18:25:59 +08:00
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
...
* fixes memory leak when parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
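The constant placement policy described above reserves a fixed CUDA budget for parameter chunks; chunks beyond the budget stay on CPU until needed. A toy sketch of the bookkeeping idea, with illustrative names only:

```python
class ConstPlacementPolicy:
    """Sketch: the user reserves a fixed CUDA caching space for
    parameter chunks; overflow is placed on CPU. Not the actual API."""

    def __init__(self, cuda_budget_bytes: int):
        self.cuda_budget = cuda_budget_bytes  # reserved caching space
        self.cuda_used = 0

    def can_place_on_cuda(self, chunk_bytes: int) -> bool:
        return self.cuda_used + chunk_bytes <= self.cuda_budget

    def place(self, chunk_bytes: int) -> str:
        if self.can_place_on_cuda(chunk_bytes):
            self.cuda_used += chunk_bytes
            return "cuda"
        return "cpu"
```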
Frank Lee
6c331a5a09
[autoparallel] refactored the autoparallel module for organization ( #1706 )
...
* [autoparallel] refactored the autoparallel module for organization
* polish code
2022-10-14 13:27:00 +08:00
Frank Lee
91cd34e6e0
[unittest] added doc for the pytest wrapper ( #1704 )
2022-10-14 10:56:17 +08:00
YuliangLiu0306
451cd72dea
[autoparallel] adapt runtime passes ( #1703 )
...
* [autoparallel] adapt runtime passes v2
* polish code
2022-10-14 10:14:07 +08:00
Jiarui Fang
21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding ( #1699 )
2022-10-13 22:22:27 +08:00
Frank Lee
0e52f3d3d5
[unittest] supported conditional testing based on env var ( #1701 )
...
polish code
2022-10-13 19:38:45 +08:00
Frank Lee
8283e95db3
[autoparallel] collated all deprecated files ( #1700 )
...
* [autoparallel] collated all deprecated files
* polish code
2022-10-13 18:24:11 +08:00
YuliangLiu0306
81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler ( #1695 )
...
* [autoparallel] adapt solver and CostGraph with new handler
* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306
42b882ef06
[autoparallel] add output handler and placeholder handler ( #1694 )
...
* [autoparallel] add output handler and placeholder handler
* Delete test_solver_with_resnet.py
* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306
56088e6d98
[autoparallel] add pooling handler ( #1690 )
...
* [autoparallel] add pooling handler
* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306
319d654f79
[autoparallel] where_handler_v2 ( #1688 )
...
* where generator
* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao
31d2f03d27
[autoparallel] fix C version rotor inconsistency ( #1691 )
2022-10-12 15:21:58 +08:00
Frank Lee
4973157ad7
[autoparallel] added sharding spec conversion for linear handler ( #1687 )
2022-10-12 11:16:18 +08:00
YuliangLiu0306
af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bugs ( #1683 )
2022-10-11 18:12:59 +08:00
Super Daniel
3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 ( #1679 )
...
* [fx/profiler] modify data_ptr into uuid for all tensors.
* [fx] modify uuid.
* [fx/profiler] tune performance on GPT-2.
* [fx] updates.
* [fx] debug.
* [fx] debug.
* [fx] cuda.
2022-10-11 11:03:35 +08:00
YuliangLiu0306
517b63939a
[autoparallel] add unary element wise handler v2 ( #1674 )
2022-10-09 17:30:42 +08:00
YuliangLiu0306
f6c6a932b8
[autoparallel] add following node generator ( #1673 )
...
* [autoparallel] add following node generator
* polish code
* polish code
* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306
52fda88796
[autoparallel] add layer norm handler v2 ( #1671 )
...
* [autoparallel] add layer norm handler v2
* polish code
* polish code
2022-10-09 14:23:22 +08:00
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2022-10-09 09:18:51 +08:00
Boyuan Yao
1df98d5b66
[autoparallel] add rotor C version ( #1658 )
...
* [autoparallel] add rotor c version
* [fx] remove metainfoprop in rotor solver
* [autoparallel] modify C code format
* [autoparallel] remove build.py
* [autoparallel] fix C extension build
* [autoparallel] add C solver consistency test
* [autoparallel] remove some unused imports
* [autoparallel] refactor rotor solver code
* [autoparallel] replace print with colossalai logger
* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306
11ec070e53
[hotfix] unit test ( #1670 )
2022-09-29 12:49:28 +08:00
Frank Lee
a60024e77a
[autoparallel] added utils for broadcast operation ( #1665 )
...
* [autoparallel] added utils for broadcast operation
* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306
3f068d1409
[autoparallel] update CommSpec ( #1667 )
2022-09-29 11:20:59 +08:00
YuliangLiu0306
746f8f979d
[autoparallel] add batch norm handler v2 ( #1666 )
2022-09-29 11:02:49 +08:00
Kirigaya Kazuto
9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward ( #1642 )
...
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
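The pytree approach named in this commit lets a pipeline stage handle arbitrarily nested args/kwargs uniformly: flatten to a list of leaves, transform each leaf, restore the structure. A sketch using `torch.utils._pytree` (a private but long-standing PyTorch utility); `process_io` and `fn` are illustrative names:

```python
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

def process_io(args, kwargs, fn):
    """Flatten nested (args, kwargs) into leaves, apply `fn` to each
    tensor leaf, then rebuild the original nesting."""
    leaves, spec = tree_flatten((args, kwargs))
    leaves = [fn(x) if torch.is_tensor(x) else x for x in leaves]
    return tree_unflatten(leaves, spec)

# e.g. move every tensor in (args, kwargs) to the stage's device:
# args, kwargs = process_io(args, kwargs, lambda t: t.to("cuda:0"))
```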
Frank Lee
3a4d6f63a8
[autoparallel] added node handler for bmm ( #1655 )
2022-09-28 11:32:16 +08:00
YuliangLiu0306
095854477f
[autoparallel] add conv handler v2 ( #1663 )
2022-09-28 11:24:59 +08:00
YuliangLiu0306
1e7816a460
[autoparallel] adapt solver with gpt ( #1653 )
2022-09-28 11:17:26 +08:00
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2022-09-27 12:06:25 +08:00
YuliangLiu0306
03978aad45
[autoparallel] change the following nodes strategies generation logic ( #1636 )
...
* [autoparallel] change the following nodes strategies generation logic
* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306
59f100510a
[autoparallel] where handler ( #1651 )
...
* [autoparallel] where handler
* fix unit test
2022-09-27 11:20:43 +08:00
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
...
* [fx] fix offload codegen test
* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee
45b39a692a
[autoparallel] implemented linear projection strategy generator ( #1639 )
2022-09-26 16:58:14 +08:00
Frank Lee
154d3ef432
[fix] fixed the collective pattern name for consistency ( #1649 )
...
* [fix] fixed the collective pattern name for consistency
* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306
b2b2a4af98
[autoparallel] adapt solver with mlp ( #1638 )
2022-09-26 15:26:14 +08:00
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
...
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2022-09-24 19:58:18 +08:00
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2022-09-23 17:20:41 +08:00
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2022-09-23 15:33:57 +08:00
YuliangLiu0306
702dbc5288
[tensor] use communication autograd func ( #1617 )
...
* [tensor] use communication autograd func
* change all to all comm spec info
* rename pattern and distinguish fwd/bwd
* polish code
2022-09-23 13:31:15 +08:00
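A "communication autograd func" wraps a collective in a `torch.autograd.Function` so that the backward pass issues the matching (or identity) collective. A sketch of the classic pair used in tensor parallelism; class names are illustrative, not the library's:

```python
import torch
import torch.distributed as dist

class AllReduceFwdIdentityBwd(torch.autograd.Function):
    """Forward: sum partial results across ranks. Backward: identity."""

    @staticmethod
    def forward(ctx, x):
        x = x.clone()
        dist.all_reduce(x)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class IdentityFwdAllReduceBwd(torch.autograd.Function):
    """Forward: identity. Backward: all-reduce the gradient."""

    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        grad_output = grad_output.clone()
        dist.all_reduce(grad_output)
        return grad_output
```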
YuliangLiu0306
0c703189b9
[autoparallel] add layernorm handler ( #1629 )
2022-09-23 12:00:25 +08:00
YuliangLiu0306
bf77d3ab65
[autoparallel] recover the merged node strategy index ( #1613 )
2022-09-23 11:52:42 +08:00
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
...
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2022-09-23 11:04:52 +08:00
YuliangLiu0306
9eae855408
[hotfix] add recompile after graph manipulation ( #1621 )
2022-09-23 11:00:33 +08:00
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
...
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2022-09-23 10:59:47 +08:00
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
...
* remove forced FP32 modules
* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2022-09-22 11:16:25 +08:00
Frank Lee
d925122020
[autoparallel] added new linear module handler ( #1616 )
2022-09-21 12:23:21 +08:00
Kirigaya Kazuto
170fa81095
[pipeline/chimera] test chimera | fix bug of initializing ( #1615 )
...
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2022-09-20 14:33:04 +08:00
YuliangLiu0306
7d1bb71d5d
[fx] PoC of runtime shape consistency application ( #1607 )
...
* [fx] PoC of runtime shape consistency application
* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306
47b11c432c
[autoparallel] add bcast matmul strategies ( #1605 )
2022-09-20 11:26:21 +08:00
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
...
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto
edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera ( #1595 )
...
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
YuliangLiu0306
eac1b79371
[autoparallel] add bcast op handler ( #1600 )
...
* [autoparallel] add bcast op handler
* polish code
* add more BCAST FUNC OP
* polish code
* add exception handler
* polish
2022-09-16 11:33:01 +08:00
Boyuan Yao
a7cda6f57d
[fx] Add offload codegen ( #1598 )
...
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel
c8e9b2ad78
[hotfix/rotor] fix variable names ( #1597 )
...
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] change stage into phase.
* [fx] change stage into phase.
* [fx] change stage into phase.
2022-09-14 14:27:04 +08:00