Commit Graph

550 Commits (56a3dcdabdb6f85e7819cdbce4610f3b01226a9a)

Author SHA1 Message Date
Boyuan Yao 6cd784ffee
[autoparallel] Add metainfo support for F.linear (#1987)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator
2022-11-23 14:12:34 +08:00
YuliangLiu0306 35e6b9ec82
[autoparallel] adapt handlers with attention block (#1990)
* [autoparallel] adapt handlers with attention block

* polish
2022-11-21 10:44:11 +08:00
Jiarui Fang 5bec3b2168
[Gemini] open grad checkpoint when model building (#1984) 2022-11-18 16:32:54 +08:00
Boyuan Yao c26f21d365
[autoparallel] add pooling metainfo (#1968)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo
2022-11-18 15:13:03 +08:00
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unintests (#1982) 2022-11-18 14:58:28 +08:00
Jiarui Fang e481489aa6
[Gemini] MemtracerWrapper unittests (#1981) 2022-11-18 14:19:40 +08:00
YuliangLiu0306 0da1d00399
[autoparallel] support distributed dataloader option (#1906)
* [autoparallel] support distributed dataloader option

* update output handler to support ddp dataloader

* poish code
2022-11-17 20:11:53 +08:00
Genghan Zhang 6630d45546
[autoparallel] Add alpha beta (#1973)
* Add alpha beta

* Fix test

* Fix test
2022-11-17 16:01:14 +08:00
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971) 2022-11-17 13:42:33 +08:00
Boyuan Yao 7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo (#1868)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input
2022-11-16 23:12:31 +08:00
YuliangLiu0306 fea3cb661c
[autoparallel] support addmm in tracer and solver (#1961)
* [fx] patch addmm

* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2022-11-16 14:44:28 +08:00
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955) 2022-11-16 11:43:24 +08:00
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) 2022-11-15 16:24:16 +08:00
zbian 6877121377 updated flash attention api 2022-11-15 15:25:39 +08:00
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) 2022-11-14 16:05:09 +08:00
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer

* rename test ditectory

* rename test files

* change tolerance in test
2022-11-11 09:26:40 +08:00
Jiarui Fang 51597f6a28
[hotfix] pass test_complete_workflow (#1877) 2022-11-10 17:53:39 +08:00
Jiarui Fang 986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) 2022-11-10 17:36:42 +08:00
YuliangLiu0306 1b494ad73c
[autoparallel] fix linear logical convert issue (#1857) 2022-11-10 17:19:22 +08:00
Jiarui Fang c2947dadf1
[inference] streaming Linear 1D Row inference (#1874) 2022-11-10 17:03:21 +08:00
xcnick a141681260
[amp] add torch amp test (#1860) 2022-11-10 16:40:26 +08:00
Frank Lee e6ec99d389
[utils] fixed lazy init context (#1867) 2022-11-10 15:17:20 +08:00
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844) 2022-11-09 11:50:33 +08:00
YuliangLiu0306 f6032ddb17
[autoparallel] fix bias addition module (#1800) 2022-11-08 16:21:25 +08:00
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689) 2022-11-08 15:15:13 +08:00
Boyuan Yao 629172b319
[autoparallel] add batch norm metainfo (#1815)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel 441d584e4a
[fx] add a symbolic_trace api. (#1812)
* [fx] add a symbolic_trace api.

* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
Jiarui Fang 6fa71d65d3
[fx] skip diffusers unitest if it is not installed (#1799) 2022-11-08 11:45:23 +08:00
oahzxl 9639ea88fc
[kernel] more flexible flashatt interface (#1804) 2022-11-07 17:02:09 +08:00
Boyuan Yao 327d07c44a
[autoparallel] add conv metainfo class for auto parallel (#1796)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl 501a9e9cd2
[hotfix] polish flash attention (#1802) 2022-11-07 14:30:22 +08:00
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798) 2022-11-07 13:41:13 +08:00
YuliangLiu0306 e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands (#1793) 2022-11-04 18:36:42 +08:00
Boyuan Yao 05ce3d369f
[fx] Add linear metainfo class for auto parallel (#1783)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
YuliangLiu0306 2c4c7b3618
[autoparallel] add getattr handler (#1767)
* [autoparallel] add getattr haandler

* polish code

* add extra processes for Parameters

* add unit test for param resharding cost

* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
Jiarui Fang 32c1b843a9
skip torchrec unittests if not installed (#1790) 2022-11-02 14:44:32 +08:00
kurisusnowdeng 0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
YuliangLiu0306 e859380bf7
[fx] support module with bias addition (#1780)
* [autoparallel] refactor tracer to fix bias addition issue

* [fx] support module with bias addition

* create bias_addition_module

* refactor file structure

* polish code

* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee f3f19a5c47
[autoparallel] added matmul handler (#1763)
* [autoparallel] added matmul handler

* polish code
2022-11-01 15:14:53 +08:00
YuliangLiu0306 27de252334
[autoparallel] fix conv handler numerical test (#1771) 2022-11-01 10:43:44 +08:00
Super Daniel 1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
* [autoparallel] first move.

* [autoparallel] add solver rotor.

* [autoparallel] add ckpt solvers.

* [autoparallel] modify codegen.

* [fx] fix annotation in test.

* [fx] remove check.

* [autoparallel] polish docstring.

* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
YuliangLiu0306 a4d1f59c78
[autoparallel] add numerical test for handlers (#1769) 2022-10-28 10:59:59 +08:00
YuliangLiu0306 b0f7c8bde8
[autoparallel] update CommSpec to CommActions (#1768)
* [autoparallel] update CommSpec to CommActions

* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306 b4cc59b61e
[autoparallel] add numerical test for node strategies (#1760)
* [autoparallel] add numerical test for node strategies

* polish code

* polish code
2022-10-27 10:42:54 +08:00
oahzxl 25952b67d7
[feat] add flash attention (#1762) 2022-10-26 16:15:52 +08:00
Super Daniel 0584654c79
[fx] refactor memory utils and extend shard utils. (#1754)
* [fx] change memory.py to memory_utils.py.

* [fx] add shard utils.

* [fx] fix import.

* [fx] check code style.

* [fx] add comment.

* [autoparallel] first move.

* [fx] add time computations.
2022-10-26 14:24:41 +08:00
YuliangLiu0306 314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757)
* [autoparallel] refactor the runtime apply pass and add doc string to passes

* fix unit test

* polish
2022-10-25 14:32:22 +08:00
Frank Lee f9a613d660
[autoparallel] added binary elementwise node handler (#1758)
* [autoparallel] added binary elementwise node handler

* polish code
2022-10-25 14:32:01 +08:00