Commit Graph

1084 Commits (92de90dfb3f22dbc8dd0bf5f7e03f138fe1b34e4)

Author SHA1 Message Date
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844) 2022-11-09 11:50:33 +08:00
Jiarui Fang fba34efb5a
version to 0.1.11rc2 (#1832) 2022-11-08 17:25:15 +08:00
YuliangLiu0306 49216d7ab1
[autoparallel] fix bugs caused by negative dim key (#1808)
* [autoparallel] fix bugs caused by negative dim key

* fix import error

* fix matmul test issue

* fix unit test issue
2022-11-08 17:03:50 +08:00
アマデウス 4268ae017b
[kernel] added jit warmup (#1792) 2022-11-08 16:22:23 +08:00
YuliangLiu0306 f6032ddb17
[autoparallel] fix bias addition module (#1800) 2022-11-08 16:21:25 +08:00
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821) 2022-11-08 15:53:13 +08:00
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689) 2022-11-08 15:15:13 +08:00
Boyuan Yao 629172b319
[autoparallel] add batch norm metainfo (#1815)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel 441d584e4a
[fx] add a symbolic_trace api. (#1812)
* [fx] add a symbolic_trace api.

* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
xcnick e0da01ea71
[hotfix] fix build error when torch version >= 1.13 (#1803) 2022-11-08 09:40:24 +08:00
oahzxl 9639ea88fc
[kernel] more flexible flashatt interface (#1804) 2022-11-07 17:02:09 +08:00
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765) 2022-11-07 16:49:03 +08:00
Boyuan Yao 327d07c44a
[autoparallel] add conv metainfo class for auto parallel (#1796)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl 501a9e9cd2
[hotfix] polish flash attention (#1802) 2022-11-07 14:30:22 +08:00
Jiarui Fang 218c75fd9d
[NFC] polish type hint for shape consistency (#1801)
* [NFC] polish type hint for shape consistency

* polish code

* polish code
2022-11-07 14:13:03 +08:00
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798) 2022-11-07 13:41:13 +08:00
YuliangLiu0306 e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands (#1793) 2022-11-04 18:36:42 +08:00
Boyuan Yao 05ce3d369f
[fx] Add linear metainfo class for auto parallel (#1783)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
Super Daniel e8a9bebc87
[autoparallel] refactor and add rotorc. (#1789)
* [autoparallel] refactor and add rotorc.

* [autoparallel] refactor and add rotorc.
2022-11-03 12:32:51 +08:00
YuliangLiu0306 2c4c7b3618
[autoparallel] add getattr handler (#1767)
* [autoparallel] add getattr haandler

* polish code

* add extra processes for Parameters

* add unit test for param resharding cost

* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
kurisusnowdeng 0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
Jiarui Fang cb5a587e9a
[hotfix] polish chunk import (#1787) 2022-11-02 12:10:52 +08:00
YuliangLiu0306 e859380bf7
[fx] support module with bias addition (#1780)
* [autoparallel] refactor tracer to fix bias addition issue

* [fx] support module with bias addition

* create bias_addition_module

* refactor file structure

* polish code

* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee f3f19a5c47
[autoparallel] added matmul handler (#1763)
* [autoparallel] added matmul handler

* polish code
2022-11-01 15:14:53 +08:00
Ziyue Jiang 4df0194976
[Pipeline]Adapt to Pipelinable OPT (#1782) 2022-11-01 14:18:50 +08:00
YuliangLiu0306 27de252334
[autoparallel] fix conv handler numerical test (#1771) 2022-11-01 10:43:44 +08:00
Super Daniel 1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
* [autoparallel] first move.

* [autoparallel] add solver rotor.

* [autoparallel] add ckpt solvers.

* [autoparallel] modify codegen.

* [fx] fix annotation in test.

* [fx] remove check.

* [autoparallel] polish docstring.

* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
Jiarui Fang f34dab4270
[compatibility] ChunkMgr import error (#1772) 2022-10-28 14:48:54 +08:00
YuliangLiu0306 b0f7c8bde8
[autoparallel] update CommSpec to CommActions (#1768)
* [autoparallel] update CommSpec to CommActions

* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306 b4cc59b61e
[autoparallel] add numerical test for node strategies (#1760)
* [autoparallel] add numerical test for node strategies

* polish code

* polish code
2022-10-27 10:42:54 +08:00
oahzxl 25952b67d7
[feat] add flash attention (#1762) 2022-10-26 16:15:52 +08:00
Super Daniel 0584654c79
[fx] refactor memory utils and extend shard utils. (#1754)
* [fx] change memory.py to memory_utils.py.

* [fx] add shard utils.

* [fx] fix import.

* [fx] check code style.

* [fx] add comment.

* [autoparallel] first move.

* [fx] add time computations.
2022-10-26 14:24:41 +08:00
Ziyue Jiang 63f250bbd4
fix file name (#1759)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-10-25 16:48:48 +08:00
YuliangLiu0306 314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757)
* [autoparallel] refactor the runtime apply pass and add doc string to passes

* fix unit test

* polish
2022-10-25 14:32:22 +08:00
Frank Lee f9a613d660
[autoparallel] added binary elementwise node handler (#1758)
* [autoparallel] added binary elementwise node handler

* polish code
2022-10-25 14:32:01 +08:00
YuliangLiu0306 d2fc067231
[autoparallel] fix param hook issue in transform pass (#1755) 2022-10-24 13:13:38 +08:00
Frank Lee 262652c8bc
[autoparallel] added addbmm handler (#1751) 2022-10-21 18:55:48 +08:00
YuliangLiu0306 980ed21723
[autoparallel] shard param and buffer as expected (#1753)
* [autoparallel] shard param and buffer as expected

* fix unit test issue
2022-10-21 15:45:13 +08:00
YuliangLiu0306 cdb7d5e7d2
[hotfix] autoparallel unit test (#1752) 2022-10-20 19:51:38 +08:00
YuliangLiu0306 a4ce180e85
[autoparallel] add sequential order to communication actions (#1735) 2022-10-20 18:48:18 +08:00
Frank Lee 474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler (#1747)
* [autoparallel] fixed wrong sharding strategy in conv handler

* polish code
2022-10-20 16:12:39 +08:00
Frank Lee 8b8937d901
[autoparallel] fixed wrong generated strategy for dot op (#1746)
* [autoparallel] fixed wrong generated strategy for dot op

* polish code
2022-10-20 15:18:16 +08:00
Frank Lee 993b8875b6
[autoparallel] handled illegal sharding strategy in shape consistency (#1744)
* [autoparallel] handled illegal sharding strategy in shape consistency

* polish code
2022-10-20 12:06:25 +08:00
Frank Lee 88a79814fb
[autoparallel] handled illegal strategy in node handler (#1743)
* [autoparallel] handled illegal strategy in node handler

* polish code
2022-10-19 17:08:52 +08:00
Super Daniel 30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730)
* [fx/profiler] add test.

* [fx] fix file names.

* [fx] add docstring and comment.

* [fx] polish profiler.py.

* [fx] fix import errors.

* [fx] fix profiler.

* [fx] fix names.
2022-10-19 14:24:51 +08:00
Frank Lee eee84908d4
[autoparallel] handled illegal sharding strategy (#1728)
* [autoparallel] handled illegal sharding strategy

* polish code
2022-10-19 12:53:06 +08:00
Sze-qq 23703c9dd6 [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) 2022-10-19 12:20:51 +08:00
Ofey Chan 7e62af28a0 [NFC] polish accuracy_2d.py code style (#1719) 2022-10-19 12:20:51 +08:00
LuGY 730f88f8e1 [NFC] polish _checkpoint_hook.py code style (#1722) 2022-10-19 12:20:51 +08:00
CsRic ea961d8fd1 [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-10-19 12:20:51 +08:00
yuxuan-lou 2b49ca80a3 [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) 2022-10-19 12:20:51 +08:00
shenggan e1d780030d [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) 2022-10-19 12:20:51 +08:00
YuliangLiu0306 d373e67b99
[hotfix] resharding cost issue (#1742) 2022-10-19 11:33:43 +08:00
Jiarui Fang 24e84eba60
upgrade version to 0.1.11rc1 (#1739) 2022-10-19 11:26:00 +08:00
Frank Lee d2e0e39c9d
[release] update to v0.1.11 (#1736) 2022-10-19 00:28:00 +08:00
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
* add chunk manager init function

* fix unit tests

* add comment

* add flush=True
2022-10-18 16:31:22 +08:00
YuliangLiu0306 51b89d2202
[autoparallel] runtime_backward_apply (#1720) 2022-10-18 10:44:58 +08:00
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
* [fx] move meta registration

* [fx] fix tests.

* [fx] fix test.

* [fx] fix.

* [meta] refactor meta registration.py.

* [fx] add compatibility descriptions.

* [fx] polish import.

* [fx] add a decorator.

* [fx] fix tests.

* [fx] remove print.

* [fx] edit raise error.

* [fx] edit raise error.

* [fx] add type hint.

* [fx] fix import in experimental.

* [rpc] remove color debug.

* [meta] fix naming.
2022-10-18 10:44:23 +08:00
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
* [autoparallel] resnet block runtime apply

* seperate buffer and parameter in MemoryCost

* polish code

* add comments and todos

* fix test issue
2022-10-17 13:37:38 +08:00
Frank Lee 22a115406b
[autoparallel] fixed broken node handler tests (#1708) 2022-10-14 18:25:59 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes memory leak when paramter is in fp16 in ZeroDDP init.
* bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
binmakeswell 5f41463a76
add optimizer README for tutorials (#1707) 2022-10-14 09:10:18 +00:00
Frank Lee 6c331a5a09
[autoparallel] refactored the autoparallel module for organization (#1706)
* [autoparallel] refactored the autoparallel module for organization

* polish code
2022-10-14 13:27:00 +08:00
Frank Lee 91cd34e6e0
[unittest] added doc for the pytest wrapper (#1704) 2022-10-14 10:56:17 +08:00
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
* [autoparallel] adapt runtime passes v2

* polish code
2022-10-14 10:14:07 +08:00
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Frank Lee 0e52f3d3d5
[unittest] supported condititonal testing based on env var (#1701)
polish code
2022-10-13 19:38:45 +08:00
Frank Lee 8283e95db3
[autoparallel] collated all deprecated files (#1700)
* [autoparallel] collated all deprecated files

* polish code
2022-10-13 18:24:11 +08:00
Frank Lee e2355d01b9
[autoparallel] init new folder structure (#1696) 2022-10-13 14:18:55 +08:00
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
* [autoparallel] adapt solver and CostGraph with new handler

* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
* [autoparallel] add output handler and placeholder handler

* Delete test_solver_with_resnet.py

* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
* [autoparallel] add pooling handler

* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
* where generator

* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691) 2022-10-12 15:21:58 +08:00
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692) 2022-10-12 12:01:21 +08:00
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687) 2022-10-12 11:16:18 +08:00
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) 2022-10-11 18:12:59 +08:00
YuliangLiu0306 6878e42248
[hotfix] solver bug caused by dict type comm cost (#1686) 2022-10-11 17:57:03 +08:00
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
* [fx/profiler] modify data_ptr into uuid for all tensors.

* [fx] modify uuid.

* [fx/profiler] tune performance on GPT-2.

* [fx] updates.

* [fx] debug.

* [fx] debug.

* [fx] cuda.
2022-10-11 11:03:35 +08:00
Kirigaya Kazuto 0df5034a36
[pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework (#1684)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide  to process args and kwargs after forward

* [pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework
2022-10-10 16:01:02 +08:00
jim e5ab6be72e
[hotfix[ fix colotensor.type() raise NotImplementedError (#1682) 2022-10-10 10:13:31 +08:00
Kirigaya Kazuto 3b2a59b0ba
[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide  to process args and kwargs after forward
2022-10-09 17:32:57 +08:00
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674) 2022-10-09 17:30:42 +08:00
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
* [autoparallel] add following node generator

* polish code

* polish code

* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
* [autoparallel] add layer norm handler v2

* polish code

* polish code
2022-10-09 14:23:22 +08:00
Fazzie-Maqianli 87c5ad352a
update version to 0.1.10 (#1676) 2022-10-09 10:43:29 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Boyuan Yao b1be5b88bd
[autoparallel] fix insecure subprocess (#1680)
* [autoparallel] fix insecure subprocess

* [fx] fix insecure subprocess
2022-10-06 15:07:03 +08:00
Boyuan Yao d8420f81a4
[hotfix] fix wrong type name in profiler (#1678) 2022-10-05 21:59:05 +08:00
Boyuan Yao 132b4306b7
[fx] Add concrete info prop (#1677)
* [fx] concreteinfoprop

* [fx] add concreteinfoprop

* [fx] modify docstring of ConcreteInfoProp

* [fx] fix device error

* [fx] modify parameter calculation

* [fx] modify parameters calculation
2022-10-04 16:48:24 +08:00
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
* [autoparallel] add rotor c version

* [fx] remove metainfoprop in rotor solver

* [autoparallel] modify C
 code format

* [autoparallel] remove build.py

* [autoparallel] fix C extension build

* [autoparallel] add C solver consistency test

* [autoparallel] remove some unused imports

* [autoparallel] refactor rotor solver code

* [autoparallel] replace print with colossalai logger

* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306 11ec070e53
[hotfix]unit test (#1670) 2022-09-29 12:49:28 +08:00
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
* [autoparallel] added utils for broadcast operation

* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667) 2022-09-29 11:20:59 +08:00
Frank Lee 247a9dbca9
[autoparallel] added bias comm spec to matmul strategy (#1664) 2022-09-29 11:08:05 +08:00
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666) 2022-09-29 11:02:49 +08:00
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide  to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
YuliangLiu0306 c27e701cb2
[autoparallel] remove no strategy nodes (#1652)
* [autoparallel] remove no strategy nodes

* fix none object iteration issue
2022-09-29 10:43:25 +08:00
Frank Lee 50f16a2850
[autoparallel] added compute resharding costs for node handler (#1662) 2022-09-28 19:55:44 +08:00
Frank Lee 9ec401a722
[autoparallel] added new strategy constructor template (#1661)
* [autoparallel] added new strategy constructor template

* polish code
2022-09-28 14:01:36 +08:00
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655) 2022-09-28 11:32:16 +08:00
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663) 2022-09-28 11:24:59 +08:00
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653) 2022-09-28 11:17:26 +08:00
Jiarui Fang c638bec028
[embedding] polish async copy (#1657) 2022-09-27 14:37:03 +08:00
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656) 2022-09-27 13:43:59 +08:00
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654) 2022-09-27 12:50:33 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Super Daniel 6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. (#1646)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.

* [fx] optimize table repr.

* [fx] refactor code for profiler.

* [fx] add docstring.

* [fx] add docstring.

* [fx] skip test.

* [fx] redo.

* [fx] redo.

* [fx] fix import error for torch11.

* [fx] fix import error for torch11.
2022-09-27 10:26:52 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647) 2022-09-26 14:57:57 +08:00
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
Boyuan Yao f921733621
[autoparallel] Add pofo sequence annotation (#1637)
* [autoparallel] annotate pofo sequence

* [autoparallel] remove unused print

* [autoparallel] fix some code
2022-09-24 01:52:57 +08:00
Super Daniel 04bbabeea8
[fx/profiler] provide a table of summary. (#1634)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.
2022-09-23 18:12:43 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
Jiarui Fang e57df80325
[embeddings] cache option (#1635) 2022-09-23 16:40:18 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 c7ac0f4ab2
[autoparallel] add elementwise handler (#1622)
* [autoparallel] add elementwise handler

* polish code

* polish code

* reduce skipped strategies range

* polish code
2022-09-23 13:27:31 +08:00
YuliangLiu0306 3a46215135
[autoparallel] add embedding handler (#1620) 2022-09-23 12:34:30 +08:00
YuliangLiu0306 69448f64c4
[autoparallel] protect bcast handler from invalid strategies (#1631) 2022-09-23 12:12:49 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
* [fx] tuned the meta info and rotor solver.

* [fx] remove import.

* [fx] remove import.

* [fx] remove import.

* [fx] tune the meta calculations.

* [fx] polish comments.

* [fx] remove assertions.

* [fx] modify test cases.

* [fx] modify test cases.

* [fx] optimize import.

* [fx
2022-09-23 10:59:47 +08:00
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616) 2022-09-21 12:23:21 +08:00
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
YuliangLiu0306 6a8f8cc05e
[hotfix] got sliced types (#1614) 2022-09-20 14:32:42 +08:00
Frank Lee d397842fa8
[autoparallel] added new node handler (#1612) 2022-09-20 14:17:21 +08:00
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
* [fx] PoC of runtime shape consistency application

* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605) 2022-09-20 11:26:21 +08:00
Frank Lee edb67cb378
[autoparallel] refactored the data structure for sharding strategy (#1610) 2022-09-20 11:20:54 +08:00
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
* [fx] add pofo algorithm

* [fx] Add pofo solver

* [fx] code refactor

* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
ver217 c9e8ce67b8
fix move fp32 shards (#1604) 2022-09-16 17:33:16 +08:00
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
* [autoparallel] add bcast op handler

* polish code

* add more BCAST FUNC OP

* polish code

* add exception handler

* polish
2022-09-16 11:33:01 +08:00
Frank Lee 3abf98a633
[autoparallel] added all non-bcast matmul strategies (#1603) 2022-09-16 10:47:32 +08:00
Frank Lee db98b695b2
[autoparallel] added strategy generator and bmm strategies (#1602) 2022-09-15 16:57:07 +08:00
Jiarui Fang a19eb80998
[embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
Super Daniel cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. (#1599) 2022-09-15 14:46:36 +08:00
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
* [fx] add input activation offload to codegen

* [fx] modify unit test

* [fx] remove two skips in torch11

* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.

* [fx] fix variable names in ckpt_solvers and unskip tests.

* [fx] commit my changes.

* [fx] restore skips.

* [fx] restore skips.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306 faa23b9d9a
[autoparallel] add reshape handler (#1594)
* [autoparallel] add reshape handler

* polish code
2022-09-14 10:25:45 +08:00
Super Daniel 5c494d4540
[fx] provide an accurate estimation of memory. (#1587)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.
2022-09-14 09:36:43 +08:00
Frank Lee 27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy (#1591)
* [autoparallel] refactored shape consistency to remove redundancy

* polish code

* polish code

* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306 d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) 2022-09-13 18:05:05 +08:00
Frank Lee 7c18a588c8
[autoparallel] added generate_sharding_spec to utils (#1590) 2022-09-13 15:43:22 +08:00
Boyuan Yao 49ccf8b5f8
[fx] Improve linearize and rotor solver (#1586)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix

* [fx] imporve linearize and rotor solver

* [fx] some comments and format modification
2022-09-13 14:50:04 +08:00
Frank Lee 219f66c571
[autoparallel] added solver option dataclass (#1588) 2022-09-13 14:47:09 +08:00
YuliangLiu0306 82d4376c23
[autoparallel] adapt solver with resnet (#1583)
* [autoparallel]adapt solver with resnet

* polish code

* polish code
2022-09-13 12:07:09 +08:00
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
Boyuan Yao 20e466527b [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) 2022-09-08 22:11:04 +08:00
Fazzie-Maqianli 06dccdde44 [NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) 2022-09-08 22:11:04 +08:00
CsRic 2ac46f7be4 [NFC] polish utils/tensor_detector/__init__.py code style (#1573)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-08 22:11:04 +08:00
Sze-qq 2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) 2022-09-08 22:11:04 +08:00
superhao1995 e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han 3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
Zirui Zhu f566c9b98d [NFC] polish colossalai/pipeline/utils.py code style (#1562) 2022-09-08 22:11:04 +08:00
Xue Fuzhao e070ca45c6 [NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) 2022-09-08 22:11:04 +08:00
Zangwei Zheng 9823cbf24b [NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) 2022-09-08 22:11:04 +08:00
DouJS f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) 2022-09-08 22:11:04 +08:00
LuGY c7d4932956 [NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) 2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing 0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) 2022-09-08 22:11:04 +08:00
Ziheng Qin 08815f0e72 [NFC] polish colossalai/builder/__init__.py code style (#1560)
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2022-09-08 22:11:04 +08:00
Super Daniel 8328917348 [NFC] polish colossalai/testing/comparison.py code style. (#1558) 2022-09-08 22:11:04 +08:00
Ofey Chan 7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) 2022-09-08 22:11:04 +08:00
Kai Wang (Victor Kai) 46931e3c32 [NFC] polish code colossalai/gemini/update/search_utils.py (#1557) 2022-09-08 22:11:04 +08:00
yuxuan-lou 413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) 2022-09-08 22:11:04 +08:00
shenggan 8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) 2022-09-08 22:11:04 +08:00
Maruyama_Aya bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) 2022-09-08 22:11:04 +08:00
binmakeswell 73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style 2022-09-08 22:11:04 +08:00
Kirigaya Kazuto 318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) 2022-09-08 22:04:34 +08:00
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564) 2022-09-08 16:41:19 +08:00
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569) 2022-09-08 16:40:56 +08:00
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544) 2022-09-07 19:01:06 +08:00
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
* [fx] compute memory stat and flop count for MetaInfoProp.

* [fx] modify node attribute.

* [fx] modify ckpt_chen.

* [fx] fix compatibility.

* [fx] fix import error.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip if torch 1.11.0.

* [fx] recover MetaInfoProp support for PyTorch 1.11.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306 0908d0fc61
[autoparallel]add backward cost info into strategies (#1524) 2022-09-07 11:19:00 +08:00
YuliangLiu0306 1a3599410d
[autoparallel] support fucntion in operator handler (#1529) 2022-09-07 11:18:41 +08:00
YuliangLiu0306 44c866a3e3
[autoparallel] change the merge node logic (#1533) 2022-09-07 11:18:19 +08:00
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
* refactor parallel layer

* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
ver217 2bed096848
[utils] optimize partition_tensor_parallel_state_dict (#1546) 2022-09-06 17:45:31 +08:00
Super Daniel d8a5aded19
[hotfix] change namespace for meta_trace. (#1541) 2022-09-06 11:46:12 +08:00
ver217 a203b709d5
[hotfix] fix init context (#1543)
* fix init context

* fix lazy init ctx
2022-09-06 11:45:08 +08:00
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
Boyuan Yao 46c6cc79a9
[fx] Add common node in model linearize (#1542)
* [fx] Add common node into linearize

* [fx] Add common node to solver
2022-09-05 18:35:05 +08:00
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Super Daniel 70129603aa
[fx] support meta tracing for aten level computation graphs like functorch. (#1536)
* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] remove redundant import.

* [fx] add docstring.
2022-09-05 12:10:09 +08:00
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538) 2022-09-02 15:48:35 +08:00
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535) 2022-09-02 11:09:37 +08:00