Commit Graph

1320 Commits (31c78f2be3272a9a4062fe78eca34b3847a0c900)

Author SHA1 Message Date
YuliangLiu0306 0e9db368ef
[autoparallel] add tensor constructor handler (#2082)
2 years ago
YuliangLiu0306 cdf537a648
[autoparallel] add non_split linear strategy (#2078)
2 years ago
Boyuan Yao cf0268da93
[autoparallel] Add F.conv metainfo (#2069)
2 years ago
YuliangLiu0306 f123476666
[autoparallel] complete gpt block searching (#2065)
2 years ago
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
YuliangLiu0306 677e1e20d4
[device] update flatten device mesh usage (#2079)
2 years ago
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076)
2 years ago
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073)
2 years ago
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072)
2 years ago
Boyuan Yao 616da17fab
[autoparallel] add binary elementwise metainfo for auto parallel (#2058)
2 years ago
Boyuan Yao 4b40fbd743
[autoparallel] fix forward memory calculation (#2062)
2 years ago
Ziyue Jiang 44ea461890
[Pipeline] Add Topo Class (#2059)
2 years ago
YuliangLiu0306 e4293e5077
[hotfix] update test for latest version (#2060)
2 years ago
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052)
2 years ago
YuliangLiu0306 1c1fe44305
[autoparallel] adapt solver with self attention (#2037)
2 years ago
Frank Lee ea74a3b9cc
[cli] updated installation cheheck with more inforamtion (#2050)
2 years ago
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
2 years ago
Frank Lee 81e0da7fa8
[setup] supported conda-installed torch (#2048)
2 years ago
HELSON e37f3db40c
[gemini] add arguments (#2046)
2 years ago
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040)
2 years ago
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043)
2 years ago
HELSON a1ce02d740
[zero] test gradient accumulation (#1964)
2 years ago
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
YuliangLiu0306 0dbcd4a6f5
[autoparallel] add split handler (#2032)
2 years ago
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034)
2 years ago
YuliangLiu0306 81330b0352
[autoparallel] add experimental permute handler (#2029)
2 years ago
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unitest (#2030)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003)
2 years ago
Ziyue Jiang 632753abbc
[fx]Split partition with DAG information (#2025)
2 years ago
YuliangLiu0306 ea0f6b8df9
[autoparallel] add runtime pass and numerical test for view handler (#2018)
2 years ago
Zihao a719b89a41
[gemini] param_trace_hook (#2020)
2 years ago
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016)
2 years ago
Zihao aba3db464d
[Gemini] ParamMemHook (#2008)
2 years ago
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009)
2 years ago
YuliangLiu0306 1438993113
[autoparallel] add experimental view handler (#2011)
2 years ago
Genghan Zhang d655eea515
[autoparallel] mix gather (#1977)
2 years ago
Frank Lee 2bab6f512c
[release] release v0.1.11rc4 (#2007)
2 years ago
Boyuan Yao 6cd784ffee
[autoparallel] Add metainfo support for F.linear (#1987)
2 years ago
Super Daniel 2edbef13cc
[fx] add more meta_registry for MetaTensor execution. (#2000)
2 years ago
Jiarui Fang a2d3266648
[hotfix] make Gemini work for conv DNN (#1998)
2 years ago
YuliangLiu0306 155891113e
[autoparallel] use pytree map style to process data (#1989)
2 years ago
YuliangLiu0306 35e6b9ec82
[autoparallel] adapt handlers with attention block (#1990)
2 years ago
YuliangLiu0306 05020e50d0
[autoparallel] support more flexible data type (#1967)
2 years ago
Boyuan Yao c26f21d365
[autoparallel] add pooling metainfo (#1968)
2 years ago
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unintests (#1982)
2 years ago
Jiarui Fang e481489aa6
[Gemini] MemtracerWrapper unittests (#1981)
2 years ago
Jiarui Fang 31922110ad
[Gemini] memory trace hook (#1978)
2 years ago
Jiarui Fang 0529fcde06
[Gemini] independent runtime tracer (#1974)
2 years ago
YuliangLiu0306 0da1d00399
[autoparallel] support distributed dataloader option (#1906)
2 years ago
Genghan Zhang 6630d45546
[autoparallel] Add alpha beta (#1973)
2 years ago
Jiarui Fang cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972)
2 years ago
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971)
2 years ago
Jiarui Fang 7e24b9b9ee
[Gemini] clean no used MemTraceOp (#1970)
2 years ago
Boyuan Yao 7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo (#1868)
2 years ago
Jiarui Fang 8c66a1d0aa
[polish] remove useless file _mem_tracer_hook.py (#1963)
2 years ago
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962)
2 years ago
YuliangLiu0306 fea3cb661c
[autoparallel] support addmm in tracer and solver (#1961)
2 years ago
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960)
2 years ago
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955)
2 years ago
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953)
2 years ago
zbian 598d456d0e
fixed logger
2 years ago
zbian 6877121377
updated flash attention api
2 years ago
YuliangLiu0306 36c0f3ea5b
[autoparallel] remove redundancy comm node (#1893)
2 years ago
アマデウス e52f9d9109
[tensorparallel] fixed tp layers (#1938)
2 years ago
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937)
2 years ago
Boyuan Yao d5c5bc219e
[SC] add GPT example for auto checkpoint (#1889)
2 years ago
Junming Wu 14a0b18305
[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905)
2 years ago
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878)
2 years ago
Super Daniel cc55ff0aa4
[autoparallel] user-friendly API for CheckpointSolver. (#1879)
2 years ago
Super Daniel 448248b27c
[fx] metainfo_trace as an API. (#1873)
2 years ago
Jiarui Fang 986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876)
2 years ago
YuliangLiu0306 1b494ad73c
[autoparallel] fix linear logical convert issue (#1857)
2 years ago
Jiarui Fang c2947dadf1
[inference] streaming Linear 1D Row inference (#1874)
2 years ago
Frank Lee e6ec99d389
[utils] fixed lazy init context (#1867)
2 years ago
zbian 653b0a620e
added skip_bias_add for non-tp linear
2 years ago
LuGY 94329fc139
[NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853)
2 years ago
zbian 1559a09fb7
[NFC] polish amp.naive_amp.grad_scaler code style
2 years ago
HELSON 72c9448920
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845)
2 years ago
Genghan Zhang b25030cc07
[NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836)
2 years ago
Sze-qq 95ac4f88ea
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829)
2 years ago
Ziyue Jiang 5da03c936d
[NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823)
2 years ago
Fazzie-Maqianli 399f84d8f6
[NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819)
2 years ago
CsRic 9623ec1b02
[NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816)
2 years ago
binmakeswell 3c3714fc2a
[NFC] polish strategies_constructor.py code style (#1806)
2 years ago
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844)
2 years ago
Jiarui Fang fba34efb5a
version to 0.1.11rc2 (#1832)
2 years ago
YuliangLiu0306 49216d7ab1
[autoparallel] fix bugs caused by negative dim key (#1808)
2 years ago
アマデウス 4268ae017b
[kernel] added jit warmup (#1792)
2 years ago
YuliangLiu0306 f6032ddb17
[autoparallel] fix bias addition module (#1800)
2 years ago
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821)
2 years ago
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689)
2 years ago
Boyuan Yao 629172b319
[autoparallel] add batch norm metainfo (#1815)
2 years ago
Super Daniel 441d584e4a
[fx] add a symbolic_trace api. (#1812)
2 years ago
xcnick e0da01ea71
[hotfix] fix build error when torch version >= 1.13 (#1803)
2 years ago
oahzxl 9639ea88fc
[kernel] more flexible flashatt interface (#1804)
2 years ago
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765)
2 years ago
Boyuan Yao 327d07c44a
[autoparallel] add conv metainfo class for auto parallel (#1796)
2 years ago
oahzxl 501a9e9cd2
[hotfix] polish flash attention (#1802)
2 years ago
Jiarui Fang 218c75fd9d
[NFC] polish type hint for shape consistency (#1801)
2 years ago
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798)
2 years ago
YuliangLiu0306 e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands (#1793)
2 years ago
Boyuan Yao 05ce3d369f
[fx] Add linear metainfo class for auto parallel (#1783)
2 years ago
Super Daniel e8a9bebc87
[autoparallel] refactor and add rotorc. (#1789)
2 years ago
YuliangLiu0306 2c4c7b3618
[autoparallel] add getattr handler (#1767)
2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
2 years ago
kurisusnowdeng 0b8161fab8
updated tp layers
2 years ago
Jiarui Fang cb5a587e9a
[hotfix] polish chunk import (#1787)
2 years ago
YuliangLiu0306 e859380bf7
[fx] support module with bias addition (#1780)
2 years ago
Frank Lee f3f19a5c47
[autoparallel] added matmul handler (#1763)
2 years ago
Ziyue Jiang 4df0194976
[Pipeline]Adapt to Pipelinable OPT (#1782)
2 years ago
YuliangLiu0306 27de252334
[autoparallel] fix conv handler numerical test (#1771)
2 years ago
Super Daniel 1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
2 years ago
Jiarui Fang f34dab4270
[compatibility] ChunkMgr import error (#1772)
2 years ago
YuliangLiu0306 b0f7c8bde8
[autoparallel] update CommSpec to CommActions (#1768)
2 years ago
YuliangLiu0306 b4cc59b61e
[autoparallel] add numerical test for node strategies (#1760)
2 years ago
oahzxl 25952b67d7
[feat] add flash attention (#1762)
2 years ago
Super Daniel 0584654c79
[fx] refactor memory utils and extend shard utils. (#1754)
2 years ago
Ziyue Jiang 63f250bbd4
fix file name (#1759)
2 years ago
YuliangLiu0306 314d8c497f
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757)
2 years ago
Frank Lee f9a613d660
[autoparallel] added binary elementwise node handler (#1758)
2 years ago
YuliangLiu0306 d2fc067231
[autoparallel] fix param hook issue in transform pass (#1755)
2 years ago
Frank Lee 262652c8bc
[autoparallel] added addbmm handler (#1751)
2 years ago
YuliangLiu0306 980ed21723
[autoparallel] shard param and buffer as expected (#1753)
2 years ago
YuliangLiu0306 cdb7d5e7d2
[hotfix] autoparallel unit test (#1752)
2 years ago
YuliangLiu0306 a4ce180e85
[autoparallel] add sequential order to communication actions (#1735)
2 years ago
Frank Lee 474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler (#1747)
2 years ago
Frank Lee 8b8937d901
[autoparallel] fixed wrong generated strategy for dot op (#1746)
2 years ago
Frank Lee 993b8875b6
[autoparallel] handled illegal sharding strategy in shape consistency (#1744)
2 years ago
Frank Lee 88a79814fb
[autoparallel] handled illegal strategy in node handler (#1743)
2 years ago
Super Daniel 30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730)
2 years ago
Frank Lee eee84908d4
[autoparallel] handled illegal sharding strategy (#1728)
2 years ago
Sze-qq 23703c9dd6
[NFC] polish colossalai/nn/metric/_utils.py code style (#1727)
2 years ago
Ofey Chan 7e62af28a0
[NFC] polish accuracy_2d.py code style (#1719)
2 years ago
LuGY 730f88f8e1
[NFC] polish _checkpoint_hook.py code style (#1722)
2 years ago
CsRic ea961d8fd1
[NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717)
2 years ago
yuxuan-lou 2b49ca80a3
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716)
2 years ago
shenggan e1d780030d
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714)
2 years ago
YuliangLiu0306 d373e67b99
[hotfix] resharding cost issue (#1742)
2 years ago
Jiarui Fang 24e84eba60
upgrade version to 0.1.11rc1 (#1739)
2 years ago
Frank Lee d2e0e39c9d
[release] update to v0.1.11 (#1736)
2 years ago
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
2 years ago
YuliangLiu0306 51b89d2202
[autoparallel] runtime_backward_apply (#1720)
2 years ago
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
2 years ago
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
2 years ago
Frank Lee 22a115406b
[autoparallel] fixed broken node handler tests (#1708)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
binmakeswell 5f41463a76
add optimizer README for tutorials (#1707)
2 years ago
Frank Lee 6c331a5a09
[autoparallel] refactored the autoparallel module for organization (#1706)
2 years ago
Frank Lee 91cd34e6e0
[unittest] added doc for the pytest wrapper (#1704)
2 years ago
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
2 years ago
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699)
2 years ago
Frank Lee 0e52f3d3d5
[unittest] supported condititonal testing based on env var (#1701)
2 years ago
Frank Lee 8283e95db3
[autoparallel] collated all deprecated files (#1700)
2 years ago
Frank Lee e2355d01b9
[autoparallel] init new folder structure (#1696)
2 years ago
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
2 years ago
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
2 years ago
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
2 years ago
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
2 years ago
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691)
2 years ago
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692)
2 years ago
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687)
2 years ago
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683)
2 years ago
YuliangLiu0306 6878e42248
[hotfix] solver bug caused by dict type comm cost (#1686)
2 years ago
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
2 years ago
Kirigaya Kazuto 0df5034a36
[pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework (#1684)
2 years ago
jim e5ab6be72e
[hotfix[ fix colotensor.type() raise NotImplementedError (#1682)
2 years ago
Kirigaya Kazuto 3b2a59b0ba
[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681)
2 years ago
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674)
2 years ago
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
2 years ago
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
2 years ago
Fazzie-Maqianli 87c5ad352a
update version to 0.1.10 (#1676)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Boyuan Yao b1be5b88bd
[autoparallel] fix insecure subprocess (#1680)
2 years ago
Boyuan Yao d8420f81a4
[hotfix] fix wrong type name in profiler (#1678)
2 years ago
Boyuan Yao 132b4306b7
[fx] Add concrete info prop (#1677)
2 years ago
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
2 years ago
YuliangLiu0306 11ec070e53
[hotfix]unit test (#1670)
2 years ago
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
2 years ago
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667)
2 years ago
Frank Lee 247a9dbca9
[autoparallel] added bias comm spec to matmul strategy (#1664)
2 years ago
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666)
2 years ago
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
2 years ago
YuliangLiu0306 c27e701cb2
[autoparallel] remove no strategy nodes (#1652)
2 years ago
Frank Lee 50f16a2850
[autoparallel] added compute resharding costs for node handler (#1662)
2 years ago
Frank Lee 9ec401a722
[autoparallel] added new strategy constructor template (#1661)
2 years ago
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655)
2 years ago
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663)
2 years ago
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653)
2 years ago
Jiarui Fang c638bec028
[embedding] polish async copy (#1657)
2 years ago
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656)
2 years ago
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654)
2 years ago
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650)
2 years ago
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
2 years ago
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
2 years ago
Super Daniel 6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. (#1646)
2 years ago
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
2 years ago
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639)
2 years ago
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
2 years ago
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638)
2 years ago
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647)
2 years ago
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Boyuan Yao f921733621
[autoparallel] Add pofo sequence annotation (#1637)
2 years ago
Super Daniel 04bbabeea8
[fx/profiler] provide a table of summary. (#1634)
2 years ago
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640)
2 years ago
Jiarui Fang e57df80325
[embeddings] cache option (#1635)
2 years ago
HELSON a088022efc
[moe] fix moe bugs (#1633)
2 years ago
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
2 years ago
YuliangLiu0306 c7ac0f4ab2
[autoparallel] add elementwise handler (#1622)
2 years ago
YuliangLiu0306 3a46215135
[autoparallel] add embedding handler (#1620)
2 years ago
YuliangLiu0306 69448f64c4
[autoparallel] protect bcast handler from invalid strategies (#1631)
2 years ago
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629)
2 years ago
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613)
2 years ago
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
2 years ago
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
2 years ago
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
2 years ago
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625)
2 years ago
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616)
2 years ago
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
2 years ago
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611)
2 years ago
YuliangLiu0306 6a8f8cc05e
[hotfix] got sliced types (#1614)
2 years ago
Frank Lee d397842fa8
[autoparallel] added new node handler (#1612)
2 years ago
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
2 years ago
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605)
2 years ago
Frank Lee edb67cb378
[autoparallel] refactored the data structure for sharding strategy (#1610)
2 years ago
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
2 years ago
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
2 years ago
ver217 c9e8ce67b8
fix move fp32 shards (#1604)
2 years ago
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
2 years ago
Frank Lee 3abf98a633
[autoparallel] added all non-bcast matmul strategies (#1603)
2 years ago
Frank Lee db98b695b2
[autoparallel] added strategy generator and bmm strategies (#1602)
2 years ago
Jiarui Fang a19eb80998
[embedding] updates some default parameters
2 years ago
Super Daniel cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. (#1599)
2 years ago
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
2 years ago
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
2 years ago
YuliangLiu0306 faa23b9d9a
[autoparallel] add reshape handler (#1594)
2 years ago
Super Daniel 5c494d4540
[fx] provide an accurate estimation of memory. (#1587)
2 years ago
Frank Lee 27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy (#1591)
2 years ago
YuliangLiu0306 d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589)
2 years ago
Frank Lee 7c18a588c8
[autoparallel] added generate_sharding_spec to utils (#1590)
2 years ago
Boyuan Yao 49ccf8b5f8
[fx] Improve linearize and rotor solver (#1586)
2 years ago
Frank Lee 219f66c571
[autoparallel] added solver option dataclass (#1588)
2 years ago
YuliangLiu0306 82d4376c23
[autoparallel] adapt solver with resnet (#1583)
2 years ago
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584)
2 years ago
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
2 years ago
Boyuan Yao 20e466527b
[NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576)
2 years ago
Fazzie-Maqianli 06dccdde44
[NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554)
2 years ago
CsRic 2ac46f7be4
[NFC] polish utils/tensor_detector/__init__.py code style (#1573)
2 years ago
Sze-qq 2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572)
2 years ago
superhao1995 e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
2 years ago
Jiatong Han 3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
2 years ago
Zirui Zhu f566c9b98d
[NFC] polish colossalai/pipeline/utils.py code style (#1562)
2 years ago
Xue Fuzhao e070ca45c6
[NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563)
2 years ago
Zangwei Zheng 9823cbf24b
[NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565)
2 years ago
DouJS f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568)
2 years ago
LuGY c7d4932956
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566)
2 years ago
BigOneLiXiaoMing 0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style (#1561)
2 years ago
Ziheng Qin 08815f0e72
[NFC] polish colossalai/builder/__init__.py code style (#1560)
2 years ago
Super Daniel 8328917348
[NFC] polish colossalai/testing/comparison.py code style. (#1558)
2 years ago
Ofey Chan 7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556)
2 years ago
Kai Wang (Victor Kai) 46931e3c32
[NFC] polish code colossalai/gemini/update/search_utils.py (#1557)
2 years ago
yuxuan-lou 413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555)
2 years ago
shenggan 8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553)
2 years ago
Maruyama_Aya bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552)
2 years ago
binmakeswell 73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2 years ago
Kirigaya Kazuto 318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559)
2 years ago
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564)
2 years ago
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569)
2 years ago
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544)
2 years ago
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
2 years ago
YuliangLiu0306 0908d0fc61
[autoparallel]add backward cost info into strategies (#1524)
2 years ago
YuliangLiu0306 1a3599410d
[autoparallel] support fucntion in operator handler (#1529)
2 years ago
YuliangLiu0306 44c866a3e3
[autoparallel] change the merge node logic (#1533)
2 years ago
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
2 years ago
ver217 2bed096848
[utils] optimize partition_tensor_parallel_state_dict (#1546)
2 years ago
Super Daniel d8a5aded19
[hotfix] change namespace for meta_trace. (#1541)
2 years ago
ver217 a203b709d5
[hotfix] fix init context (#1543)
2 years ago
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545)
2 years ago
Boyuan Yao 46c6cc79a9
[fx] Add common node in model linearize (#1542)
2 years ago
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537)
2 years ago
Super Daniel 70129603aa
[fx] support meta tracing for aten level computation graphs like functorch. (#1536)
2 years ago
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538)
2 years ago
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535)
2 years ago
Boyuan Yao 56159049e8
[fx] Modify solver linearize and add corresponding test (#1531)
2 years ago
YuliangLiu0306 4b3d6caeb3
[fx]patch nn.functional convolution (#1528)
2 years ago
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526)
2 years ago
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
2 years ago
Super Daniel 112a1f0a8f
[hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530)
2 years ago
Boyuan Yao b231430bcb
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521)
2 years ago
Super Daniel 5cc849f6ce
[fx] hack __torch_dispatch__ for meta tensor and autograd. (#1515)
2 years ago
Jiarui Fang 4537d39df9
[doc] docstring for FreqAwareEmbeddingBag (#1525)
2 years ago
YuliangLiu0306 3345c6d352
[autoparellel]add strategies constructor (#1505)
2 years ago
Frank Lee a0436a62ee
[autoparallel] added liveness analysis (#1516)
2 years ago
Jiarui Fang 9a9ef65313
[FAW] cpu caching operations (#1520)
2 years ago
Super Daniel ea1a95b8b9
[hotfix] fix coloproxy typos. (#1519)
2 years ago
Jiarui Fang af5438caa2
[FAW] refactor reorder() for CachedParamMgr (#1514)
2 years ago
Jiarui Fang 9feee6d06b
[FAW] LFU initialize with dataset freq (#1513)
2 years ago
CsRic 1b8fee8e9c
[FAW] shrink freq_cnter size (#1509)
2 years ago
Boyuan Yao 4acc58ee20
[fx] Fix activation codegen dealing with checkpointing first op (#1510)
2 years ago
Boyuan Yao ac3a453a50
[fx] fix the discretize bug (#1506)
2 years ago
Boyuan Yao 31fffd3fc5
[fx] fix wrong variable name in solver rotor (#1502)
2 years ago
Jiarui Fang ba61109b6c
[FAW] remove code related to chunk (#1501)
2 years ago
Jiarui Fang d5085bb317
[FAW] add more docs and fix a warning (#1500)
2 years ago
Kirigaya Kazuto 5a6fd71f90
[pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497)
2 years ago
CsRic 0ed2f46131
[FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494)
2 years ago
YuliangLiu0306 8b7d6bd5be
[autoparallel] add more sharding strategies to conv (#1487)
2 years ago
Boyuan Yao de1e716dc4
[fx] Add activation checkpoint solver rotor (#1496)
2 years ago
Super Daniel 09c023bee2
[fx] add more op patches for profiler and error message for unsupported ops. (#1495)
2 years ago
YuliangLiu0306 413c053453
[autoparallel] add cost graph class (#1481)
2 years ago
YuliangLiu0306 4b03c25f85
[tensor]add 1D device mesh (#1492)
2 years ago
CsRic b8d0e39eaf
[FAW] LFU cache for the FAW
2 years ago
Kirigaya Kazuto 9145aef2b4
[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486)
2 years ago
Frank Lee 3da68d6b1b
[fx] fixed adapative pooling size concatenation error (#1489)
2 years ago
Jiarui Fang cde7b8a5b8
[FAW] init an LFU implementation for FAW (#1488)
2 years ago
Super Daniel 32efe8e740
[fx] add profiler for fx nodes. (#1480)
2 years ago
Frank Lee d39e11dffb
[autoparallel] added namespace constraints (#1490)
2 years ago
Kirigaya Kazuto a6c8749198
[pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483)
2 years ago
Geng Zhang 0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462)
2 years ago
Frank Lee ede326298b
[autoparallel] integrate auto parallel with torch fx (#1479)
2 years ago
Boyuan Yao 1f2e547f7a
[fx] Fix ckpt functions' definitions in forward (#1476)
2 years ago
Kirigaya Kazuto bb5f5289e0
[pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470)
2 years ago
Frank Lee 628c7e3fc8
[autoparallel] added dot handler (#1475)
2 years ago
Frank Lee 9dae9bb2bc
[autoparallel] introduced baseclass for op handler and reduced code redundancy (#1471)
2 years ago
Frank Lee 3a54e1c9b7
[autoparallel] standardize the code structure (#1469)
2 years ago
YuliangLiu0306 26a37b5cd5
[autoparallel] Add conv handler to generate strategies and costs info for conv (#1467)
2 years ago
Jiarui Fang 1b491ad7de
[doc] update docstring in ProcessGroup (#1468)
2 years ago
YuliangLiu0306 b73fb7a077
[tensor] support runtime ShardingSpec apply (#1453)
2 years ago
Super Daniel bbc58d881b
[fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466)
2 years ago
Super Daniel e7383f578b
[fx] add rules to linearize computation graphs for searching. (#1461)
2 years ago
Boyuan Yao 092b9c8f49
[fx] Add use_reentrant=False to checkpoint in codegen (#1463)
2 years ago
Boyuan Yao 47fd8e4a02
[utils] Add use_reetrant=False in utils.activation_checkpoint (#1460)
2 years ago
Jiarui Fang 36824a304c
[Doc] add more doc for ColoTensor. (#1458)
2 years ago
Jiarui Fang a1476ea882
[NFC] polish doc style for ColoTensor (#1457)
2 years ago
Super Daniel 0dbd61c29b
[fx] fix test and algorithm bugs in activation checkpointing. (#1451)
2 years ago
Jiarui Fang b1553fdf96
[NFC] global vars should be upper case (#1456)
2 years ago
ver217 367c615818
fix nvme docstring (#1450)
2 years ago
Geng Zhang 9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448)
2 years ago
Frank Lee 5a52e21fe3
[test] fixed the activation codegen test (#1447)
2 years ago
YuliangLiu0306 0f3042363c
[tensor] shape consistency generate transform path and communication cost (#1435)
2 years ago
Boyuan Yao 5774fe0270
[fx] Use colossalai checkpoint and add offload recognition in codegen (#1439)
2 years ago
Kirigaya Kazuto e9460b45c8
[engin/schedule] use p2p_v2 to recontruct pipeline_schedule (#1408)
2 years ago
Frank Lee ae1b58cd16
[tensor] added linear implementation for the new sharding spec (#1416)
2 years ago
Super Daniel d40a9392ba
[fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446)
2 years ago
ver217 821c6172e2
[utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442)
2 years ago
HELSON b80340168e
[zero] add chunk_managerV2 for all-gather chunk (#1441)
2 years ago
Super Daniel 3b26516c69
[fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433)
2 years ago
Jiarui Fang 30b4dd17c0
[FAW] export FAW in _ops (#1438)
2 years ago