152 Commits (1bed38ef37c6edd4ddc9c935218382305c5f9438)

Author SHA1 Message Date
Frank Lee 993b8875b6
[autoparallel] handled illegal sharding strategy in shape consistency (#1744)
2 years ago
Super Daniel 30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730)
2 years ago
YuliangLiu0306 51b89d2202
[autoparallel] runtime_backward_apply (#1720)
2 years ago
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
2 years ago
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
2 years ago
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
2 years ago
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691)
2 years ago
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
2 years ago
Boyuan Yao b1be5b88bd
[autoparallel] fix insecure subprocess (#1680)
2 years ago
Boyuan Yao d8420f81a4
[hotfix] fix wrong type name in profiler (#1678)
2 years ago
Boyuan Yao 132b4306b7
[fx] Add concrete info prop (#1677)
2 years ago
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
2 years ago
Super Daniel 6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. (#1646)
2 years ago
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
2 years ago
Boyuan Yao f921733621
[autoparallel] Add pofo sequence annotation (#1637)
2 years ago
Super Daniel 04bbabeea8
[fx/profiler] provide a table of summary. (#1634)
2 years ago
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
2 years ago
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
2 years ago
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
2 years ago
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
2 years ago
Super Daniel cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. (#1599)
2 years ago
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
2 years ago
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
2 years ago
Super Daniel 5c494d4540
[fx] provide an accurate estimation of memory. (#1587)
2 years ago
Boyuan Yao 49ccf8b5f8
[fx] Improve linearize and rotor solver (#1586)
2 years ago
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
2 years ago
Xue Fuzhao e070ca45c6
[NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563)
2 years ago
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
2 years ago
Super Daniel d8a5aded19
[hotfix] change namespace for meta_trace. (#1541)
2 years ago
Boyuan Yao 46c6cc79a9
[fx] Add common node in model linearize (#1542)
2 years ago
Super Daniel 70129603aa
[fx] support meta tracing for aten level computation graphs like functorch. (#1536)
2 years ago
Boyuan Yao 56159049e8
[fx] Modify solver linearize and add corresponding test (#1531)
2 years ago
YuliangLiu0306 4b3d6caeb3
[fx]patch nn.functional convolution (#1528)
2 years ago
Super Daniel 112a1f0a8f
[hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530)
2 years ago
Boyuan Yao b231430bcb
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521)
2 years ago
Super Daniel 5cc849f6ce
[fx] hack __torch_dispatch__ for meta tensor and autograd. (#1515)
2 years ago
Frank Lee a0436a62ee
[autoparallel] added liveness analysis (#1516)
2 years ago
Super Daniel ea1a95b8b9
[hotfix] fix coloproxy typos. (#1519)
2 years ago
Boyuan Yao 4acc58ee20
[fx] Fix activation codegen dealing with checkpointing first op (#1510)
2 years ago
Boyuan Yao ac3a453a50
[fx] fix the discretize bug (#1506)
2 years ago
Boyuan Yao 31fffd3fc5
[fx] fix wrong variable name in solver rotor (#1502)
2 years ago
Boyuan Yao de1e716dc4
[fx] Add activation checkpoint solver rotor (#1496)
2 years ago
Super Daniel 09c023bee2
[fx] add more op patches for profiler and error message for unsupported ops. (#1495)
2 years ago
YuliangLiu0306 413c053453
[autoparallel] add cost graph class (#1481)
2 years ago
Frank Lee 3da68d6b1b
[fx] fixed adapative pooling size concatenation error (#1489)
2 years ago
Super Daniel 32efe8e740
[fx] add profiler for fx nodes. (#1480)
2 years ago
Boyuan Yao 1f2e547f7a
[fx] Fix ckpt functions' definitions in forward (#1476)
2 years ago
Super Daniel bbc58d881b
[fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466)
2 years ago
Super Daniel e7383f578b
[fx] add rules to linearize computation graphs for searching. (#1461)
2 years ago
Boyuan Yao 092b9c8f49
[fx] Add use_reentrant=False to checkpoint in codegen (#1463)
2 years ago
Jiarui Fang 36824a304c
[Doc] add more doc for ColoTensor. (#1458)
2 years ago
Super Daniel 0dbd61c29b
[fx] fix test and algorithm bugs in activation checkpointing. (#1451)
2 years ago
Jiarui Fang b1553fdf96
[NFC] global vars should be upper case (#1456)
2 years ago
Boyuan Yao 5774fe0270
[fx] Use colossalai checkpoint and add offload recognition in codegen (#1439)
2 years ago
Super Daniel d40a9392ba
[fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446)
2 years ago
Super Daniel 3b26516c69
[fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433)
2 years ago
Super Daniel f20cb4e893
[fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425)
2 years ago
Frank Lee 7d6293927f
[fx] patched torch.max and data movement operator (#1391)
2 years ago
Frank Lee 89e60d1505
[fx] fixed indentation error in checkpointing codegen (#1385)
2 years ago
Frank Lee ad678921db
[fx] patched torch.full for huggingface opt (#1386)
2 years ago
YuliangLiu0306 df54481473
[hotfix] fix some bugs during gpt2 testing (#1379)
2 years ago
YuliangLiu0306 52bc2dc271
[fx] update split module pass and add customized policy (#1373)
2 years ago
Super Daniel be229217ce
[fx] add torchaudio test (#1369)
2 years ago
YuliangLiu0306 5542816690
[fx]add gpt2 passes for pipeline performance test (#1366)
2 years ago
Frank Lee cd063ac37f
[fx] added activation checkpoint codegen support for torch < 1.12 (#1359)
2 years ago
Frank Lee 644582eee9
[fx] added activation checkpoint codegen (#1355)
2 years ago
ver217 d068af81a3
[doc] update rst and docstring (#1351)
2 years ago
Frank Lee 274c1a3b5f
[fx] fixed apex normalization patch exception (#1352)
2 years ago
Frank Lee 05fae1fd56
[fx] added activation checkpointing annotation (#1349)
2 years ago
YuliangLiu0306 051592c64e
[fx] update MetaInforProp pass to process more complex node.meta (#1344)
2 years ago
YuliangLiu0306 942c8cd1fb
[fx] refactor tracer to trace complete graph (#1342)
2 years ago
Frank Lee 2cc1175c76
[fx] tested the complete workflow for auto-parallel (#1336)
2 years ago
YuliangLiu0306 4631fef8a0
[fx]refactor tracer (#1335)
2 years ago
Frank Lee 75abc75c15
[fx] fixed compatiblity issue with torch 1.10 (#1331)
2 years ago
Frank Lee b2475d8c5c
[fx] fixed unit tests for torch 1.12 (#1327)
2 years ago
YuliangLiu0306 e8acf55e8b
[fx] add balanced policy v2 (#1251)
2 years ago
XYE ca2d3f284f
[fx] Add unit test and fix bugs for transform_mlp_pass (#1299)
2 years ago
Frank Lee 4f4d8c3656
[fx] added apex normalization to patched modules (#1300)
2 years ago
Frank Lee fb35460595
[fx] added ndim property to proxy (#1253)
2 years ago
Frank Lee 4a09fc0947
[fx] fixed tracing with apex-based T5 model (#1252)
2 years ago
Frank Lee 7531c6271f
[fx] refactored the file structure of patched function and module (#1238)
2 years ago
YuliangLiu0306 97d713855a
[fx] methods to get fx graph property. (#1246)
2 years ago
YuliangLiu0306 30b4fc0eb0
[fx]add split module pass and unit test from pipeline passes (#1242)
2 years ago
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245)
2 years ago
Jiarui Fang 0e199d71e8
[hotfix] fx get comm size bugs (#1233)
2 years ago
YuliangLiu0306 2b7dca44b5
[fx]get communication size between partitions (#1224)
2 years ago
Frank Lee 84f2298a96
[fx] added patches for tracing swin transformer (#1228)
2 years ago
Frank Lee b6cb5a47ad
[fx] added timm model tracing testing (#1221)
2 years ago
Jiarui Fang db1bef9032
[hotfix] fx shard 1d pass bug fixing (#1220)
2 years ago
Frank Lee 11973d892d
[fx] added torchvision model tracing testing (#1216)
2 years ago
XYE 291e22aac6
[fx] temporarily used (#1215)
2 years ago
Frank Lee 5da87ce35d
[fx] added testing for all albert variants (#1211)
2 years ago
Frank Lee 2d13a45a3b
[fx] added testing for all gpt variants (#1210)
2 years ago
YuliangLiu0306 189946c5c4
[fx]add uniform policy (#1208)
2 years ago
Frank Lee 426a279ce7
[fx] added testing for all bert variants (#1207)
2 years ago
Frank Lee f7878f465c
[fx] supported model tracing for huggingface bert (#1201)
2 years ago
Frank Lee abf6a262dc
[fx] added module patch for pooling layers (#1197)
2 years ago
Frank Lee 2c8c05675d
[fx] patched conv and normalization (#1188)
2 years ago
Frank Lee 6d86f1bc91
[fx] supported data-dependent control flow in model tracing (#1185)
2 years ago
YuliangLiu0306 fcf55777dd
[fx]add autoparallel passes (#1121)
3 years ago