162 Commits (30a94431323d71c5ef06bd4b7f047aced3312fdf)

Author SHA1 Message Date
Boyuan Yao 31fffd3fc5
[fx] fix wrong variable name in solver rotor (#1502) 2 years ago
Boyuan Yao de1e716dc4
[fx] Add activation checkpoint solver rotor (#1496) 2 years ago
Super Daniel 09c023bee2
[fx] add more op patches for profiler and error message for unsupported ops. (#1495) 2 years ago
YuliangLiu0306 413c053453
[autoparallel] add cost graph class (#1481) 2 years ago
Frank Lee 3da68d6b1b
[fx] fixed adaptive pooling size concatenation error (#1489) 2 years ago
Super Daniel 32efe8e740
[fx] add profiler for fx nodes. (#1480) 2 years ago
Boyuan Yao 1f2e547f7a
[fx] Fix ckpt functions' definitions in forward (#1476) 2 years ago
Super Daniel bbc58d881b
[fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466) 2 years ago
Super Daniel e7383f578b
[fx] add rules to linearize computation graphs for searching. (#1461) 2 years ago
Boyuan Yao 092b9c8f49
[fx] Add use_reentrant=False to checkpoint in codegen (#1463) 2 years ago
Jiarui Fang 36824a304c
[Doc] add more doc for ColoTensor. (#1458) 2 years ago
Super Daniel 0dbd61c29b
[fx] fix test and algorithm bugs in activation checkpointing. (#1451) 2 years ago
Jiarui Fang b1553fdf96
[NFC] global vars should be upper case (#1456) 2 years ago
Boyuan Yao 5774fe0270
[fx] Use colossalai checkpoint and add offload recognition in codegen (#1439) 2 years ago
Super Daniel d40a9392ba
[fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446) 2 years ago
Super Daniel 3b26516c69
[fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) 2 years ago
Super Daniel f20cb4e893
[fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) 2 years ago
Frank Lee 7d6293927f
[fx] patched torch.max and data movement operator (#1391) 2 years ago
Frank Lee 89e60d1505
[fx] fixed indentation error in checkpointing codegen (#1385) 2 years ago
Frank Lee ad678921db
[fx] patched torch.full for huggingface opt (#1386) 2 years ago
YuliangLiu0306 df54481473
[hotfix] fix some bugs during gpt2 testing (#1379) 2 years ago
YuliangLiu0306 52bc2dc271
[fx] update split module pass and add customized policy (#1373) 2 years ago
Super Daniel be229217ce
[fx] add torchaudio test (#1369) 2 years ago
YuliangLiu0306 5542816690
[fx] add gpt2 passes for pipeline performance test (#1366) 2 years ago
Frank Lee cd063ac37f
[fx] added activation checkpoint codegen support for torch < 1.12 (#1359) 2 years ago
Frank Lee 644582eee9
[fx] added activation checkpoint codegen (#1355) 2 years ago
ver217 d068af81a3
[doc] update rst and docstring (#1351) 2 years ago
Frank Lee 274c1a3b5f
[fx] fixed apex normalization patch exception (#1352) 2 years ago
Frank Lee 05fae1fd56
[fx] added activation checkpointing annotation (#1349) 2 years ago
YuliangLiu0306 051592c64e
[fx] update MetaInfoProp pass to process more complex node.meta (#1344) 2 years ago
YuliangLiu0306 942c8cd1fb
[fx] refactor tracer to trace complete graph (#1342) 2 years ago
Frank Lee 2cc1175c76
[fx] tested the complete workflow for auto-parallel (#1336) 2 years ago
YuliangLiu0306 4631fef8a0
[fx] refactor tracer (#1335) 2 years ago
Frank Lee 75abc75c15
[fx] fixed compatibility issue with torch 1.10 (#1331) 2 years ago
Frank Lee b2475d8c5c
[fx] fixed unit tests for torch 1.12 (#1327) 2 years ago
YuliangLiu0306 e8acf55e8b
[fx] add balanced policy v2 (#1251) 2 years ago
XYE ca2d3f284f
[fx] Add unit test and fix bugs for transform_mlp_pass (#1299) 2 years ago
Frank Lee 4f4d8c3656
[fx] added apex normalization to patched modules (#1300) 2 years ago
Frank Lee fb35460595
[fx] added ndim property to proxy (#1253) 2 years ago
Frank Lee 4a09fc0947
[fx] fixed tracing with apex-based T5 model (#1252) 2 years ago
Frank Lee 7531c6271f
[fx] refactored the file structure of patched function and module (#1238) 2 years ago
YuliangLiu0306 97d713855a
[fx] methods to get fx graph property. (#1246) 2 years ago
YuliangLiu0306 30b4fc0eb0
[fx] add split module pass and unit test from pipeline passes (#1242) 2 years ago
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245) 2 years ago
Jiarui Fang 0e199d71e8
[hotfix] fx get comm size bugs (#1233) 2 years ago
YuliangLiu0306 2b7dca44b5
[fx] get communication size between partitions (#1224) 2 years ago
Frank Lee 84f2298a96
[fx] added patches for tracing swin transformer (#1228) 2 years ago
Frank Lee b6cb5a47ad
[fx] added timm model tracing testing (#1221) 2 years ago
Jiarui Fang db1bef9032
[hotfix] fx shard 1d pass bug fixing (#1220) 2 years ago
Frank Lee 11973d892d
[fx] added torchvision model tracing testing (#1216) 2 years ago