* [autoparallel] add getattr haandler
* polish code
* add extra processes for Parameters
* add unit test for param resharding cost
* add docstring and polish test
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
* [fx] optimize table repr.
* [fx] refactor code for profiler.
* [fx] add docstring.
* [fx] add docstring.
* [fx] skip test.
* [fx] redo.
* [fx] redo.
* [fx] fix import error for torch11.
* [fx] fix import error for torch11.
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add intepretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] chaange stage into phase.
* [fx] chaange stage into phase.
* [fx] chaange stage into phase.
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add intepretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] compute memory stat and flop count for MetaInfoProp.
* [fx] modify node attribute.
* [fx] modify ckpt_chen.
* [fx] fix compatibility.
* [fx] fix import error.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip if torch 1.11.0.
* [fx] recover MetaInfoProp support for PyTorch 1.11.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix import error.
* [fx] support meta tracing for aten level computation graphs like functorch.
* [fx] support meta tracing for aten level computation graphs like functorch.
* [fx] remove redundant import.
* [fx] add docstring.
* [fx] fix wrong variable name in solver rotor
* [fx] fix wrong variable name in solver rotor
* [fx] fix the discretize bug
* [fx] fix the first op in activation checkpoint codegen
* [fx] fix some bugs of ckpt solver
* [fx] modify test_ckpt_torchvision
* [fx] set sequence to __sequence__ attr of GraphModule
* [fx] docstring modification
* [fx] remove performance test
* [fx] hack __torch_dispatch__ for meta tensor and autograd.
* [fx] hack __torch_dispatch__ for meta tensor and autograd.
* [fx] hack __torch_dispatch__ for meta tensor and autograd.
* [fx] hack __torch_dispatch__ for meta tensor and autograd.
* [fx] hack __torch_dispatch__ for meta tensor and autograd.
* [fx] add bad case detections.
* [fx] add bad case detections.
* [fx] rename MetaTensor attributes.
* [fx] fix unexpected error.
* [fx] fix unexpected error.
* [fx] fix unexpected error.
* [fx] fix unexpected error.
* [fx] fix unexpected error.
* [fx] add register backward for native_batch_norm_backward.
* [fx] add more meta backend support for nn.Modules.
* [fx] add meta backend to support timm and torchvision models.
* [fx] add meta hardswish for timm models.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] add rules to linearize computation graphs for searching. (#2)
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] fix test and algorithm bugs in activation checkpointing.
* [fx] polish ckpt_test.
* [fx] add rules to linearize computation graphs for searching.
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] fix inconsistencies.
* [fx] fix MetaInfoProp.
* [fx] fix MetaInfoProp.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] fix error in tests.
* [fx] unfix bug.
* [fx] unfix bug.
* [fx] patch more modules and functions.
* [fx] change name of utils.py to profiler.py
* [fx] add profiler for rnn.
* [fx] add profiler for rnn.
* [fx] polish and add more patch for profiler.
* [fx] polish and add more patch for profiler.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] add rules to linearize computation graphs for searching. (#2)
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] fix test and algorithm bugs in activation checkpointing.
* [fx] polish ckpt_test.
* [fx] add rules to linearize computation graphs for searching.
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] fix inconsistencies.
* [fx] fix MetaInfoProp.
* [fx] fix MetaInfoProp.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] add profiler for fx nodes.
* [fx] fix error in tests.
* [fx] unfix bug.
* [fx] unfix bug.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] add rules to linearize computation graphs for searching. (#2)
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] fix test and algorithm bugs in activation checkpointing.
* [fx] polish ckpt_test.
* [fx] add rules to linearize computation graphs for searching.
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] remove chen_sqrt for sake of simplicity
* [fx] fix inconsistencies.
* [fx] fix MetaInfoProp.
* [fx] fix MetaInfoProp.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [fx] consider MetaInfoProp for inplace operands.
* [utils] Add use_reetrant=False into colossalai checkpoint
* [utils] add some annotation in utils.activaion_checkpoint
* [test] add reset_seed at the beginning of tests in test_actiavion_checkpointing.py
* [test] modify test_activation_checkpoint.py
* [test] modify test for reentrant=False
* [fx] Add use_reentrant=False of checkpoint into codegen
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] merge development into main (#1)
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] simplify test for ckpt.
* [fx] fix test and algorithm bugs in activation checkpointing.
* mend
[fx] fix test and algorithm bugs in activation checkpointing.
* mend
[fx] fix test and algorithm bugs in activation checkpointing.
* mend
[fx] fix test and algorithm bugs in activation checkpointing.
* mend
[fx] fix test and algorithm bugs in activation checkpointing.
* [fx] polish ckpt_test.
* [fx] polish ckpt_test.
* [fx] polish ckpt_test.
* [fx] Use colossalai.utils.checkpoint to replace torch.utils.checkpoint for offload activation and add offload annotation recognition in codegen
* [fx] Use colossalai.utils.checkpoint to replace torch.utils.checkpoint for offload activation and add offload annotation recognition in codegen
* Modification of test and add TODO in codegen
* [fx] Modification of colossal ckpt usage
* [fx] add gpc.destroy() to test_codegen
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* mend
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
* [fx] fix lowercase naming conventions.
* [fx] activation checkpointing using Chen strategies.
* [fx] add test for ckpt_solver_chen
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add vanilla activation checkpoint search with test on resnet and densenet
* [fx] add a namespace code for solver_chen.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* manipulation
* [fx]add graph manipulation methods.
* [fx]methods to get fx graph property.
* add unit test
* add docstring to explain top node and leaf node in this context
* init a checkpoint dir
* [checkpoint]support resume for cosinewarmuplr
* [checkpoint]add unit test
* fix some bugs but still not OK
* fix bugs
* make it faster
* [checkpoint]support generalized scheduler
* polish
* [tensor] torch function return colotensor
* polish
* fix bugs
* remove debug info
* polish
* polish
* [tensor] test_model pass unittests
* polish
* [hotfix] fx get comm size bug
Co-authored-by: ZhaoYi1222 <zhaoyi9499@gmail.com>