digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. ( #3808 )
2023-05-24 09:01:50 +08:00
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc( #3618 )
...
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautious Changed the spelling error under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2023-04-26 11:38:43 +08:00
YuliangLiu0306
fee2af8610
[autoparallel] adapt autoparallel with new analyzer ( #3261 )
...
* [autoparallel] adapt autoparallel with new analyzer
* fix all node handler tests
* polish
* polish
2023-03-30 17:47:24 +08:00
YuliangLiu0306
fbd2a9e05b
[hotfix] meta_tensor_compatibility_with_torch2
2023-03-30 13:43:01 +08:00
Michelle
ad285e1656
[NFC] polish colossalai/fx/tracer/_tracer_utils.py ( #3323 )
...
* [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style
* [NFC] polish colossalai/fx/tracer/_tracer_utils.py code style
---------
Co-authored-by: Qianran Ma <qianranm@luchentech.com>
2023-03-29 17:53:32 +08:00
Xuanlei Zhao
6b3bb2c249
[NFC] polish code style ( #3273 )
2023-03-29 15:22:21 +08:00
CZYCW
4cadb25b96
[NFC] policy colossalai/fx/proxy.py code style ( #3269 )
2023-03-29 15:22:21 +08:00
CsRic
00778abc48
[NFC] polish colossalai/fx/passes/split_module.py code style ( #3263 )
...
Co-authored-by: csric <richcsr256@gmail.com>
2023-03-29 15:22:21 +08:00
dayellow
204ca2f09a
[NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style ( #3256 )
...
Co-authored-by: Minghao Huang <huangminghao@luchentech.com>
2023-03-29 15:22:21 +08:00
HELSON
02b058032d
[fx] meta registration compatibility ( #3253 )
...
* [fx] meta registration compatibility
* fix error
2023-03-27 15:22:17 +08:00
Super Daniel
fff98f06ed
[analyzer] a minimal implementation of static graph analyzer ( #2852 )
...
* [hotfix] meta tensor default device.
* [siu] add experimental submodules to main branch.
* [siu]
* [siu]
* [analyzer] init.
* [analyzer] readme.
* [analyzer] readme.
* [analyzer] readme.
* [analyzer] readme.
* [test] add test.
* Update symbolic_trace.py
* mark skip tests.
* try except.
* try except.
* try except.
* s
* init
* init
* fix
* skip
* skip
---------
Co-authored-by: Daniel Shao <superdainiu@MININT-PVARVID.fareast.corp.microsoft.com>
Co-authored-by: Daniel Shao <superdainiu@Daniels-Mac.local>
2023-03-10 13:21:05 +08:00
Xuanlei Zhao
10c61de2f7
[autochunk] support vit ( #3084 )
...
support vit for autochunk
* support some new ops for vit
* fix some bugs
* add test for vit
2023-03-10 10:23:26 +08:00
Ziyue Jiang
400f63012e
[pipeline] Add Simplified Alpa DP Partition ( #2507 )
...
* add alpa dp split
* add alpa dp split
* use fwd+bwd instead of fwd only
---------
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-03-07 10:34:31 +08:00
Super Daniel
b42d3d28ed
[fx] remove depreciated algorithms. ( #2312 ) ( #2313 )
2023-03-07 10:30:35 +08:00
ver217
823f3b9cf4
[doc] add deepspeed citation and copyright ( #2996 )
...
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
Boyuan Yao
90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` ( #2584 )
...
* [autoparallel] matmul metainfo
* [auto_parallel] remove unused print
* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2023-02-08 11:05:31 +08:00
oahzxl
fa3d66feb9
support unet metainfo prop ( #2544 )
2023-02-02 16:19:26 +08:00
Super Daniel
c198c7c0b0
[hotfix] meta tensor default device. ( #2510 )
2023-01-29 16:28:10 +08:00
Super Daniel
35c0c0006e
[utils] lazy init. ( #2148 )
...
* [utils] lazy init.
* [utils] remove description.
* [utils] complete.
* [utils] finalize.
* [utils] fix names.
2023-01-20 10:49:00 +08:00
Ziyue Jiang
0f02b8c6e6
add avg partition ( #2483 )
...
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-19 13:54:50 +08:00
oahzxl
5db3a5bf42
[fx] allow control of ckpt_codegen init ( #2498 )
...
* [fx] allow control of ckpt_codegen init
Currently in ColoGraphModule, ActivationCheckpointCodeGen will be set automatically in __init__. But other codegen can't be set if so.
So I add an arg to control whether to set ActivationCheckpointCodeGen in __init__.
* code style
2023-01-18 17:02:46 +08:00
oahzxl
4953b4ace1
[autochunk] support evoformer tracer ( #2485 )
...
support full evoformer tracer, which is a main module of alphafold. previously we just support a simplifed version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2023-01-16 19:25:05 +08:00
Super Daniel
c41e59e5ad
[fx] allow native ckpt trace and codegen. ( #2438 )
2023-01-11 13:49:59 +08:00
Zihao
3a02b46447
[auto-parallel] refactoring ColoTracer ( #2118 )
...
* add meta_data_computing
* add checkpoint_annotation
* rename proxy.data to proxy.meta_data and add bias addition pass
* polish code
* delete meta_prop_pass invoke and rename ori_node to orig_node
* add TracerType
* unify meta data computing
* delete TracerType
* handle setitem operation
* operator.setitem
2023-01-04 14:44:22 +08:00
Boyuan Yao
5c2ef9fc76
[autoparallel] modify comm nodes' memory cost in construct chain ( #2263 )
...
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline
* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop
* [autoparallel] specifycomm nodes' memory cost in construct chain
2023-01-03 11:38:48 +08:00
YuliangLiu0306
3b1b91eaf4
[autoparallel] record parameter attribute in colotracer ( #2217 )
...
* [autoparallel] record parameter attribute in collotracer
* [autoparallel] fix construct_meta_info bug
2022-12-28 19:29:08 +08:00
Ziyue Jiang
59e343328d
[Pipeline Middleware ] Fix deadlock when num_microbatch=num_stage ( #2156 )
...
* add splitter
* polish code
* remove comment
* fix async nan by moving to cpu first
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-23 11:38:43 +08:00
Zihao
12e7bcd720
register meta func for rnn ( #2159 )
2022-12-21 23:06:18 +08:00
YuliangLiu0306
16335cb537
[hotfix] fix aten default bug ( #2158 )
2022-12-20 22:40:46 +08:00
Zihao
a128eec9d5
register aten._convolution.default ( #2137 )
2022-12-18 19:27:01 +08:00
YuliangLiu0306
d87baa85d9
[autoparallel] support linear function bias addition ( #2104 )
2022-12-09 10:31:36 +08:00
YuliangLiu0306
0fecbb9e20
[autoparallel] support addbmm computation ( #2102 )
2022-12-08 21:15:11 +08:00
YuliangLiu0306
b175e6d58e
[autoparallel] add bias addtion function class ( #2098 )
...
* [autoparallel] add bias addtion function class
* polish code
* polish
2022-12-08 11:31:51 +08:00
Super Daniel
2bf2d1cd3b
[fx] An experimental version of ColoTracer.' ( #2002 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
* [fx] ColoTracer experimental.
2022-12-07 18:36:17 +08:00
Ziyue Jiang
44ea461890
[Pipeline] Add Topo Class ( #2059 )
...
* use Topo class to rewrite DAG
* polish code
* polish code
* polish code
* add comment
* add else to unended if
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-12-02 18:13:20 +08:00
Ziyue Jiang
b0936e4a44
[rpc] split with dag ( #2028 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
* add DAG middleware in scheduler
* add test case for scheduler
* remove break
* recover old lifecycle
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-29 11:36:28 +08:00
Ziyue Jiang
632753abbc
[fx]Split partition with DAG information ( #2025 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-25 17:42:48 +08:00
Super Daniel
2edbef13cc
[fx] add more meta_registry for MetaTensor execution. ( #2000 )
...
* [sc] add examples for auto checkpoint.
* merge upstream
* [fx] add more meta_registry for MetaTensor execution.
2022-11-23 10:55:46 +08:00
YuliangLiu0306
fea3cb661c
[autoparallel] support addmm in tracer and solver ( #1961 )
...
* [fx] patch addmm
* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Boyuan Yao
d5c5bc219e
[SC] add GPT example for auto checkpoint ( #1889 )
...
* [sc] SC tutorial for auto checkpoint
* [sc] polish examples
* [sc] polish readme
* [sc] polish readme and help information
* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
Super Daniel
448248b27c
[fx] metainfo_trace as an API. ( #1873 )
...
* [fx] metainfo_trace as an API.
* [fx] add return.
2022-11-10 20:58:37 +08:00
YuliangLiu0306
f6032ddb17
[autoparallel] fix bias addition module ( #1800 )
2022-11-08 16:21:25 +08:00
Super Daniel
441d584e4a
[fx] add a symbolic_trace api. ( #1812 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
YuliangLiu0306
e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands ( #1793 )
2022-11-04 18:36:42 +08:00
Boyuan Yao
05ce3d369f
[fx] Add linear metainfo class for auto parallel ( #1783 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
Super Daniel
e8a9bebc87
[autoparallel] refactor and add rotorc. ( #1789 )
...
* [autoparallel] refactor and add rotorc.
* [autoparallel] refactor and add rotorc.
2022-11-03 12:32:51 +08:00
YuliangLiu0306
2c4c7b3618
[autoparallel] add getattr handler ( #1767 )
...
* [autoparallel] add getattr haandler
* polish code
* add extra processes for Parameters
* add unit test for param resharding cost
* add docstring and polish test
2022-11-03 12:31:33 +08:00
YuliangLiu0306
e859380bf7
[fx] support module with bias addition ( #1780 )
...
* [autoparallel] refactor tracer to fix bias addition issue
* [fx] support module with bias addition
* create bias_addition_module
* refactor file structure
* polish code
* fix unit test
2022-11-01 22:53:51 +08:00
Super Daniel
1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code ( #1764 )
...
* [autoparallel] first move.
* [autoparallel] add solver rotor.
* [autoparallel] add ckpt solvers.
* [autoparallel] modify codegen.
* [fx] fix annotation in test.
* [fx] remove check.
* [autoparallel] polish docstring.
* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
Super Daniel
0584654c79
[fx] refactor memory utils and extend shard utils. ( #1754 )
...
* [fx] change memory.py to memory_utils.py.
* [fx] add shard utils.
* [fx] fix import.
* [fx] check code style.
* [fx] add comment.
* [autoparallel] first move.
* [fx] add time computations.
2022-10-26 14:24:41 +08:00