Edenzzzz
d83c633ca6
[hotfix] Fix examples no pad token & auto parallel codegen bug; ( #5606 )
...
* fix no pad token bug
* fixed some auto parallel codegen bug, but might not run on torch 2.1
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
7 months ago
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
...
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
1 year ago
digger yu
70c8cdecf4
[nfc] fix typo colossalai/cli fx kernel ( #3847 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
* fix typo colossalai/ applications/
* fix typo colossalai/cli fx kernel
2 years ago
digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. ( #3808 )
2 years ago
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc( #3618 )
...
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautious Changed the spelling error under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2 years ago
YuliangLiu0306
fee2af8610
[autoparallel] adapt autoparallel with new analyzer ( #3261 )
...
* [autoparallel] adapt autoparallel with new analyzer
* fix all node handler tests
* polish
* polish
2 years ago
YuliangLiu0306
fbd2a9e05b
[hotfix] meta_tensor_compatibility_with_torch2
2 years ago
Michelle
ad285e1656
[NFC] polish colossalai/fx/tracer/_tracer_utils.py ( #3323 )
...
* [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style
* [NFC] polish colossalai/fx/tracer/_tracer_utils.py code style
---------
Co-authored-by: Qianran Ma <qianranm@luchentech.com>
2 years ago
Xuanlei Zhao
6b3bb2c249
[NFC] polish code style ( #3273 )
2 years ago
CZYCW
4cadb25b96
[NFC] policy colossalai/fx/proxy.py code style ( #3269 )
2 years ago
CsRic
00778abc48
[NFC] polish colossalai/fx/passes/split_module.py code style ( #3263 )
...
Co-authored-by: csric <richcsr256@gmail.com>
2 years ago
dayellow
204ca2f09a
[NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style ( #3256 )
...
Co-authored-by: Minghao Huang <huangminghao@luchentech.com>
2 years ago
HELSON
02b058032d
[fx] meta registration compatibility ( #3253 )
...
* [fx] meta registration compatibility
* fix error
2 years ago
Super Daniel
fff98f06ed
[analyzer] a minimal implementation of static graph analyzer ( #2852 )
...
* [hotfix] meta tensor default device.
* [siu] add experimental submodules to main branch.
* [siu]
* [siu]
* [analyzer] init.
* [analyzer] readme.
* [analyzer] readme.
* [analyzer] readme.
* [analyzer] readme.
* [test] add test.
* Update symbolic_trace.py
* mark skip tests.
* try except.
* try except.
* try except.
* s
* init
* init
* fix
* skip
* skip
---------
Co-authored-by: Daniel Shao <superdainiu@MININT-PVARVID.fareast.corp.microsoft.com>
Co-authored-by: Daniel Shao <superdainiu@Daniels-Mac.local>
2 years ago
Xuanlei Zhao
10c61de2f7
[autochunk] support vit ( #3084 )
...
support vit for autochunk
* support some new ops for vit
* fix some bugs
* add test for vit
2 years ago
Ziyue Jiang
400f63012e
[pipeline] Add Simplified Alpa DP Partition ( #2507 )
...
* add alpa dp split
* add alpa dp split
* use fwd+bwd instead of fwd only
---------
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Super Daniel
b42d3d28ed
[fx] remove depreciated algorithms. ( #2312 ) ( #2313 )
2 years ago
ver217
823f3b9cf4
[doc] add deepspeed citation and copyright ( #2996 )
...
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
2 years ago
Boyuan Yao
90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` ( #2584 )
...
* [autoparallel] matmul metainfo
* [auto_parallel] remove unused print
* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2 years ago
oahzxl
fa3d66feb9
support unet metainfo prop ( #2544 )
2 years ago
Super Daniel
c198c7c0b0
[hotfix] meta tensor default device. ( #2510 )
2 years ago
Super Daniel
35c0c0006e
[utils] lazy init. ( #2148 )
...
* [utils] lazy init.
* [utils] remove description.
* [utils] complete.
* [utils] finalize.
* [utils] fix names.
2 years ago
Ziyue Jiang
0f02b8c6e6
add avg partition ( #2483 )
...
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
oahzxl
5db3a5bf42
[fx] allow control of ckpt_codegen init ( #2498 )
...
* [fx] allow control of ckpt_codegen init
Currently in ColoGraphModule, ActivationCheckpointCodeGen will be set automatically in __init__. But other codegen can't be set if so.
So I add an arg to control whether to set ActivationCheckpointCodeGen in __init__.
* code style
2 years ago
oahzxl
4953b4ace1
[autochunk] support evoformer tracer ( #2485 )
...
support full evoformer tracer, which is a main module of alphafold. previously we just support a simplifed version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2 years ago
Super Daniel
c41e59e5ad
[fx] allow native ckpt trace and codegen. ( #2438 )
2 years ago
Zihao
3a02b46447
[auto-parallel] refactoring ColoTracer ( #2118 )
...
* add meta_data_computing
* add checkpoint_annotation
* rename proxy.data to proxy.meta_data and add bias addition pass
* polish code
* delete meta_prop_pass invoke and rename ori_node to orig_node
* add TracerType
* unify meta data computing
* delete TracerType
* handle setitem operation
* operator.setitem
2 years ago
Boyuan Yao
5c2ef9fc76
[autoparallel] modify comm nodes' memory cost in construct chain ( #2263 )
...
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline
* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop
* [autoparallel] specifycomm nodes' memory cost in construct chain
2 years ago
YuliangLiu0306
3b1b91eaf4
[autoparallel] record parameter attribute in colotracer ( #2217 )
...
* [autoparallel] record parameter attribute in collotracer
* [autoparallel] fix construct_meta_info bug
2 years ago
Ziyue Jiang
59e343328d
[Pipeline Middleware ] Fix deadlock when num_microbatch=num_stage ( #2156 )
...
* add splitter
* polish code
* remove comment
* fix async nan by moving to cpu first
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Zihao
12e7bcd720
register meta func for rnn ( #2159 )
2 years ago
YuliangLiu0306
16335cb537
[hotfix] fix aten default bug ( #2158 )
2 years ago
Zihao
a128eec9d5
register aten._convolution.default ( #2137 )
2 years ago
YuliangLiu0306
d87baa85d9
[autoparallel] support linear function bias addition ( #2104 )
2 years ago
YuliangLiu0306
0fecbb9e20
[autoparallel] support addbmm computation ( #2102 )
2 years ago
YuliangLiu0306
b175e6d58e
[autoparallel] add bias addtion function class ( #2098 )
...
* [autoparallel] add bias addtion function class
* polish code
* polish
2 years ago
Super Daniel
2bf2d1cd3b
[fx] An experimental version of ColoTracer.' ( #2002 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
* [fx] ColoTracer experimental.
2 years ago
Ziyue Jiang
44ea461890
[Pipeline] Add Topo Class ( #2059 )
...
* use Topo class to rewrite DAG
* polish code
* polish code
* polish code
* add comment
* add else to unended if
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Ziyue Jiang
b0936e4a44
[rpc] split with dag ( #2028 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
* add DAG middleware in scheduler
* add test case for scheduler
* remove break
* recover old lifecycle
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Ziyue Jiang
632753abbc
[fx]Split partition with DAG information ( #2025 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Super Daniel
2edbef13cc
[fx] add more meta_registry for MetaTensor execution. ( #2000 )
...
* [sc] add examples for auto checkpoint.
* merge upstream
* [fx] add more meta_registry for MetaTensor execution.
2 years ago
YuliangLiu0306
fea3cb661c
[autoparallel] support addmm in tracer and solver ( #1961 )
...
* [fx] patch addmm
* [autoparallel] support addmm in tracer and solver
2 years ago
Boyuan Yao
d5c5bc219e
[SC] add GPT example for auto checkpoint ( #1889 )
...
* [sc] SC tutorial for auto checkpoint
* [sc] polish examples
* [sc] polish readme
* [sc] polish readme and help information
* [sc] polish readme and help information
2 years ago
Super Daniel
448248b27c
[fx] metainfo_trace as an API. ( #1873 )
...
* [fx] metainfo_trace as an API.
* [fx] add return.
2 years ago
YuliangLiu0306
f6032ddb17
[autoparallel] fix bias addition module ( #1800 )
2 years ago
Super Daniel
441d584e4a
[fx] add a symbolic_trace api. ( #1812 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
2 years ago
YuliangLiu0306
e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands ( #1793 )
2 years ago
Boyuan Yao
05ce3d369f
[fx] Add linear metainfo class for auto parallel ( #1783 )
...
* [fx] metainfo class for auto parallel
* [fx] add unit test for linear metainfo
* [fx] fix bwd param for linear
* [fx] modify unit test
* [fx] modify unit test
* [fx] modify import
* [fx] modify import
* [fx] modify import
* [fx] move meta profiler to auto parallel
2 years ago
Super Daniel
e8a9bebc87
[autoparallel] refactor and add rotorc. ( #1789 )
...
* [autoparallel] refactor and add rotorc.
* [autoparallel] refactor and add rotorc.
2 years ago