Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
...
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
1 year ago
Hongxin Liu
172f7fa3cf
[misc] resolve code factor issues ( #4433 )
1 year ago
Jianghai
d9be0472ef
[bugs] hot fix some testing bugs for new models ( #4268 )
...
* hot fix
* hot fx tracer
1 year ago
jiangmingyan
ac80937138
[shardformer] shardformer support opt models ( #4091 )
...
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
1 year ago
Frank Lee
c4b1b65931
[test] fixed tests failed due to dtensor change ( #4082 )
...
* [test] fixed tests failed due to dtensor change
* polish code
1 year ago
Frank Lee
58df720570
[shardformer] adapted T5 and LLaMa test to use kit ( #4049 )
...
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
1 year ago
Hongxin Liu
afb239bbf8
[devops] update torch version of CI ( #3725 )
...
* [test] fix flop tensor test
* [test] fix autochunk test
* [test] fix lazyinit test
* [devops] update torch version of CI
* [devops] enable testmon
* [devops] fix ci
* [devops] fix ci
* [test] fix checkpoint io test
* [test] fix cluster test
* [test] fix timm test
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] force sync to test ci
* [test] skip fsdp test
2 years ago
digger-yu
1f73609adb
[CI] fix typo with tests/ etc. ( #3727 )
...
* fix spelling error with examples/comminity/
* fix spelling error with tests/
* fix some spelling error with tests/ colossalai/ etc.
* fix spelling error with tests/ etc. date:2023.5.10
2 years ago
Frank Lee
80eba05b0a
[test] refactor tests with spawn ( #3452 )
...
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
YuliangLiu0306
fee2af8610
[autoparallel] adapt autoparallel with new analyzer ( #3261 )
...
* [autoparallel] adapt autoparallel with new analyzer
* fix all node handler tests
* polish
* polish
2 years ago
YuliangLiu0306
045afa3ea2
[hotfix] skip torchaudio tracing test ( #3211 )
...
* [hotfix] skip torchaudio tracing test
* fix lazy init test issue
2 years ago
YuliangLiu0306
f57d34958b
[FX] refactor experimental tracer and adapt it with hf models ( #3157 )
...
* pass gpt trace and meta_prop
* pass t5 trace and meta_prop
* [FX] refactor experimental tracer and adapt it with hf models
* pass all mainstream model zoo
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* skip tests
* fix CI
* using packaging version
* polish
2 years ago
Frank Lee
085e7f4eff
[test] fixed torchrec registration in model zoo ( #3177 )
...
* [test] fixed torchrec registration in model zoo
* polish code
* polish code
* polish code
2 years ago
Frank Lee
a9b8402d93
[booster] added the accelerator implementation ( #3159 )
2 years ago
Frank Lee
1ad3a636b1
[test] fixed torchrec model test ( #3167 )
...
* [test] fixed torchrec model test
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
YuliangLiu0306
ecd643f1e4
[test] add torchrec models to test model zoo ( #3139 )
2 years ago
ver217
14a115000b
[tests] model zoo add torchaudio models ( #3138 )
...
* [tests] model zoo add torchaudio models
* [tests] refactor torchaudio wavernn
* [tests] refactor fx torchaudio tests
2 years ago
Frank Lee
6d48eb0560
[test] added transformers models to test model zoo ( #3135 )
2 years ago
Frank Lee
a674c63348
[test] added torchvision models to test model zoo ( #3132 )
...
* [test] added torchvision models to test model zoo
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
HELSON
1216d1e7bd
[tests] diffuser models in model zoo ( #3136 )
...
* [tests] diffuser models in model zoo
* remove useless code
* [tests] add diffusers to requirement-test
2 years ago
Frank Lee
86ac782d7c
[test] added timm models to test model zoo ( #3129 )
...
* [test] added timm models to test model zoo
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
YuliangLiu0306
4269196c79
[hotfix] skip auto checkpointing tests ( #3029 )
...
* [hotfix] skip auto checkpointing tests
* fix test name issue
2 years ago
Ziyue Jiang
44ea461890
[Pipeline] Add Topo Class ( #2059 )
...
* use Topo class to rewrite DAG
* polish code
* polish code
* polish code
* add comment
* add else to unended if
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
YuliangLiu0306
19438ea0ef
[hotfix] skip gpt tracing test ( #2064 )
2 years ago
Ziyue Jiang
632753abbc
[fx]Split partition with DAG information ( #2025 )
...
* add DAG to split_module
* add comment
* add test case for DAG
* remove print
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Jiarui Fang
51597f6a28
[hotfix] pass test_complete_workflow ( #1877 )
2 years ago
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2 years ago
Frank Lee
e6ec99d389
[utils] fixed lazy init context ( #1867 )
2 years ago
Super Daniel
441d584e4a
[fx] add a symbolic_trace api. ( #1812 )
...
* [fx] add a symbolic_trace api.
* [fx] fix import errors.
2 years ago
Jiarui Fang
6fa71d65d3
[fx] skip diffusers unitest if it is not installed ( #1799 )
2 years ago
YuliangLiu0306
e34e850a4c
[autoparallel]add essential CommActions for broadcast oprands ( #1793 )
2 years ago
Jiarui Fang
32c1b843a9
skip torchrec unittests if not installed ( #1790 )
2 years ago
YuliangLiu0306
e859380bf7
[fx] support module with bias addition ( #1780 )
...
* [autoparallel] refactor tracer to fix bias addition issue
* [fx] support module with bias addition
* create bias_addition_module
* refactor file structure
* polish code
* fix unit test
2 years ago
Super Daniel
1e88811c7a
[autoparallel] move ckpt solvers to autoparallel folder / refactor code ( #1764 )
...
* [autoparallel] first move.
* [autoparallel] add solver rotor.
* [autoparallel] add ckpt solvers.
* [autoparallel] modify codegen.
* [fx] fix annotation in test.
* [fx] remove check.
* [autoparallel] polish docstring.
* [fx] refactor MetaTensor.
2 years ago
Super Daniel
0584654c79
[fx] refactor memory utils and extend shard utils. ( #1754 )
...
* [fx] change memory.py to memory_utils.py.
* [fx] add shard utils.
* [fx] fix import.
* [fx] check code style.
* [fx] add comment.
* [autoparallel] first move.
* [fx] add time computations.
2 years ago
Super Daniel
b893342f95
[fx] test tracer on diffuser modules. ( #1750 )
...
* [fx] test tracer on diffuser modules.
* [fx] shorter seq_len.
* Update requirements-test.txt
2 years ago
Super Daniel
30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler ( #1730 )
...
* [fx/profiler] add test.
* [fx] fix file names.
* [fx] add docstring and comment.
* [fx] polish profiler.py.
* [fx] fix import errors.
* [fx] fix profiler.
* [fx] fix names.
2 years ago
Super Daniel
393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug ( #1710 )
...
* [fx] move meta registration
* [fx] fix tests.
* [fx] fix test.
* [fx] fix.
* [meta] refactor meta registration.py.
* [fx] add compatibility descriptions.
* [fx] polish import.
* [fx] add a decorator.
* [fx] fix tests.
* [fx] remove print.
* [fx] edit raise error.
* [fx] edit raise error.
* [fx] add type hint.
* [fx] fix import in experimental.
* [rpc] remove color debug.
* [meta] fix naming.
2 years ago
Boyuan Yao
31d2f03d27
[autoparallel] fix C version rotor inconsistency ( #1691 )
2 years ago
Super Daniel
3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 ( #1679 )
...
* [fx/profiler] modify data_ptr into uuid for all tensors.
* [fx] modify uuid.
* [fx/profiler] tune performance on GPT-2.
* [fx] updates.
* [fx] debug.
* [fx] debug.
* [fx] cuda.
2 years ago
Boyuan Yao
1df98d5b66
[autoparallel] add rotor C version ( #1658 )
...
* [autoparallel] add rotor c version
* [fx] remove metainfoprop in rotor solver
* [autoparallel] modify C
code format
* [autoparallel] remove build.py
* [autoparallel] fix C extension build
* [autoparallel] add C solver consistency test
* [autoparallel] remove some unused imports
* [autoparallel] refactor rotor solver code
* [autoparallel] replace print with colossalai logger
* [autoparallel] ranks fixed
2 years ago
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2 years ago
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
...
* [fx] fix offload codegen test
* [fx] modify typing
2 years ago
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
...
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2 years ago
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
...
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2 years ago
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
...
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2 years ago
Boyuan Yao
a7cda6f57d
[fx] Add offload codegen ( #1598 )
...
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
2 years ago
Super Daniel
c8e9b2ad78
[hotfix/rotor] fix variable names ( #1597 )
...
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add intepretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] chaange stage into phase.
* [fx] chaange stage into phase.
* [fx] chaange stage into phase.
2 years ago
Boyuan Yao
f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen ( #1585 )
...
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
2 years ago