Hongxin Liu | ccabcf6485 | [fp8] support fp8 amp for hybrid parallel plugin (#5975) | 4 months ago
    * [fp8] support fp8 amp for hybrid parallel plugin
    * [test] add fp8 hook test
    * [fp8] fix fp8 linear compatibility

Hongxin Liu | 2dd01e3a14 | [gemini] fix param op hook when output is tuple (#5355) | 10 months ago
    * [gemini] fix param op hook when output is tuple
    * [gemini] fix param op hook

flybird11111 | 3dbbf83f1c | fix (#5158) | 1 year ago
    * fix

Hongxin Liu | 079bf3cb26 | [misc] update pre-commit and run all files (#4752) | 1 year ago
    * [misc] update pre-commit
    * [misc] run pre-commit
    * [misc] remove useless configuration files
    * [misc] ignore cuda for clang-format

Hongxin Liu | 27061426f7 | [gemini] improve compatibility and add static placement policy (#4479) | 1 year ago
    * [gemini] remove distributed-related part from colotensor (#4379)
    * [gemini] remove process group dependency
    * [gemini] remove tp part from colo tensor
    * [gemini] patch inplace op
    * [gemini] fix param op hook and update tests
    * [test] remove useless tests
    * [misc] fix requirements
    * [test] fix model zoo
    * [misc] update requirements
    * [gemini] refactor gemini optimizer and gemini ddp (#4398)
    * [gemini] update optimizer interface
    * [gemini] renaming gemini optimizer
    * [gemini] refactor gemini ddp class
    * [example] update gemini related example
    * [plugin] fix gemini plugin args
    * [test] update gemini ckpt tests
    * [gemini] fix checkpoint io
    * [example] fix opt example requirements
    * [example] fix opt example
    * [gemini] add static placement policy (#4443)
    * [gemini] fix param offload
    * [test] update gemini tests
    * [plugin] update gemini plugin
    * [plugin] update gemini plugin docstr
    * [misc] fix flash attn requirement
    * [test] fix gemini checkpoint io test
    * [example] update resnet example result (#4457)
    * [example] update bert example result (#4458)
    * [doc] update gemini doc (#4468)
    * [example] update gemini related examples (#4473)
    * [example] update gpt example
    * [example] update dreambooth example
    * [example] update vit
    * [example] update opt
    * [example] update palm
    * [example] update vit and opt benchmark
    * [hotfix] fix bert in model zoo (#4480)
    * [test] remove chatglm gemini test
    * [test] remove sam gemini test
    * [test] remove vit gemini test
    * [hotfix] fix opt tutorial example (#4497)

HELSON | 552183bb74 | [polish] polish ColoTensor and its submodules (#2537) | 2 years ago

HELSON | 2458659919 | [zero] fix error for BEiT models (#2169) | 2 years ago
    * [zero] fix error for BEiT models
    * [ColoParameter] add unpack operation for tuple arguments
    * fix bugs
    * fix chunkv2 unit testing
    * add assertion for gradient state

Jiarui Fang | b3b89865e2 | [Gemini] ParamOpHook -> ColoParamOpHook (#2080) | 2 years ago

YuliangLiu0306 | 49216d7ab1 | [autoparallel] fix bugs caused by negative dim key (#1808) | 2 years ago
    * [autoparallel] fix bugs caused by negative dim key
    * fix import error
    * fix matmul test issue
    * fix unit test issue

Jiarui Fang | 85f933b58b | [Optimizer] Remove useless ColoOptimizer (#1312) | 2 years ago

Jiarui Fang | 4a76084dc9 | [tensor] add zero_like colo op, important for Optimizer (#1236) | 2 years ago

Jiarui Fang | ae7d3f4927 | [refactor] move process group from _DistSpec to ColoTensor. (#1203) | 2 years ago

Jiarui Fang | 1b657f9ce1 | [tensor] revert local view back (#1178) | 2 years ago

Jiarui Fang | aa7bef73d4 | [Tensor] distributed view supports inter-process hybrid parallel (#1169) | 2 years ago

Jiarui Fang | 4b9bba8116 | [ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) | 2 years ago

Jiarui Fang | f4ef224358 | [Tensor] remove ParallelAction, use ComputeSpec instread (#1166) | 2 years ago

Jiarui Fang | 8cdce0399c | [ColoTensor] improves init functions. (#1150) | 2 years ago

ver217 | 789cad301b | [hotfix] fix param op hook (#1131) | 2 years ago
    * fix param op hook
    * update zero tp test
    * fix bugs

ver217 | f99f56dff4 | fix colo parameter torch function (#1117) | 2 years ago

ver217 | 895c1c5ee7 | [tensor] refactor param op hook (#1097) | 3 years ago
    * refactor param op hook
    * add docstr
    * fix bug

Jiarui Fang | a00644079e | reorgnize colotensor directory (#1062) | 3 years ago
    * reorgnize colotensor directory
    * polish code

ver217 | 9492a561c3 | [tensor] ColoTensor supports ZeRo (#1015) | 3 years ago
    * impl chunk manager
    * impl param op hook
    * add reduce_chunk
    * add zero hook v2
    * add zero dp
    * fix TensorInfo
    * impl load balancing when using zero without chunk
    * fix zero hook
    * polish chunk
    * fix bugs
    * ddp ok
    * zero ok
    * fix bugs about load balancing
    * add ene-to-end test
    * fix typo
    * add test_chunk
    * polish code

Ziyue Jiang | 7c530b9de2 | [Tensor] add Parameter inheritance for ColoParameter (#1041) | 3 years ago
    * add Parameter inheritance for ColoParameter
    * remove tricks
    * polish

Ziyue Jiang | 6c5996a56e | [Tensor] add module check and bert test (#1031) | 3 years ago
    * add Embedding
    * Add bert test
    * add check module test
    * polish

ver217 | a3b66f6def | [tensor] refactor parallel action (#1007) | 3 years ago
    * refactor parallel action
    * polish unit tests

ver217 | ad536e308e | [tensor] refactor colo-tensor (#992) | 3 years ago
    * refactor colo-tensor and update linear op
    * update ops and unit tests
    * update unit tests
    * rename dist_spec module
    * remove unneeded import
    * fix pipelinable
    * polish code

Jiarui Fang | ab95ec9aea | [Tensor] init ColoParameter (#914) | 3 years ago