Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Hongxin Liu
554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
* [legacy] move communication to legacy (#4640)
* [legacy] refactor logger and clean up legacy codes (#4654)
* [legacy] make logger independent to gpc
* [legacy] make optim independent to registry
* [legacy] move test engine to legacy
* [legacy] move nn to legacy (#4656)
* [legacy] move nn to legacy
* [checkpointio] fix save hf config
* [test] remove useless rpc pp test
* [legacy] fix nn init
* [example] skip tutorial hybrid parallel example
* [devops] test doc check
* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu
8accecd55b
[legacy] move engine to legacy (#4560)
* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [example] update seq parallel requirements
2023-09-05 21:53:10 +08:00
Tian Siyuan
f1ae8c9104
[example] change accelerate version (#4431)
Co-authored-by: Siyuan Tian <siyuant@vmware.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
2023-08-30 22:56:13 +08:00
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
* [gemini] remove distributed-related part from colotensor (#4379)
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398)
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443)
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457)
* [example] update bert example result (#4458)
* [doc] update gemini doc (#4468)
* [example] update gemini related examples (#4473)
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480)
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497)
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
2023-08-24 09:29:25 +08:00
Tian Siyuan
ff836790ae
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
Co-authored-by: Siyuan Tian <siyuant@vmware.com>
2023-08-15 00:22:57 +08:00
binmakeswell
089c365fa0
[doc] add Series A Funding and NeurIPS news (#4377)
* [doc] add Series A Funding and NeurIPS news
* [kernel] fix mha kernel
* [CI] skip moe
* [CI] fix requirements
2023-08-04 17:42:07 +08:00
github-actions[bot]
4e9b09c222
Automated submodule synchronization (#4217)
Co-authored-by: github-actions <github-actions@github.com>
2023-07-12 17:35:58 +08:00
github-actions[bot]
62c7e67f9f
[format] applied code formatting on changed files in pull request 3786 (#3787)
Co-authored-by: github-actions <github-actions@github.com>
2023-05-22 14:42:09 +08:00
binmakeswell
ad2cf58f50
[chat] add performance and tutorial (#3786)
2023-05-19 18:03:56 +08:00
digger-yu
b7141c36dd
[CI] fix some spelling errors (#3707)
* fix spelling error with examples/community/
* fix spelling error with tests/
* fix some spelling error with tests/ colossalai/ etc.
2023-05-10 17:12:03 +08:00
Hongxin Liu
3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
* [booster] add prepare dataloader method for plugin
* [booster] update examples and docstr
2023-05-08 15:44:03 +08:00
Hongxin Liu
f83ea813f5
[example] add train resnet/vit with booster example (#3694)
* [example] add train vit with booster example
* [example] update readme
* [example] add train resnet with booster example
* [example] enable ci
* [example] enable ci
* [example] add requirements
* [hotfix] fix analyzer init
* [example] update requirements
2023-05-08 10:42:30 +08:00
Hongxin Liu
d556648885
[example] add finetune bert with booster example (#3693)
2023-05-06 11:53:13 +08:00
github-actions[bot]
d544ed4345
[bot] Automated submodule synchronization (#3596)
Co-authored-by: github-actions <github-actions@github.com>
2023-04-19 10:38:12 +08:00
binmakeswell
f1b3d60cae
[example] reorganize for community examples (#3557)
2023-04-14 16:27:48 +08:00
Frank Lee
80eba05b0a
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-04-06 14:51:35 +08:00
Frank Lee
7d8d825681
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442)
2023-04-06 09:43:51 +08:00
ver217
573af84184
[example] update examples related to zero/gemini (#3431)
* [zero] update legacy import
* [zero] update examples
* [example] fix opt tutorial
* [example] fix opt tutorial
* [example] fix opt tutorial
* [example] fix opt tutorial
* [example] fix import
2023-04-04 17:32:51 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2023-04-04 13:48:16 +08:00
YuliangLiu0306
fd6add575d
[examples] polish AutoParallel readme (#3270)
2023-03-28 10:40:07 +08:00
Frank Lee
73d3e4d309
[booster] implemented the torch ddp + resnet example (#3232)
* [booster] implemented the torch ddp + resnet example
* polish code
2023-03-27 10:24:14 +08:00
github-actions[bot]
0aa92c0409
Automated submodule synchronization (#3105)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-13 08:58:06 +08:00
binmakeswell
018936a3f3
[tutorial] update notes for TransformerEngine (#3098)
2023-03-10 16:30:52 +08:00
Kirthi Shankar Sivamani
65a4dbda6c
[NVIDIA] Add FP8 example using TE (#3080)
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
2023-03-10 16:24:08 +08:00
binmakeswell
52a5078988
[doc] add ISC tutorial (#2997)
* [doc] add ISC tutorial
* [doc] add ISC tutorial
* [doc] add ISC tutorial
* [doc] add ISC tutorial
2023-03-06 10:36:38 +08:00
github-actions[bot]
827a0af8cc
Automated submodule synchronization (#2982)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-03 10:55:45 +08:00
binmakeswell
0afb55fc5b
[doc] add os scope, update tutorial install and tips (#2914)
2023-02-27 14:59:27 +08:00
Zheng Zeng
597914317b
[doc] fix typo in opt inference tutorial (#2849)
2023-02-21 17:16:13 +08:00
github-actions[bot]
a5721229d9
Automated submodule synchronization (#2740)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-20 17:35:46 +08:00
github-actions[bot]
d701ef81b1
Automated submodule synchronization (#2707)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-15 09:39:44 +08:00
github-actions[bot]
88416019e7
Automated submodule synchronization (#2648)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-13 18:10:54 +08:00
binmakeswell
9ab14b20b5
[doc] add CVPR tutorial (#2666)
2023-02-10 20:43:34 +08:00
Frank Lee
4ae02c4b1c
[tutorial] added energonai to opt inference requirements (#2625)
2023-02-07 16:58:06 +08:00
binmakeswell
0556f5d468
[tutorial] add video link (#2619)
2023-02-07 15:14:51 +08:00
github-actions[bot]
ae86be1fd2
Automated submodule synchronization (#2607)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-07 09:33:27 +08:00
binmakeswell
039b0c487b
[tutorial] polish README (#2568)
2023-02-04 17:49:52 +08:00
oahzxl
4f5ef73a43
[tutorial] update fastfold tutorial (#2565)
* update readme
* update
* update
2023-02-03 16:54:28 +08:00
YuliangLiu0306
f477a14f4a
[hotfix] fix autoparallel demo (#2533)
2023-01-31 17:42:45 +08:00
LuGY
ecbad93b65
[example] Add fastfold tutorial (#2528)
* add fastfold example
* pre-commit polish
* pre-commit polish readme and add empty test ci
* Add test_ci and reduce the default sequence length
2023-01-30 17:08:18 +08:00
Frank Lee
8b7495dd54
[example] integrate seq-parallel tutorial with CI (#2463)
2023-01-13 14:40:05 +08:00
Frank Lee
e6943e2d11
[example] integrate autoparallel demo with CI (#2466)
* [example] integrate autoparallel demo with CI
* polish code
* polish code
* polish code
* polish code
2023-01-12 16:26:42 +08:00
YuliangLiu0306
c20529fe78
[examples] update autoparallel tutorial demo (#2449)
* [examples] update autoparallel tutorial demo
* add test_ci.sh
* polish
* add conda yaml
2023-01-12 14:30:58 +08:00
Frank Lee
ac18a445fa
[example] updated large-batch optimizer tutorial (#2448)
* [example] updated large-batch optimizer tutorial
* polish code
* polish code
2023-01-11 16:27:31 +08:00
Frank Lee
39163417a1
[example] updated the hybrid parallel tutorial (#2444)
* [example] updated the hybrid parallel tutorial
* polish code
2023-01-11 15:17:17 +08:00
Frank Lee
63be79d505
[example] removed duplicated stable diffusion example (#2424)
2023-01-11 10:07:18 +08:00
Frank Lee
8327932d2c
[workflow] refactored the example check workflow (#2411)
* [workflow] refactored the example check workflow
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-01-10 11:26:19 +08:00
binmakeswell
d7352bef2c
[example] add example requirement (#2345)
2023-01-06 09:03:29 +08:00
YuliangLiu0306
edf4cd46c5
[examples] update autoparallel demo (#2061)
2022-12-01 18:50:58 +08:00