Hongxin Liu
ac178ca5c1
[legacy] move builder and registry to legacy ( #4603 )
1 year ago
Hongxin Liu
8accecd55b
[legacy] move engine to legacy ( #4560 )
* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [example] update seq parallel requirements
1 year ago
Hongxin Liu
89fe027787
[legacy] move trainer to legacy ( #4545 )
* [legacy] move trainer to legacy
* [doc] update docs related to trainer
* [test] ignore legacy test
1 year ago
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy ( #4479 )
* [gemini] remove distributed-related part from colotensor (#4379 )
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398 )
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443 )
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457 )
* [example] update bert example result (#4458 )
* [doc] update gemini doc (#4468 )
* [example] update gemini related examples (#4473 )
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480 )
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497 )
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
1 year ago
flybird1111
f40b718959
[doc] Fix gradient accumulation doc. ( #4349 )
* [doc] fix gradient accumulation doc
* [doc] fix gradient accumulation doc
1 year ago
Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gatherd is true under GeminiPlugin
1 year ago
Jianghai
711e2b4c00
[doc] update and revise some typos and errs in docs ( #4107 )
* fix some typos and problems in doc
* fix some typos and problems in doc
* add doc test
1 year ago
digger yu
769cddcb2c
fix typo docs/ ( #4033 )
1 year ago
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples ( #4083 )
1 year ago
Frank Lee
ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
1 year ago
FoolPlayer
24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
[sync] sync feature/shardformer with develop
1 year ago
digger yu
33eef714db
fix typo examples and docs ( #3932 )
1 year ago
Hongxin Liu
12c90db3f3
[doc] add lazy init tutorial ( #3922 )
* [doc] add lazy init en doc
* [doc] add lazy init zh doc
* [doc] add lazy init doc in sidebar
* [doc] add lazy init doc test
* [doc] fix lazy init doc link
1 year ago
Baizhou Zhang
c1535ccbba
[doc] fix docs about booster api usage ( #3898 )
1 year ago
jiangmingyan
07cb21142f
[doc]update moe chinese document. ( #3890 )
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
1 year ago
jiangmingyan
281b33f362
[doc] update document of zero with chunk. ( #3855 )
* [doc] fix title of mixed precision
* [doc]update document of zero with chunk
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, add doc test
* [doc] update document of zero with chunk, add doc test
* [doc] update document of zero with chunk, fix installation
* [doc] update document of zero with chunk, fix zero with chunk doc
* [doc] update document of zero with chunk, fix zero with chunk doc
2 years ago
jiangmingyan
b0474878bf
[doc] update nvme offload documents. ( #3850 )
2 years ago
jiangmingyan
a64df3fa97
[doc] update document of gemini instruction. ( #3842 )
* [doc] update meet_gemini.md
* [doc] update meet_gemini.md
* [doc] fix parentheses
* [doc] fix parentheses
* [doc] fix doc test
* [doc] fix doc test
* [doc] fix doc
2 years ago
Frank Lee
54e97ed7ea
[workflow] supported test on CUDA 10.2 ( #3841 )
2 years ago
wukong1992
3229f93e30
[booster] add warning for torch fsdp plugin doc ( #3833 )
2 years ago
digger yu
518b31c059
[docs] change placememt_policy to placement_policy ( #3829 )
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
2 years ago
digger yu
e90fdb1000
fix typo docs/
2 years ago
jiangmingyan
725365f297
Merge pull request #3810 from jiangmingyan/amp
[doc] update amp document
2 years ago
jiangmingyan
278fcbc444
[doc]fix
2 years ago
jiangmingyan
8aa1fb2c7f
[doc]fix
2 years ago
Hongxin Liu
19d153057e
[doc] add warning about fsdp plugin ( #3813 )
2 years ago
jiangmingyan
c425a69d52
[doc] add removed change of config.py
2 years ago
jiangmingyan
75272ef37b
[doc] add removed warning
2 years ago
Mingyan Jiang
a520610bd9
[doc] update amp document
2 years ago
Mingyan Jiang
8c62e50dbb
[doc] update amp document
2 years ago
jiangmingyan
ef02d7ef6d
[doc] update gradient accumulation ( #3771 )
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, add sidebars
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, resolve comments
* [doc]update gradient accumulation, resolve comments
* fix
2 years ago
jiangmingyan
fe1561a884
[doc] update gradient clipping document ( #3778 )
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping doc, fix sidebars.json
* [doc] update gradient clipping doc, fix doc test
2 years ago
Yanjia0
d9393b85f1
[doc] add deprecated warning on doc Basics section ( #3754 )
* Update colotensor_concept.md
* Update configure_parallelization.md
* Update define_your_config.md
* Update engine_trainer.md
* Update initialize_features.md
* Update model_checkpoint.md
* Update colotensor_concept.md
* Update configure_parallelization.md
* Update define_your_config.md
* Update engine_trainer.md
* Update initialize_features.md
* Update model_checkpoint.md
2 years ago
Hongxin Liu
72688adb2f
[doc] add booster docstring and fix autodoc ( #3789 )
* [doc] add docstr for booster methods
* [doc] fix autodoc
2 years ago
Hongxin Liu
60e6a154bc
[doc] add tutorial for booster checkpoint ( #3785 )
* [doc] add checkpoint related docstr for booster
* [doc] add en checkpoint doc
* [doc] add zh checkpoint doc
* [doc] add booster checkpoint doc in sidebar
* [doc] add caution about ckpt for plugins
* [doc] add doctest placeholder
* [doc] add doctest placeholder
* [doc] add doctest placeholder
2 years ago
Hongxin Liu
21e29e2212
[doc] add tutorial for booster plugins ( #3758 )
* [doc] add en booster plugins doc
* [doc] add booster plugins doc in sidebar
* [doc] add zh booster plugins doc
* [doc] fix zh booster plugin translation
* [doc] reorganize tutorials order of basic section
* [devops] force sync to test ci
2 years ago
Hongxin Liu
5ce6c9d86f
[doc] add tutorial for cluster utils ( #3763 )
* [doc] add en cluster utils doc
* [doc] add zh cluster utils doc
* [doc] add cluster utils doc in sidebar
2 years ago
jiangmingyan
48bd056761
[doc] update hybrid parallelism doc ( #3770 )
2 years ago
jiangmingyan
d449525acf
[doc] update booster tutorials ( #3718 )
* [booster] update booster tutorials#3717
* [booster] update booster tutorials#3717, fix
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, update setup doc
* [booster] update booster tutorials#3717, rename colossalai booster.md
* [booster] update booster tutorials#3717, rename colossalai booster.md
* [booster] update booster tutorials#3717, rename colossalai booster.md
* [booster] update booster tutorials#3717, fix
* [booster] update booster tutorials#3717, fix
* [booster] update tutorials#3717, update booster api doc
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, modify file
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3717, fix reference link
* [booster] update tutorials#3713
* [booster] update tutorials#3713, modify file
2 years ago
Hongxin Liu
5dd573c6b6
[devops] fix ci for document check ( #3751 )
* [doc] add test info
* [devops] update doc check ci
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] remove debug info and update invalid doc
* [devops] add essential comments
2 years ago
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc ( #3618 )
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautiously changed the spelling errors under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2 years ago
digger-yu
9edeadfb24
[doc] Update 1D_tensor_parallel.md ( #3573 )
Display format optimization, same as fix #3562
Simultaneous modification of en version
2 years ago
digger-yu
1c7734bc94
[doc] Update 1D_tensor_parallel.md ( #3563 )
Display format optimization, fix bug#3562
Specific changes
1. "This is called a column-parallel fashion" Translate to Chinese
2. use the ```math code block syntax to display a math expression as a block, No modification of formula content
Please check that the math formula is displayed correctly
If OK, I will change the format of the English version of the formula in parallel
2 years ago
binmakeswell
0c0455700f
[doc] add requirement and highlight application ( #3516 )
* [doc] add requirement and highlight application
* [doc] link example and application
2 years ago
Frank Lee
80eba05b0a
[test] refactor tests with spawn ( #3452 )
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2 years ago
Frank Lee
416a50dbd7
[doc] moved doc test command to bottom ( #3075 )
2 years ago
Frank Lee
ea0b52c12e
[doc] specified operating system requirement ( #3019 )
* [doc] specified operating system requirement
* polish code
2 years ago
ver217
378d827c6b
[doc] update nvme offload doc ( #3014 )
* [doc] update nvme offload doc
* [doc] add doc testing cmd and requirements
* [doc] add api reference
* [doc] add dependencies
2 years ago
Frank Lee
e0a1c1321c
[doc] added reference to related works ( #2994 )
* [doc] added reference to related works
* polish code
2 years ago