Hongxin Liu
66f3926019
[doc] clean up outdated docs ( #4765 )
...
* [doc] clean up outdated docs
* [doc] fix linking
* [doc] fix linking
2023-09-21 11:36:20 +08:00
Pengtai Xu
4d7537ba25
[doc] put native colossalai plugins first in description section
2023-09-20 09:24:10 +08:00
Pengtai Xu
e10d9f087e
[doc] add model examples for each plugin
2023-09-19 18:01:23 +08:00
Pengtai Xu
a04337bfc3
[doc] put individual plugin explanation in front
2023-09-19 16:27:37 +08:00
Pengtai Xu
10513f203c
[doc] explain suitable use case for each plugin
2023-09-19 15:50:14 +08:00
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
...
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Baizhou Zhang
d151dcab74
[doc] explanation of loading large pretrained models ( #4741 )
2023-09-15 21:04:07 +08:00
Baizhou Zhang
451c3465fb
[doc] polish shardformer doc ( #4735 )
...
* arrange position of chapters
* fix typos in seq parallel doc
2023-09-15 17:39:10 +08:00
Bin Jia
6a03c933a0
[shardformer] update seq parallel document ( #4730 )
...
* update doc of seq parallel
* fix typo
2023-09-15 16:09:32 +08:00
flybird11111
46162632e5
[shardformer] update pipeline parallel document ( #4725 )
...
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
2023-09-15 14:32:04 +08:00
Baizhou Zhang
50e5602c2d
[doc] add shardformer support matrix/update tensor parallel documents ( #4728 )
...
* add compatibility matrix for shardformer doc
* update tp doc
2023-09-15 13:52:30 +08:00
Baizhou Zhang
f911d5b09d
[doc] Add user document for Shardformer ( #4702 )
...
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated message
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
2023-09-15 10:56:39 +08:00
Baizhou Zhang
1d454733c4
[doc] Update booster user documents. ( #4669 )
...
* update booster_api.md
* update booster_checkpoint.md
* update booster_plugins.md
* move transformers importing inside function
* fix Dict typing
* fix autodoc bug
* small fix
2023-09-12 10:47:23 +08:00
Hongxin Liu
554aa9592e
[legacy] move communication and nn to legacy and refactor logger ( #4671 )
...
* [legacy] move communication to legacy (#4640 )
* [legacy] refactor logger and clean up legacy codes (#4654 )
* [legacy] make logger independent to gpc
* [legacy] make optim independent to registry
* [legacy] move test engine to legacy
* [legacy] move nn to legacy (#4656 )
* [legacy] move nn to legacy
* [checkpointio] fix save hf config
* [test] remove useless rpc pp test
* [legacy] fix nn init
* [example] skip tutorial hybrid parallel example
* [devops] test doc check
* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu
ac178ca5c1
[legacy] move builder and registry to legacy ( #4603 )
2023-09-05 21:53:10 +08:00
Hongxin Liu
8accecd55b
[legacy] move engine to legacy ( #4560 )
...
* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [example] update seq parallel requirements
2023-09-05 21:53:10 +08:00
Hongxin Liu
89fe027787
[legacy] move trainer to legacy ( #4545 )
...
* [legacy] move trainer to legacy
* [doc] update docs related to trainer
* [test] ignore legacy test
2023-09-05 21:53:10 +08:00
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy ( #4479 )
...
* [gemini] remove distributed-related part from colotensor (#4379 )
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398 )
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443 )
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457 )
* [example] update bert example result (#4458 )
* [doc] update gemini doc (#4468 )
* [example] update gemini related examples (#4473 )
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480 )
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497 )
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
2023-08-24 09:29:25 +08:00
flybird1111
f40b718959
[doc] Fix gradient accumulation doc. ( #4349 )
...
* [doc] fix gradient accumulation doc
* [doc] fix gradient accumulation doc
2023-08-04 17:24:35 +08:00
Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
...
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gathered is true under GeminiPlugin
2023-07-21 14:39:01 +08:00
Jianghai
711e2b4c00
[doc] update and revise some typos and errs in docs ( #4107 )
...
* fix some typos and problems in doc
* fix some typos and problems in doc
* add doc test
2023-06-28 19:30:37 +08:00
digger yu
769cddcb2c
fix typo docs/ ( #4033 )
2023-06-28 15:30:30 +08:00
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples ( #4083 )
2023-06-26 23:50:04 +08:00
Frank Lee
ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:41:27 +08:00
FoolPlayer
24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
...
[sync] sync feature/shardformer with develop
2023-06-09 09:34:00 +08:00
digger yu
33eef714db
fix typo examples and docs ( #3932 )
2023-06-08 16:09:32 +08:00
Hongxin Liu
12c90db3f3
[doc] add lazy init tutorial ( #3922 )
...
* [doc] add lazy init en doc
* [doc] add lazy init zh doc
* [doc] add lazy init doc in sidebar
* [doc] add lazy init doc test
* [doc] fix lazy init doc link
2023-06-07 17:59:58 +08:00
Baizhou Zhang
c1535ccbba
[doc] fix docs about booster api usage ( #3898 )
2023-06-06 13:36:11 +08:00
jiangmingyan
07cb21142f
[doc]update moe chinese document. ( #3890 )
...
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
* [doc]update-moe
2023-06-05 15:57:54 +08:00
jiangmingyan
281b33f362
[doc] update document of zero with chunk. ( #3855 )
...
* [doc] fix title of mixed precision
* [doc]update document of zero with chunk
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, fix
* [doc] update document of zero with chunk, add doc test
* [doc] update document of zero with chunk, add doc test
* [doc] update document of zero with chunk, fix installation
* [doc] update document of zero with chunk, fix zero with chunk doc
* [doc] update document of zero with chunk, fix zero with chunk doc
2023-05-30 18:41:56 +08:00
jiangmingyan
b0474878bf
[doc] update nvme offload documents. ( #3850 )
2023-05-26 01:22:01 +08:00
jiangmingyan
a64df3fa97
[doc] update document of gemini instruction. ( #3842 )
...
* [doc] update meet_gemini.md
* [doc] update meet_gemini.md
* [doc] fix parentheses
* [doc] fix parentheses
* [doc] fix doc test
* [doc] fix doc test
* [doc] fix doc
2023-05-25 14:58:01 +08:00
Frank Lee
54e97ed7ea
[workflow] supported test on CUDA 10.2 ( #3841 )
2023-05-25 14:14:34 +08:00
wukong1992
3229f93e30
[booster] add warning for torch fsdp plugin doc ( #3833 )
2023-05-25 14:00:02 +08:00
digger yu
518b31c059
[docs] change placememt_policy to placement_policy ( #3829 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
2023-05-24 14:51:49 +08:00
digger yu
e90fdb1000
fix typo docs/
2023-05-24 13:57:43 +08:00
jiangmingyan
725365f297
Merge pull request #3810 from jiangmingyan/amp
...
[doc] update amp document
2023-05-23 18:58:16 +08:00
jiangmingyan
278fcbc444
[doc]fix
2023-05-23 17:53:11 +08:00
jiangmingyan
8aa1fb2c7f
[doc]fix
2023-05-23 17:50:30 +08:00
Hongxin Liu
19d153057e
[doc] add warning about fsdp plugin ( #3813 )
2023-05-23 17:16:10 +08:00
jiangmingyan
c425a69d52
[doc] add removed change of config.py
2023-05-23 16:42:36 +08:00
jiangmingyan
75272ef37b
[doc] add removed warning
2023-05-23 16:34:30 +08:00
Mingyan Jiang
a520610bd9
[doc] update amp document
2023-05-23 16:20:29 +08:00
Mingyan Jiang
8c62e50dbb
[doc] update amp document
2023-05-23 16:20:01 +08:00
jiangmingyan
ef02d7ef6d
[doc] update gradient accumulation ( #3771 )
...
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, add sidebars
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, fix
* [doc]update gradient accumulation, resolve comments
* [doc]update gradient accumulation, resolve comments
* fix
2023-05-23 10:52:30 +08:00
jiangmingyan
fe1561a884
[doc] update gradient clipping document ( #3778 )
...
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping document
* [doc] update gradient clipping doc, fix sidebars.json
* [doc] update gradient clipping doc, fix doc test
2023-05-22 14:13:15 +08:00
Yanjia0
d9393b85f1
[doc] add deprecated warning on doc Basics section ( #3754 )
...
* Update colotensor_concept.md
* Update configure_parallelization.md
* Update define_your_config.md
* Update engine_trainer.md
* Update initialize_features.md
* Update model_checkpoint.md
* Update colotensor_concept.md
* Update configure_parallelization.md
* Update define_your_config.md
* Update engine_trainer.md
* Update initialize_features.md
* Update model_checkpoint.md
2023-05-22 11:12:53 +08:00
Hongxin Liu
72688adb2f
[doc] add booster docstring and fix autodoc ( #3789 )
...
* [doc] add docstr for booster methods
* [doc] fix autodoc
2023-05-22 10:56:47 +08:00
Hongxin Liu
60e6a154bc
[doc] add tutorial for booster checkpoint ( #3785 )
...
* [doc] add checkpoint related docstr for booster
* [doc] add en checkpoint doc
* [doc] add zh checkpoint doc
* [doc] add booster checkpoint doc in sidebar
* [doc] add caution about ckpt for plugins
* [doc] add doctest placeholder
* [doc] add doctest placeholder
* [doc] add doctest placeholder
2023-05-19 18:05:08 +08:00
Hongxin Liu
21e29e2212
[doc] add tutorial for booster plugins ( #3758 )
...
* [doc] add en booster plugins doc
* [doc] add booster plugins doc in sidebar
* [doc] add zh booster plugins doc
* [doc] fix zh booster plugin translation
* [doc] reorganize tutorials order of basic section
* [devops] force sync to test ci
2023-05-19 12:12:42 +08:00