Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
github-actions[bot]
3c6b831c26
[format] applied code formatting on changed files in pull request 4743 ( #4750 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-09-18 16:52:42 +08:00
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
...
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Bin Jia
608cffaed3
[example] add gpt2 HybridParallelPlugin example ( #4653 )
...
* add gpt2 HybridParallelPlugin example
* update readme and testci
* update test ci
* fix test_ci bug
* update requirements
* add requirements
* update requirements
* add requirement
* rename file
2023-09-15 17:12:46 +08:00
Hongxin Liu
554aa9592e
[legacy] move communication and nn to legacy and refactor logger ( #4671 )
...
* [legacy] move communication to legacy (#4640 )
* [legacy] refactor logger and clean up legacy codes (#4654 )
* [legacy] make logger independent to gpc
* [legacy] make optim independent to registry
* [legacy] move test engine to legacy
* [legacy] move nn to legacy (#4656 )
* [legacy] move nn to legacy
* [checkpointio] fix save hf config
* [test] remove useledd rpc pp test
* [legacy] fix nn init
* [example] skip tutorial hybriad parallel example
* [devops] test doc check
* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu
ac178ca5c1
[legacy] move builder and registry to legacy ( #4603 )
2023-09-05 21:53:10 +08:00
Hongxin Liu
89fe027787
[legacy] move trainer to legacy ( #4545 )
...
* [legacy] move trainer to legacy
* [doc] update docs related to trainer
* [test] ignore legacy test
2023-09-05 21:53:10 +08:00
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy ( #4479 )
...
* [gemini] remove distributed-related part from colotensor (#4379 )
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398 )
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443 )
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457 )
* [example] update bert example result (#4458 )
* [doc] update gemini doc (#4468 )
* [example] update gemini related examples (#4473 )
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480 )
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497 )
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
2023-08-24 09:29:25 +08:00
digger yu
2d40759a53
fix #3852 path error ( #4058 )
2023-06-28 15:29:44 +08:00
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples ( #4083 )
2023-06-26 23:50:04 +08:00
LuGY
160c64c645
[example] fix bucket size in example of gpt gemini ( #4028 )
2023-06-19 11:22:42 +08:00
digger yu
33eef714db
fix typo examples and docs ( #3932 )
2023-06-08 16:09:32 +08:00
jiangmingyan
5f79008c4a
[example] update gemini examples ( #3868 )
...
* [example]update gemini examples
* [example]update gemini examples
2023-05-30 18:41:41 +08:00
binmakeswell
15024e40d9
[auto] fix install cmd ( #3772 )
2023-05-18 13:33:01 +08:00
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc( #3618 )
...
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautious Changed the spelling error under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2023-04-26 11:38:43 +08:00
Frank Lee
80eba05b0a
[test] refactor tests with spawn ( #3452 )
...
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-04-06 14:51:35 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
...
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2023-04-04 13:48:16 +08:00
Yan Fang
189347963a
[auto] fix requirements typo for issue #3125 ( #3209 )
2023-03-23 10:22:08 +08:00
Zihao
18dbe76cae
[auto-parallel] add auto-offload feature ( #3154 )
...
* add auto-offload feature
* polish code
* fix syn offload runtime pass bug
* add offload example
* fix offload testing bug
* fix example testing bug
2023-03-21 14:17:41 +08:00
Ziyue Jiang
400f63012e
[pipeline] Add Simplified Alpa DP Partition ( #2507 )
...
* add alpa dp split
* add alpa dp split
* use fwd+bwd instead of fwd only
---------
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-03-07 10:34:31 +08:00
github-actions[bot]
da056285f2
[format] applied code formatting on changed files in pull request 2922 ( #2923 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-02-27 19:29:06 +08:00
binmakeswell
12bafe057f
[doc] update installation for GPT ( #2922 )
2023-02-27 18:28:34 +08:00
dawei-wang
55424a16a5
[doc] fix GPT tutorial ( #2860 )
...
Fix hpcaitech/ColossalAI#2851
2023-02-22 10:58:52 +08:00
cloudhuang
43dffdaba5
[doc] fixed a typo in GPT readme ( #2736 )
2023-02-15 22:24:45 +08:00
Jiatong (Julius) Han
a255a38f7f
[example] Polish README.md ( #2658 )
...
* [tutorial] polish readme.md
* [example] Update README.md
2023-02-09 20:43:55 +08:00
HELSON
6e0faa70e0
[gemini] add profiler in the demo ( #2534 )
2023-01-31 14:21:22 +08:00
HELSON
66dfcf5281
[gemini] update the gpt example ( #2527 )
2023-01-30 17:58:05 +08:00
HELSON
707b11d4a0
[gemini] update ddp strict mode ( #2518 )
...
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON
2d1a7dfe5f
[zero] add strict ddp mode ( #2508 )
...
* [zero] add strict ddp mode
* [polish] add comments for strict ddp mode
* [zero] fix test error
2023-01-20 14:04:38 +08:00
Jiarui Fang
e327e95144
[hotfix] gpt example titans bug #2493 ( #2494 )
2023-01-18 12:04:18 +08:00
binmakeswell
fcc6d61d92
[example] fix requirements ( #2488 )
2023-01-17 13:07:25 +08:00
Jiarui Fang
3a21485ead
[example] titans for gpt ( #2484 )
2023-01-16 15:55:41 +08:00
Jiarui Fang
7c31706227
[CI] add test_ci.sh for palm, opt and gpt ( #2475 )
2023-01-16 14:44:29 +08:00
ver217
f525d1f528
[example] update gpt gemini example ci test ( #2477 )
2023-01-13 22:37:31 +08:00
Ziyue Jiang
fef5c949c3
polish pp middleware ( #2476 )
...
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-13 16:56:01 +08:00
Jiarui Fang
867c8c2d3a
[zero] low level optim supports ProcessGroup ( #2464 )
2023-01-13 10:05:58 +08:00
YuliangLiu0306
2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize ( #2393 )
...
* [autoparallel] integrate device mesh initialization into autoparallelize
* add megatron solution
* update gpt autoparallel examples with latest api
* adapt beta value to fit the current computation cost
2023-01-11 14:03:49 +08:00
HELSON
d84e747975
[hotfix] add DISTPAN argument for benchmark ( #2412 )
...
* change the benchmark config file
* change config
* revert config file
* rename distpan to distplan
2023-01-10 11:39:25 +08:00
HELSON
498b5ca993
[hotfix] fix gpt gemini example ( #2404 )
...
* [hotfix] fix gpt gemini example
* [example] add new assertions
2023-01-09 15:52:17 +08:00
Jiarui Fang
12c8bf38d7
[Pipeline] Refine GPT PP Example
2023-01-06 18:03:45 +08:00
Ziyue Jiang
ad00894f7f
polish
2023-01-06 16:03:16 +08:00
Jiarui Fang
1aaeb596c6
[example] gpt, shard init on all processes ( #2366 )
2023-01-06 15:44:50 +08:00
Ziyue Jiang
3a15b20421
Move GPT PP Example
2023-01-06 14:48:58 +08:00
YuliangLiu0306
8b1e0dfd80
[example] upload auto parallel gpt2 demo ( #2354 )
2023-01-06 11:38:38 +08:00
Jiarui Fang
00a9c781fd
[example] add google doc for benchmark results of GPT ( #2355 )
2023-01-06 11:38:15 +08:00
Jiarui Fang
509a87f3ff
[example] make gpt example directory more clear ( #2353 )
2023-01-06 11:11:26 +08:00
Jiarui Fang
35e22be2f6
[example] simplify opt example ( #2344 )
2023-01-06 10:08:41 +08:00
ziyuhuang123
7080a8edb0
[workflow]New version: Create workflow files for examples' auto check ( #2298 )
...
* [workflows]bug_repair
* [workflow]new_pr_fixing_bugs
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
HELSON
e00cedd181
[example] update gemini benchmark bash ( #2306 )
2023-01-04 11:59:26 +08:00
Ziyue Jiang
ac863a01d6
[example] add benchmark ( #2276 )
...
* add benchmark
* merge common func
* add total and avg tflops
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 17:20:59 +08:00