Commit Graph

2469 Commits (2c8ae37f61f123a305f7fe66af29140fe0f68a34)

Author SHA1 Message Date
Baizhou Zhang 2c8ae37f61
Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching
[gemini] Rename arguments in chunk configuration searching
2023-06-25 17:37:37 +08:00
Wenhao Chen 153b957a1b
[chat] refactor strategy class with booster api (#3987)
* refactor: adapt boost API in base and naive strategies

* fix: initialize plugin after setup_distributed

* fix: fix save_pretrained fn

* refactor: adapt boost API in DDPStrategy

* to: add _post_init check

* to: fix ddp backward, modify ddp dataloader and unwrap

* feat: adapt boost API in ColossalAIStrategy

* fix: call setup_distributed before use get_current_device

* fix: fix save_model and save_optimizer

* test: remove save_sharded_optimizer test

* style: apply formatter

* fix: fix stage check and add comments

* feat: allow dict type arg in strategy.prepare

* to: temporarily remove lr_scheduler for testing

* style: simplify init of ColossalAIStrategy

* fix: fix lr_scheduler in sft and rm

* style: modify comments

* test: add train_prompts tests

* fix: fix inference only case and use in train_prompts

* test: skip failed tests in ci

* style: fix CodeFactor check

* fix: do not use model.to('cpu') with GeminiPlugin

* test: enable colossalai_gemini tests

* test: set CUDA_VISIBLE_DEVICES in ci

* docs: add note
2023-06-25 17:36:21 +08:00
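[Editor's illustration, not part of the PR above] The entry above moves the Chat strategy classes onto the booster API that recurs throughout this log. As a rough, hedged sketch only — the toy model, data, and plugin choice are placeholders, and the exact signatures may differ by version — a minimal training step against the public Booster interface could look like this:

```python
# Minimal sketch of the Booster workflow these commits adapt to.
# Assumes colossalai is installed and the script runs under a distributed
# launcher (e.g. torchrun); the model/data below are placeholders.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# boost() wraps the objects with the chosen plugin and returns them
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(4, 8).cuda()
y = torch.randint(0, 2, (4,)).cuda()
loss = criterion(model(x), y)
booster.backward(loss, optimizer)  # backward goes through the booster, not loss.backward()
optimizer.step()
```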
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching 2023-06-25 13:34:15 +08:00
Frank Lee b463651f3e
[workflow] cover all public repositories in weekly report (#4069) 2023-06-22 14:41:25 +08:00
Hongxin Liu 4a81faa5f3
[devops] fix build on pr ci (#4043)
* [devops] fix build on pr ci

* [devops] fix build on pr ci
2023-06-19 17:12:56 +08:00
github-actions[bot] a52f62082d
[format] applied code formatting on changed files in pull request 4021 (#4022)
Co-authored-by: github-actions <github-actions@github.com>
2023-06-19 11:23:24 +08:00
LuGY 160c64c645
[example] fix bucket size in example of gpt gemini (#4028) 2023-06-19 11:22:42 +08:00
digger yu 727c4598a9
[nfc] fix dim not defined and fix typo (#3991) 2023-06-19 11:21:55 +08:00
Frank Lee ca768eb62d
Merge pull request #4025 from hpcaitech/develop
[sync] sync develop to main
2023-06-19 10:31:34 +08:00
Frank Lee a5883aa790
[test] fixed codefactor format report (#4026) 2023-06-16 18:23:02 +08:00
Baizhou Zhang 822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) 2023-06-16 14:14:05 +08:00
Wenhao Chen 725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
* feat: make optimizer optional in Booster.boost

* test: skip unet test if diffusers version > 0.10.2
2023-06-15 17:38:42 +08:00
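[Editor's illustration, not part of the PR above] With #3993 making the optimizer argument optional, an inference-only setup can presumably boost just the model. A hedged sketch, assuming boost() still returns its usual tuple of boosted objects:

```python
# Hedged sketch of the inference-only case enabled by #3993: boost a model
# without an optimizer/criterion/dataloader. Assumes a distributed launch
# and that boost() still returns a tuple starting with the wrapped model.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})
model = torch.nn.Linear(8, 2)

booster = Booster(plugin=TorchDDPPlugin())
model, *_ = booster.boost(model)  # no optimizer passed
model.eval()
```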
Baizhou Zhang c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984) 2023-06-15 15:21:26 +08:00
digger yu d4fb7bfda7
fix typo applications/Chat/coati/ (#3947) 2023-06-15 10:43:11 +08:00
Baizhou Zhang e8ad3c88f5
[doc] add a note about unit-testing to CONTRIBUTING.md (#3970) 2023-06-14 16:32:39 +08:00
Yuanchen 2925f47399
[evaluate] support gpt evaluation with reference (#3972)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-13 15:12:29 +08:00
Frank Lee 8bcad73677
[workflow] fixed the directory check in build (#3980) 2023-06-13 14:42:35 +08:00
Wenhao Chen 9d02590c9a
[chat] refactor actor class (#3968)
* refactor: separate log_probs fn from Actor forward fn

* refactor: separate generate fn from Actor class

* feat: update unwrap_model and get_base_model
  * unwrap_model returns model not wrapped by Strategy
  * get_base_model returns HF model for Actor, Critic and RewardModel

* feat: simplify Strategy.prepare

* style: remove get_base_model method of Actor

* perf: tokenize text in batches

* refactor: move calc_action_log_probs to utils of model

* test: update test with new forward fn

* style: rename forward fn args

* fix: do not unwrap model in save_model fn of naive strategy

* test: add gemini test for train_prompts

* fix: fix _set_default_generate_kwargs
2023-06-13 13:31:56 +08:00
Frank Lee 2bf6547ad7
Merge pull request #3967 from ver217/update-develop
[sync] update develop branch with main
2023-06-12 16:39:43 +08:00
Frank Lee 6718a2f285 [workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 15:11:27 +08:00
Frank Lee 71fe52769c [gemini] fixed the gemini checkpoint io (#3934) 2023-06-12 15:11:27 +08:00
Baizhou Zhang b3ab7fbabf
[example] update ViT example using booster api (#3940) 2023-06-12 15:02:27 +08:00
Frank Lee 4110d1f0d4
[workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 09:50:57 +08:00
digger yu 1aadeedeea
fix typo .github/workflows/scripts/ (#3946) 2023-06-09 10:30:50 +08:00
digger yu e61ffc77c6
fix typo tests/ (#3936) 2023-06-09 09:49:41 +08:00
Frank Lee bd1ab98158
[gemini] fixed the gemini checkpoint io (#3934) 2023-06-09 09:48:49 +08:00
FoolPlayer bd2c7c3297
Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:42:28 +08:00
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop" 2023-06-09 09:41:27 +08:00
FoolPlayer 24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
[sync] sync feature/shardformer with develop
2023-06-09 09:34:00 +08:00
Liu Ziming e277534a18
Merge pull request #3905 from MaruyamaAya/dreambooth
[example] Adding an example of training dreambooth with the new booster API
2023-06-09 08:44:18 +08:00
Yuanchen 21c4c0b1a0
support UniEval and add CHRF metric (#3924)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-08 17:38:47 +08:00
digger yu 33eef714db
fix typo examples and docs (#3932) 2023-06-08 16:09:32 +08:00
FoolPlayer ef1537759c [shardformer] add gpt2 policy and modify shard and slicer to support (#3883)
* add gpt2 policy and modify shard and slicer to support

* remove unused code

* polish code
2023-06-08 15:01:34 +08:00
FoolPlayer 6370a935f6 update README (#3909) 2023-06-08 15:01:34 +08:00
FoolPlayer 21a3915c98 [shardformer] add Dropout layer support for different dropout patterns (#3856)
* add dropout layer, add dropout test

* modify seed manager as context manager

* add a copy of col_nn.layer

* add dist_crossentropy loss; separate module test

* polish the code

* fix dist crossentropy loss
2023-06-08 15:01:34 +08:00
FoolPlayer 997544c1f9 [shardformer] update readme with module implementation doc (#3834)
* update readme with modules content

* remove img
2023-06-08 15:01:34 +08:00
Frank Lee 537a52b7a2 [shardformer] refactored the user api (#3828)
* [shardformer] refactored the user api

* polish code
2023-06-08 15:01:34 +08:00
Frank Lee bc19024bf9 [shardformer] updated readme (#3827) 2023-06-08 15:01:34 +08:00
FoolPlayer 58f6432416 [shardformer]: Feature/shardformer, add some docstring and readme (#3816)
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

* add share weight and train example

* add train

* add docstring and readme

* add docstring for other files

* pre-commit
2023-06-08 15:01:34 +08:00
FoolPlayer 6a69b44dfc [shardformer] init shardformer code structure (#3731)
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example
2023-06-08 15:01:34 +08:00
Maruyama_Aya 9b5e7ce21f modify shell for check 2023-06-08 14:56:56 +08:00
Frank Lee a98e16ed07
Merge pull request #3926 from hpcaitech/feature/dtensor
[feature] updated device mesh and dtensor
2023-06-08 14:39:40 +08:00
digger yu 407aa48461
fix typo examples/community/roberta (#3925) 2023-06-08 14:28:34 +08:00
Maruyama_Aya 730a092ba2 modify shell for check 2023-06-08 13:38:18 +08:00
Maruyama_Aya 49567d56d1 modify shell for check 2023-06-08 13:36:05 +08:00
Maruyama_Aya 039854b391 modify shell for check 2023-06-08 13:17:58 +08:00
Baizhou Zhang e417dd004e
[example] update opt example using booster api (#3918) 2023-06-08 11:27:05 +08:00
Maruyama_Aya cf4792c975 modify shell for check 2023-06-08 11:15:10 +08:00
Frank Lee eb39154d40
[dtensor] updated api and doc (#3845) 2023-06-08 10:18:17 +08:00
Hongxin Liu 9166988d9b
[devops] update torch version in compatibility test (#3919) 2023-06-08 09:29:32 +08:00