digger yu
8abc87798f
fix Tensor is not defined (#4129)
2023-07-03 17:10:18 +08:00
digger yu
7e46bc87b6
fix CheckpointIndexFile is not defined (#4109)
2023-07-03 17:09:06 +08:00
digger yu
09fe9dc704
[nfc]fix ColossalaiOptimizer is not defined (#4122)
2023-06-30 17:23:22 +08:00
Wenhao Chen
edd75a59ea
[chat] remove naive strategy and split colossalai strategy (#4094)
...
* feat: remove on_learn_epoch fn as not used
* revert: add _on_learn_epoch fn
* to: remove the use of NaiveStrategy
* test: remove NaiveStrategy tests
* feat: remove NaiveStrategy
* style: modify comments and params
* feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy
* fix: remove naive
* fix: align with modified colossal strategy
* fix: fix ddp _try_init_dist arg
2023-06-29 18:11:00 +08:00
Wenhao Chen
b03d64d010
[chat] refactor trainer class (#4080)
...
* to: add SLTrainer
* refactor: refactor RMTrainer and SFTTrainer
* fix: fix init file
* feat: remove on_learn_epoch fn as not used
* fix: align with modified gemini arguments
* to: add OnPolicyTrainer
* revert: add _on_learn_epoch fn
* refactor: refactor PPOTrainer
* style: rename PPOTrainer argument
* fix: align with modified PPO arguments
* test: align with modified train_prompts arguments
* chore: modify train_prompts
* docs: align with modified arguments
* fix: remove unnecessary output
* fix: move dataloader to fit fn of SLTrainer
* fix: move dataloader to fit fn of OnPolicyTrainer
* fix: modify usage of prompt and pretrain dataloader
2023-06-29 10:48:09 +08:00
Jianghai
711e2b4c00
[doc] update and revise some typos and errs in docs (#4107)
...
* fix some typos and problems in doc
* fix some typos and problems in doc
* add doc test
2023-06-28 19:30:37 +08:00
digger yu
769cddcb2c
fix typo docs/ (#4033)
2023-06-28 15:30:30 +08:00
digger yu
2d40759a53
fix #3852 path error (#4058)
2023-06-28 15:29:44 +08:00
Frank Lee
1ee947f617
[workflow] added status check for test coverage workflow (#4106)
2023-06-28 14:33:43 +08:00
Jianghai
31dc302017
[examples] copy resnet example to image (#4090)
...
* copy resnet example
* add pytest package
* skip test_ci
* skip test_ci
* skip test_ci
2023-06-27 16:40:46 +08:00
Frank Lee
95e95b6d58
[testing] move pytest to be inside the function (#4087)
2023-06-27 11:02:25 +08:00
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples (#4083)
2023-06-26 23:50:04 +08:00
Michelle
e89b127d8e
[chat]: fix chat evaluation possible bug (#4064)
...
* fix chat eval
* fix utils
* fix utils
* add comment
---------
Co-authored-by: Qianran Ma <qianranm@luchentech.com>
2023-06-26 15:26:07 +08:00
Baizhou Zhang
2c8ae37f61
Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching
...
[gemini] Rename arguments in chunk configuration searching
2023-06-25 17:37:37 +08:00
Wenhao Chen
153b957a1b
[chat] refactor strategy class with booster api (#3987)
...
* refactor: adapt boost API in base and naive strategies
* fix: initialize plugin after setup_distributed
* fix: fix save_pretrained fn
* refactor: adapt boost API in DDPStrategy
* to: add _post_init check
* to: fix ddp backward, modify ddp dataloader and unwrap
* feat: adapt boost API in ColossalAIStrategy
* fix: call setup_distributed before use get_current_device
* fix: fix save_model and save_optimizer
* test: remove save_sharded_optimizer test
* style: apply formatter
* fix: fix stage check and add comments
* feat: allow dict type arg in strategy.prepare
* to: temporarily remove lr_scheduler for testing
* style: simplify init of ColossalAIStrategy
* fix: fix lr_scheduler in sft and rm
* style: modify comments
* test: add train_prompts tests
* fix: fix inference only case and use in train_prompts
* test: skip failed tests in ci
* style: fix CodeFactor check
* fix: do not use model.to('cpu') with GeminiPlugin
* test: enable colossalai_gemini tests
* test: set CUDA_VISIBLE_DEVICES in ci
* docs: add note
2023-06-25 17:36:21 +08:00
Baizhou Zhang
0bb0b481b4
[gemini] fix argument naming during chunk configuration searching
2023-06-25 13:34:15 +08:00
Frank Lee
b463651f3e
[workflow] cover all public repositories in weekly report (#4069)
2023-06-22 14:41:25 +08:00
Hongxin Liu
4a81faa5f3
[devops] fix build on pr ci (#4043)
...
* [devops] fix build on pr ci
* [devops] fix build on pr ci
2023-06-19 17:12:56 +08:00
github-actions[bot]
a52f62082d
[format] applied code formatting on changed files in pull request 4021 (#4022)
...
Co-authored-by: github-actions <github-actions@github.com>
2023-06-19 11:23:24 +08:00
LuGY
160c64c645
[example] fix bucket size in example of gpt gemini (#4028)
2023-06-19 11:22:42 +08:00
digger yu
727c4598a9
[nfc] fix dim not defined and fix typo (#3991)
2023-06-19 11:21:55 +08:00
Frank Lee
ca768eb62d
Merge pull request #4025 from hpcaitech/develop
...
[sync] sync develop to main
2023-06-19 10:31:34 +08:00
Frank Lee
a5883aa790
[test] fixed codefactor format report (#4026)
2023-06-16 18:23:02 +08:00
Baizhou Zhang
822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002)
2023-06-16 14:14:05 +08:00
Wenhao Chen
725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
...
* feat: make optimizer optional in Booster.boost
* test: skip unet test if diffusers version > 0.10.2
2023-06-15 17:38:42 +08:00
Baizhou Zhang
c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984)
2023-06-15 15:21:26 +08:00
digger yu
d4fb7bfda7
fix typo applications/Chat/coati/ (#3947)
2023-06-15 10:43:11 +08:00
Baizhou Zhang
e8ad3c88f5
[doc] add a note about unit-testing to CONTRIBUTING.md (#3970)
2023-06-14 16:32:39 +08:00
Yuanchen
2925f47399
[evaluate] support gpt evaluation with reference (#3972)
...
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-13 15:12:29 +08:00
Frank Lee
8bcad73677
[workflow] fixed the directory check in build (#3980)
2023-06-13 14:42:35 +08:00
Wenhao Chen
9d02590c9a
[chat] refactor actor class (#3968)
...
* refactor: separate log_probs fn from Actor forward fn
* refactor: separate generate fn from Actor class
* feat: update unwrap_model and get_base_model
* unwrap_model returns model not wrapped by Strategy
* get_base_model returns HF model for Actor, Critic and RewardModel
* feat: simplify Strategy.prepare
* style: remove get_base_model method of Actor
* perf: tokenize text in batches
* refactor: move calc_action_log_probs to utils of model
* test: update test with new forward fn
* style: rename forward fn args
* fix: do not unwrap model in save_model fn of naive strategy
* test: add gemini test for train_prompts
* fix: fix _set_default_generate_kwargs
2023-06-13 13:31:56 +08:00
Frank Lee
2bf6547ad7
Merge pull request #3967 from ver217/update-develop
...
[sync] update develop branch with main
2023-06-12 16:39:43 +08:00
Frank Lee
6718a2f285
[workflow] cancel duplicated workflow jobs (#3960)
2023-06-12 15:11:27 +08:00
Frank Lee
71fe52769c
[gemini] fixed the gemini checkpoint io (#3934)
2023-06-12 15:11:27 +08:00
Baizhou Zhang
b3ab7fbabf
[example] update ViT example using booster api (#3940)
2023-06-12 15:02:27 +08:00
Frank Lee
4110d1f0d4
[workflow] cancel duplicated workflow jobs (#3960)
2023-06-12 09:50:57 +08:00
digger yu
1aadeedeea
fix typo .github/workflows/scripts/ (#3946)
2023-06-09 10:30:50 +08:00
digger yu
e61ffc77c6
fix typo tests/ (#3936)
2023-06-09 09:49:41 +08:00
Frank Lee
bd1ab98158
[gemini] fixed the gemini checkpoint io (#3934)
2023-06-09 09:48:49 +08:00
FoolPlayer
bd2c7c3297
Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer
...
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:42:28 +08:00
Frank Lee
ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:41:27 +08:00
FoolPlayer
24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
...
[sync] sync feature/shardformer with develop
2023-06-09 09:34:00 +08:00
Liu Ziming
e277534a18
Merge pull request #3905 from MaruyamaAya/dreambooth
...
[example] Adding an example of training dreambooth with the new booster API
2023-06-09 08:44:18 +08:00
Yuanchen
21c4c0b1a0
support UniEval and add CHRF metric (#3924)
...
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-08 17:38:47 +08:00
digger yu
33eef714db
fix typo examples and docs (#3932)
2023-06-08 16:09:32 +08:00
FoolPlayer
ef1537759c
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883)
...
* add gpt2 policy and modify shard and slicer to support
* remove unused code
* polish code
2023-06-08 15:01:34 +08:00
FoolPlayer
6370a935f6
update README (#3909)
2023-06-08 15:01:34 +08:00
FoolPlayer
21a3915c98
[shardformer] add Dropout layer support different dropout pattern (#3856)
...
* add dropout layer, add dropout test
* modify seed manager as context manager
* add a copy of col_nn.layer
* add dist_crossentropy loss; separate module test
* polish the code
* fix dist crossentropy loss
2023-06-08 15:01:34 +08:00
FoolPlayer
997544c1f9
[shardformer] update readme with modules implement doc (#3834)
...
* update readme with modules content
* remove img
2023-06-08 15:01:34 +08:00
Frank Lee
537a52b7a2
[shardformer] refactored the user api (#3828)
...
* [shardformer] refactored the user api
* polish code
2023-06-08 15:01:34 +08:00