Commit Graph

2660 Commits (7a3dfd0c645fba51a02eb3c6ac88b4f09160ea7d)

Author SHA1 Message Date
Frank Lee a5883aa790
[test] fixed codefactor format report (#4026) 2023-06-16 18:23:02 +08:00
Baizhou Zhang 822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) 2023-06-16 14:14:05 +08:00
Wenhao Chen 725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
* feat: make optimizer optional in Booster.boost

* test: skip unet test if diffusers version > 0.10.2
2023-06-15 17:38:42 +08:00
Baizhou Zhang c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984) 2023-06-15 15:21:26 +08:00
digger yu d4fb7bfda7
fix typo applications/Chat/coati/ (#3947) 2023-06-15 10:43:11 +08:00
Baizhou Zhang e8ad3c88f5
[doc] add a note about unit-testing to CONTRIBUTING.md (#3970) 2023-06-14 16:32:39 +08:00
Yuanchen 2925f47399
[evaluate] support gpt evaluation with reference (#3972)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-13 15:12:29 +08:00
Frank Lee 8bcad73677
[workflow] fixed the directory check in build (#3980) 2023-06-13 14:42:35 +08:00
Wenhao Chen 9d02590c9a
[chat] refactor actor class (#3968)
* refactor: separate log_probs fn from Actor forward fn

* refactor: separate generate fn from Actor class

* feat: update unwrap_model and get_base_model
* unwrap_model returns model not wrapped by Strategy
* get_base_model returns HF model for Actor, Critic and RewardModel

* feat: simplify Strategy.prepare

* style: remove get_base_model method of Actor

* perf: tokenize text in batches

* refactor: move calc_action_log_probs to utils of model

* test: update test with new forward fn

* style: rename forward fn args

* fix: do not unwrap model in save_model fn of naive strategy

* test: add gemini test for train_prompts

* fix: fix _set_default_generate_kwargs
2023-06-13 13:31:56 +08:00
Frank Lee 2bf6547ad7
Merge pull request #3967 from ver217/update-develop
[sync] update develop branch with main
2023-06-12 16:39:43 +08:00
Frank Lee 6718a2f285
[workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 15:11:27 +08:00
Frank Lee 71fe52769c
[gemini] fixed the gemini checkpoint io (#3934) 2023-06-12 15:11:27 +08:00
Baizhou Zhang b3ab7fbabf
[example] update ViT example using booster api (#3940) 2023-06-12 15:02:27 +08:00
Frank Lee 4110d1f0d4
[workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 09:50:57 +08:00
digger yu 1aadeedeea
fix typo .github/workflows/scripts/ (#3946) 2023-06-09 10:30:50 +08:00
digger yu e61ffc77c6
fix typo tests/ (#3936) 2023-06-09 09:49:41 +08:00
Frank Lee bd1ab98158
[gemini] fixed the gemini checkpoint io (#3934) 2023-06-09 09:48:49 +08:00
FoolPlayer bd2c7c3297
Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:42:28 +08:00
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop" 2023-06-09 09:41:27 +08:00
FoolPlayer 24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
[sync] sync feature/shardformer with develop
2023-06-09 09:34:00 +08:00
Liu Ziming e277534a18
Merge pull request #3905 from MaruyamaAya/dreambooth
[example] Adding an example of training dreambooth with the new booster API
2023-06-09 08:44:18 +08:00
Yuanchen 21c4c0b1a0
support UniEval and add CHRF metric (#3924)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-06-08 17:38:47 +08:00
digger yu 33eef714db
fix typo examples and docs (#3932) 2023-06-08 16:09:32 +08:00
FoolPlayer ef1537759c
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883)
* add gpt2 policy and modify shard and slicer to support

* remove unused code

* polish code
2023-06-08 15:01:34 +08:00
FoolPlayer 6370a935f6
update README (#3909) 2023-06-08 15:01:34 +08:00
FoolPlayer 21a3915c98
[shardformer] add Dropout layer support different dropout pattern (#3856)
* add dropout layer, add dropout test

* modify seed manager as context manager

* add a copy of col_nn.layer

* add dist_crossentropy loss; separate module test

* polish the code

* fix dist crossentropy loss
2023-06-08 15:01:34 +08:00
FoolPlayer 997544c1f9
[shardformer] update readme with modules implement doc (#3834)
* update readme with modules content

* remove img
2023-06-08 15:01:34 +08:00
Frank Lee 537a52b7a2
[shardformer] refactored the user api (#3828)
* [shardformer] refactored the user api

* polish code
2023-06-08 15:01:34 +08:00
Frank Lee bc19024bf9
[shardformer] updated readme (#3827) 2023-06-08 15:01:34 +08:00
FoolPlayer 58f6432416
[shardformer]: Feature/shardformer, add some docstring and readme (#3816)
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example

* add share weight and train example

* add train

* add docstring and readme

* add docstring for other files

* pre-commit
2023-06-08 15:01:34 +08:00
FoolPlayer 6a69b44dfc
[shardformer] init shardformer code structure (#3731)
* init shardformer code structure

* add implement of sharder (inject and replace)

* add implement of replace layer to colossal layer

* separate different layer policy, add some notion

* implement 1d and 2d slicer, can tell col or row

* fix bug when slicing and inject model

* fix some bug; add inference test example
2023-06-08 15:01:34 +08:00
Maruyama_Aya 9b5e7ce21f
modify shell for check 2023-06-08 14:56:56 +08:00
Frank Lee a98e16ed07
Merge pull request #3926 from hpcaitech/feature/dtensor
[feature] updated device mesh and dtensor
2023-06-08 14:39:40 +08:00
digger yu 407aa48461
fix typo examples/community/roberta (#3925) 2023-06-08 14:28:34 +08:00
Maruyama_Aya 730a092ba2
modify shell for check 2023-06-08 13:38:18 +08:00
Maruyama_Aya 49567d56d1
modify shell for check 2023-06-08 13:36:05 +08:00
Maruyama_Aya 039854b391
modify shell for check 2023-06-08 13:17:58 +08:00
Baizhou Zhang e417dd004e
[example] update opt example using booster api (#3918) 2023-06-08 11:27:05 +08:00
Maruyama_Aya cf4792c975
modify shell for check 2023-06-08 11:15:10 +08:00
Frank Lee eb39154d40
[dtensor] updated api and doc (#3845) 2023-06-08 10:18:17 +08:00
Hongxin Liu 9166988d9b
[devops] update torch version in compatibility test (#3919) 2023-06-08 09:29:32 +08:00
digger yu de0d7df33f
[nfc] fix typo colossalai/zero (#3923) 2023-06-08 00:01:29 +08:00
Hongxin Liu 12c90db3f3
[doc] add lazy init tutorial (#3922)
* [doc] add lazy init en doc

* [doc] add lazy init zh doc

* [doc] add lazy init doc in sidebar

* [doc] add lazy init doc test

* [doc] fix lazy init doc link
2023-06-07 17:59:58 +08:00
Maruyama_Aya c94a33579b
modify shell for check 2023-06-07 17:23:01 +08:00
digger yu a9d1cadc49
fix typo with colossalai/trainer utils zero (#3908) 2023-06-07 16:08:37 +08:00
Liu Ziming b306cecf28
[example] Modify palm example with the new booster API (#3913)
* Modify torch version requirement to adapt torch 2.0

* modify palm example using new booster API

* roll back

* fix port

* polish

* polish
2023-06-07 16:05:00 +08:00
wukong1992 a55fb00c18
[booster] update bert example, using booster api (#3885) 2023-06-07 15:51:00 +08:00
Frank Lee 5e2132dcff
[workflow] added docker latest tag for release (#3920) 2023-06-07 15:37:37 +08:00
Hongxin Liu c25d421f3e
[devops] hotfix testmon cache clean logic (#3917) 2023-06-07 12:39:12 +08:00
Frank Lee d51e83d642
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop
[sync] sync feature/dtensor with develop
2023-06-07 11:50:43 +08:00