Commit Graph

2455 Commits (822c3d4d66d2d74cb7c7080abed6a207602dddfd)
 

Author SHA1 Message Date
Baizhou Zhang 822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002)
1 year ago
Wenhao Chen 725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
1 year ago
Baizhou Zhang c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984)
1 year ago
Frank Lee 8bcad73677
[workflow] fixed the directory check in build (#3980)
1 year ago
Frank Lee 2bf6547ad7
Merge pull request #3967 from ver217/update-develop
1 year ago
Frank Lee 6718a2f285
[workflow] cancel duplicated workflow jobs (#3960)
1 year ago
Frank Lee 71fe52769c
[gemini] fixed the gemini checkpoint io (#3934)
1 year ago
Baizhou Zhang b3ab7fbabf
[example] update ViT example using booster api (#3940)
1 year ago
Frank Lee 4110d1f0d4
[workflow] cancel duplicated workflow jobs (#3960)
1 year ago
digger yu 1aadeedeea
fix typo .github/workflows/scripts/ (#3946)
1 year ago
digger yu e61ffc77c6
fix typo tests/ (#3936)
1 year ago
Frank Lee bd1ab98158
[gemini] fixed the gemini checkpoint io (#3934)
1 year ago
FoolPlayer bd2c7c3297
Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer
1 year ago
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
1 year ago
FoolPlayer 24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
1 year ago
Liu Ziming e277534a18
Merge pull request #3905 from MaruyamaAya/dreambooth
1 year ago
Yuanchen 21c4c0b1a0
support UniEval and add CHRF metric (#3924)
1 year ago
digger yu 33eef714db
fix typo examples and docs (#3932)
1 year ago
FoolPlayer ef1537759c
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883)
1 year ago
FoolPlayer 6370a935f6
update README (#3909)
1 year ago
FoolPlayer 21a3915c98
[shardformer] add Dropout layer support different dropout pattern (#3856)
1 year ago
FoolPlayer 997544c1f9
[shardformer] update readme with modules implement doc (#3834)
1 year ago
Frank Lee 537a52b7a2
[shardformer] refactored the user api (#3828)
1 year ago
Frank Lee bc19024bf9
[shardformer] updated readme (#3827)
1 year ago
FoolPlayer 58f6432416
[shardformer]: Feature/shardformer, add some docstring and readme (#3816)
1 year ago
FoolPlayer 6a69b44dfc
[shardformer] init shardformer code structure (#3731)
1 year ago
Maruyama_Aya 9b5e7ce21f
modify shell for check
1 year ago
Frank Lee a98e16ed07
Merge pull request #3926 from hpcaitech/feature/dtensor
1 year ago
digger yu 407aa48461
fix typo examples/community/roberta (#3925)
1 year ago
Maruyama_Aya 730a092ba2
modify shell for check
1 year ago
Maruyama_Aya 49567d56d1
modify shell for check
1 year ago
Maruyama_Aya 039854b391
modify shell for check
1 year ago
Baizhou Zhang e417dd004e
[example] update opt example using booster api (#3918)
1 year ago
Maruyama_Aya cf4792c975
modify shell for check
1 year ago
Frank Lee eb39154d40
[dtensor] updated api and doc (#3845)
1 year ago
Hongxin Liu 9166988d9b
[devops] update torch version in compability test (#3919)
1 year ago
digger yu de0d7df33f
[nfc] fix typo colossalai/zero (#3923)
1 year ago
Hongxin Liu 12c90db3f3
[doc] add lazy init tutorial (#3922)
1 year ago
Maruyama_Aya c94a33579b
modify shell for check
1 year ago
digger yu a9d1cadc49
fix typo with colossalai/trainer utils zero (#3908)
1 year ago
Liu Ziming b306cecf28
[example] Modify palm example with the new booster API (#3913)
1 year ago
wukong1992 a55fb00c18
[booster] update bert example, using booster api (#3885)
1 year ago
Frank Lee 5e2132dcff
[workflow] added docker latest tag for release (#3920)
1 year ago
Hongxin Liu c25d421f3e
[devops] hotfix testmon cache clean logic (#3917)
1 year ago
Frank Lee d51e83d642
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop
1 year ago
Frank Lee c622bb3630
Merge pull request #3915 from FrankLeeeee/update/develop
1 year ago
Hongxin Liu 9c88b6cbd1
[lazy] fix compatibility problem on torch 1.13 (#3911)
1 year ago
Maruyama_Aya 4fc8bc68ac
modify file path
1 year ago
Hongxin Liu b5f0566363
[chat] add distributed PPO trainer (#3740)
1 year ago
Hongxin Liu 41fb7236aa
[devops] hotfix CI about testmon cache (#3910)
1 year ago