Commit Graph

1417 Commits (725af3eeeb16f7a348578e19105bed4f4096e0ca)

Author SHA1 Message Date
Wenhao Chen 725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
1 year ago
Baizhou Zhang c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984)
1 year ago
Frank Lee 71fe52769c
[gemini] fixed the gemini checkpoint io (#3934)
1 year ago
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
1 year ago
FoolPlayer 24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
1 year ago
FoolPlayer ef1537759c
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883)
1 year ago
FoolPlayer 6370a935f6
update README (#3909)
1 year ago
FoolPlayer 21a3915c98
[shardformer] add Dropout layer support different dropout pattern (#3856)
1 year ago
FoolPlayer 997544c1f9
[shardformer] update readme with modules implement doc (#3834)
1 year ago
Frank Lee 537a52b7a2
[shardformer] refactored the user api (#3828)
1 year ago
Frank Lee bc19024bf9
[shardformer] updated readme (#3827)
1 year ago
FoolPlayer 58f6432416
[shardformer]: Feature/shardformer, add some docstring and readme (#3816)
1 year ago
FoolPlayer 6a69b44dfc
[shardformer] init shardformer code structure (#3731)
1 year ago
Frank Lee eb39154d40
[dtensor] updated api and doc (#3845)
1 year ago
digger yu de0d7df33f
[nfc] fix typo colossalai/zero (#3923)
1 year ago
digger yu a9d1cadc49
fix typo with colossalai/trainer utils zero (#3908)
2 years ago
Frank Lee d51e83d642
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop
2 years ago
Hongxin Liu 9c88b6cbd1
[lazy] fix compatibility problem on torch 1.13 (#3911)
2 years ago
digger yu 0e484e6201
[nfc]fix typo colossalai/pipeline tensor nn (#3899)
2 years ago
Baizhou Zhang c1535ccbba
[doc] fix docs about booster api usage (#3898)
2 years ago
digger yu 1878749753
[nfc] fix typo colossalai/nn (#3887)
2 years ago
Hongxin Liu ae02d4e4f7
[bf16] add bf16 support (#3882)
2 years ago
Liu Ziming 8065cc5fba
Modify torch version requirement to adapt torch 2.0 (#3896)
2 years ago
Hongxin Liu dbb32692d2
[lazy] refactor lazy init (#3891)
2 years ago
digger yu 70c8cdecf4
[nfc] fix typo colossalai/cli fx kernel (#3847)
2 years ago
digger yu e2d81eba0d
[nfc] fix typo colossalai/ applications/ (#3831)
2 years ago
wukong1992 3229f93e30
[booster] add warning for torch fsdp plugin doc (#3833)
2 years ago
Hongxin Liu 7c9f2ed6dd
[dtensor] polish sharding spec docstring (#3838)
2 years ago
digger yu 7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808)
2 years ago
wukong1992 6b305a99d6
[booster] torch fsdp fix ckpt (#3788)
2 years ago
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
2 years ago
jiangmingyan e871e342b3
[API] add docstrings and initialization to apex amp, naive amp (#3783)
2 years ago
Frank Lee f5c425c898
fixed the example docstring for booster (#3795)
2 years ago
Hongxin Liu 72688adb2f
[doc] add booster docstring and fix autodoc (#3789)
2 years ago
Hongxin Liu 3c07a2846e
[plugin] a workaround for zero plugins' optimizer checkpoint (#3780)
2 years ago
Hongxin Liu 60e6a154bc
[doc] add tutorial for booster checkpoint (#3785)
2 years ago
digger yu 32f81f14d4
[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756)
2 years ago
Hongxin Liu 5452df63c5
[plugin] torch ddp plugin supports sharded model checkpoint (#3775)
2 years ago
jiangmingyan 2703a37ac9
[amp] Add naive amp demo (#3774)
2 years ago
digger yu 1baeb39c72
[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)
2 years ago
wukong1992 b37797ed3d
[booster] support torch fsdp plugin in booster (#3697)
2 years ago
digger-yu ad6460cf2c
[NFC] fix typo applications/ and colossalai/ (#3735)
2 years ago
digger-yu b7141c36dd
[CI] fix some spelling errors (#3707)
2 years ago
jiangmingyan 20068ba188
[booster] add tests for ddp and low level zero's checkpointio (#3715)
2 years ago
Hongxin Liu 6552cbf8e1
[booster] fix no_sync method (#3709)
2 years ago
Hongxin Liu 3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
2 years ago
Hongxin Liu f83ea813f5
[example] add train resnet/vit with booster example (#3694)
2 years ago
YH 2629f9717d
[tensor] Refactor handle_trans_spec in DistSpecManager
2 years ago
Hongxin Liu d0915f54f4
[booster] refactor all dp fashion plugins (#3684)
2 years ago
jiangmingyan 307894f74d
[booster] gemini plugin support shard checkpoint (#3610)
2 years ago