121 Commits (bd38fe6b912379080673a43d77fd3bdf0e5c852e)

Author SHA1 Message Date
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Xuanlei Zhao ac2797996b [shardformer] add custom policy in hybrid parallel plugin (#4718) 1 year ago
Baizhou Zhang f911d5b09d [doc] Add user document for Shardformer (#4702) 1 year ago
Baizhou Zhang d8ceeac14e [hotfix] fix typo in hybrid parallel io (#4697) 1 year ago
Baizhou Zhang 1d454733c4 [doc] Update booster user documents. (#4669) 1 year ago
Baizhou Zhang 660eed9124 [pipeline] set optimizer to optional in execute_pipeline (#4630) 1 year ago
Hongxin Liu 807e01a4ba [zero] hotfix master param sync (#4618) 1 year ago
Bin Jia 86d22581e4 [shardformer] Add overlap optional for HybridParallelPlugin (#4615) 1 year ago
Baizhou Zhang e79b1e80e2 [checkpointio] support huggingface from_pretrained for all plugins (#4606) 1 year ago
flybird11111 0a94fcd351 [shardformer] update bert finetune example with HybridParallelPlugin (#4584) 1 year ago
Hongxin Liu 63ecafb1fb [checkpointio] optimize zero optim checkpoint io (#4591) 1 year ago
Hongxin Liu 508ca36fe3 [pipeline] 1f1b schedule receive microbatch size (#4589) 1 year ago
Baizhou Zhang 38ccb8b1a3 [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) 1 year ago
Baizhou Zhang c9625dbb63 [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) 1 year ago
Baizhou Zhang 44eab2b27f [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) 1 year ago
Hongxin Liu 27061426f7 [gemini] improve compatibility and add static placement policy (#4479) 1 year ago
Baizhou Zhang 1c7df566e2 [shardformer] support tp+zero for shardformer (#4472) 1 year ago
Bin Jia 7c8be77081 [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) 1 year ago
Baizhou Zhang 6ef33f75aa [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) 1 year ago
Bin Jia 424629fea0 [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) 1 year ago
Hongxin Liu 172f7fa3cf [misc] resolve code factor issues (#4433) 1 year ago
flybird1111 d2cd48e0be [shardformer] test all optimizations (#4399) 1 year ago
Baizhou Zhang ed4c448488 [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) 1 year ago
Baizhou Zhang b1feeced8e [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) 1 year ago
Baizhou Zhang 0ceec8f9a9 [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) 1 year ago
Hongxin Liu 261eab02fb [plugin] add 3d parallel plugin (#4295) 1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194) 1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138) 1 year ago
梁爽 abe4f971e0 [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) 1 year ago
Jianghai b366f1d99f [NFC] Fix format for mixed precision (#4253) 1 year ago
Baizhou Zhang c6f6005990 [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) 1 year ago
Baizhou Zhang 58913441a1 [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) 1 year ago
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching 1 year ago
Baizhou Zhang 822c3d4d66 [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) 1 year ago
Wenhao Chen 725af3eeeb [booster] make optimizer argument optional for boost (#3993) 1 year ago
Baizhou Zhang c9cff7e7fa [checkpointio] General Checkpointing of Sharded Optimizers (#3984) 1 year ago
Frank Lee 71fe52769c [gemini] fixed the gemini checkpoint io (#3934) 1 year ago
Frank Lee bd1ab98158 [gemini] fixed the gemini checkpoint io (#3934) 1 year ago
Baizhou Zhang c1535ccbba [doc] fix docs about booster api usage (#3898) 1 year ago
Hongxin Liu ae02d4e4f7 [bf16] add bf16 support (#3882) 1 year ago
wukong1992 3229f93e30 [booster] add warning for torch fsdp plugin doc (#3833) 2 years ago
digger yu 7f8203af69 fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808) 2 years ago
wukong1992 6b305a99d6 [booster] torch fsdp fix ckpt (#3788) 2 years ago
jiangmingyan e871e342b3 [API] add docstrings and initialization to apex amp, naive amp (#3783) 2 years ago
Frank Lee f5c425c898 fixed the example docstring for booster (#3795) 2 years ago
Hongxin Liu 72688adb2f [doc] add booster docstring and fix autodoc (#3789) 2 years ago
Hongxin Liu 3c07a2846e [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) 2 years ago
Hongxin Liu 60e6a154bc [doc] add tutorial for booster checkpoint (#3785) 2 years ago
Hongxin Liu 5452df63c5 [plugin] torch ddp plugin supports sharded model checkpoint (#3775) 2 years ago
jiangmingyan 2703a37ac9 [amp] Add naive amp demo (#3774) 2 years ago