Commit Graph

68 Commits (3c6b831c264d0657a97034b5cf036c913a762083)

Author SHA1 Message Date
Xuanlei Zhao ac2797996b
[shardformer] add custom policy in hybrid parallel plugin (#4718)
1 year ago
Baizhou Zhang f911d5b09d
[doc] Add user document for Shardformer (#4702)
1 year ago
Baizhou Zhang d8ceeac14e
[hotfix] fix typo in hybrid parallel io (#4697)
1 year ago
Baizhou Zhang 1d454733c4
[doc] Update booster user documents. (#4669)
1 year ago
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630)
1 year ago
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu 807e01a4ba
[zero] hotfix master param sync (#4618)
1 year ago
Bin Jia 86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin (#4615)
1 year ago
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer
1 year ago
Baizhou Zhang e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins (#4606)
1 year ago
flybird11111 0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin (#4584)
1 year ago
Hongxin Liu 63ecafb1fb
[checkpointio] optimize zero optim checkpoint io (#4591)
1 year ago
Hongxin Liu 508ca36fe3
[pipeline] 1f1b schedule receive microbatch size (#4589)
1 year ago
Baizhou Zhang 38ccb8b1a3
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575)
1 year ago
Baizhou Zhang c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
1 year ago
Baizhou Zhang 44eab2b27f
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506)
1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
1 year ago
Baizhou Zhang 1c7df566e2
[shardformer] support tp+zero for shardformer (#4472)
1 year ago
Bin Jia 7c8be77081
[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460)
1 year ago
Baizhou Zhang 6ef33f75aa
[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446)
1 year ago
Bin Jia 424629fea0
[shardformer/sequence parallel] Cherry pick commit to new branch (#4450)
1 year ago
Hongxin Liu 172f7fa3cf [misc] resolve code factor issues (#4433)
1 year ago
flybird1111 d2cd48e0be [shardformer] test all optimizations (#4399)
1 year ago
Baizhou Zhang ed4c448488 [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)
1 year ago
Baizhou Zhang b1feeced8e [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366)
1 year ago
Baizhou Zhang 0ceec8f9a9 [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354)
1 year ago
Hongxin Liu 261eab02fb [plugin] add 3d parallel plugin (#4295)
1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194)
1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138)
1 year ago
梁爽 abe4f971e0 [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256)
1 year ago
Jianghai b366f1d99f [NFC] Fix format for mixed precision (#4253)
1 year ago
Baizhou Zhang c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302)
1 year ago
Baizhou Zhang 58913441a1
Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141)
1 year ago
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching
1 year ago
Baizhou Zhang 822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002)
1 year ago
Wenhao Chen 725af3eeeb
[booster] make optimizer argument optional for boost (#3993)
1 year ago
Baizhou Zhang c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers (#3984)
1 year ago
Frank Lee bd1ab98158
[gemini] fixed the gemini checkpoint io (#3934)
1 year ago
Baizhou Zhang c1535ccbba
[doc] fix docs about booster api usage (#3898)
2 years ago
Hongxin Liu ae02d4e4f7
[bf16] add bf16 support (#3882)
2 years ago
wukong1992 3229f93e30
[booster] add warning for torch fsdp plugin doc (#3833)
2 years ago
digger yu 7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808)
2 years ago
wukong1992 6b305a99d6
[booster] torch fsdp fix ckpt (#3788)
2 years ago
jiangmingyan e871e342b3
[API] add docstrings and initialization to apex amp, naive amp (#3783)
2 years ago
Frank Lee f5c425c898
fixed the example docstring for booster (#3795)
2 years ago
Hongxin Liu 72688adb2f
[doc] add booster docstring and fix autodoc (#3789)
2 years ago
Hongxin Liu 3c07a2846e
[plugin] a workaround for zero plugins' optimizer checkpoint (#3780)
2 years ago
Hongxin Liu 60e6a154bc
[doc] add tutorial for booster checkpoint (#3785)
2 years ago
Hongxin Liu 5452df63c5
[plugin] torch ddp plugin supports sharded model checkpoint (#3775)
2 years ago
jiangmingyan 2703a37ac9
[amp] Add naive amp demo (#3774)
2 years ago