81 Commits (feature/inference-refactor)

Author SHA1 Message Date
Xuanlei Zhao f71e63b0f3 [moe] support optimizer checkpoint (#5015) 1 year ago
littsk 1a3315e336 [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) 1 year ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago
Baizhou Zhang c040d70aa0 [hotfix] fix the bug of repeatedly storing param group (#4951) 1 year ago
Baizhou Zhang 21ba89cab6 [gemini] support gradient accumulation (#4869) 1 year ago
Zhongkai Zhao a0684e7bd6 [feature] support no master weights option for low level zero plugin (#4816) 1 year ago
littsk 83b52c56cd [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) 1 year ago
Hongxin Liu df63564184 [gemini] support amp o3 for gemini (#4872) 1 year ago
shaoyuw c97a3523db fix: typo in comment of low_level_zero plugin 1 year ago
Hongxin Liu 4965c0dabd [lazy] support from_pretrained (#4801) 1 year ago
Baizhou Zhang a2db75546d [doc] polish shardformer doc (#4779) 1 year ago
Baizhou Zhang c0a033700c [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Xuanlei Zhao ac2797996b [shardformer] add custom policy in hybrid parallel plugin (#4718) 1 year ago
Baizhou Zhang f911d5b09d [doc] Add user document for Shardformer (#4702) 1 year ago
Baizhou Zhang d8ceeac14e [hotfix] fix typo in hybrid parallel io (#4697) 1 year ago
Baizhou Zhang 1d454733c4 [doc] Update booster user documents. (#4669) 1 year ago
Baizhou Zhang 660eed9124 [pipeline] set optimizer to optional in execute_pipeline (#4630) 1 year ago
Hongxin Liu 807e01a4ba [zero] hotfix master param sync (#4618) 1 year ago
Bin Jia 86d22581e4 [shardformer] Add overlap optional for HybridParallelPlugin (#4615) 1 year ago
Baizhou Zhang e79b1e80e2 [checkpointio] support huggingface from_pretrained for all plugins (#4606) 1 year ago
flybird11111 0a94fcd351 [shardformer] update bert finetune example with HybridParallelPlugin (#4584) 1 year ago
Hongxin Liu 63ecafb1fb [checkpointio] optimize zero optim checkpoint io (#4591) 1 year ago
Hongxin Liu 508ca36fe3 [pipeline] 1f1b schedule receive microbatch size (#4589) 1 year ago
Baizhou Zhang 38ccb8b1a3 [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) 1 year ago
Baizhou Zhang c9625dbb63 [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) 1 year ago
Baizhou Zhang 44eab2b27f [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) 1 year ago
Hongxin Liu 27061426f7 [gemini] improve compatibility and add static placement policy (#4479) 1 year ago
Baizhou Zhang 1c7df566e2 [shardformer] support tp+zero for shardformer (#4472) 1 year ago
Bin Jia 7c8be77081 [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) 1 year ago
Baizhou Zhang 6ef33f75aa [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) 1 year ago
Bin Jia 424629fea0 [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) 1 year ago
Hongxin Liu 172f7fa3cf [misc] resolve code factor issues (#4433) 1 year ago
flybird1111 d2cd48e0be [shardformer] test all optimizations (#4399) 1 year ago
Baizhou Zhang ed4c448488 [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) 1 year ago
Baizhou Zhang b1feeced8e [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) 1 year ago
Baizhou Zhang 0ceec8f9a9 [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) 1 year ago
Hongxin Liu 261eab02fb [plugin] add 3d parallel plugin (#4295) 1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194) 1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138) 1 year ago
梁爽 abe4f971e0 [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) 1 year ago
Jianghai b366f1d99f [NFC] Fix format for mixed precision (#4253) 1 year ago
Baizhou Zhang c6f6005990 [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) 1 year ago
Baizhou Zhang 58913441a1 [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) 1 year ago
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching 1 year ago
Baizhou Zhang 822c3d4d66 [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) 1 year ago
Wenhao Chen 725af3eeeb [booster] make optimizer argument optional for boost (#3993) 1 year ago
Baizhou Zhang c9cff7e7fa [checkpointio] General Checkpointing of Sharded Optimizers (#3984) 1 year ago
Frank Lee 71fe52769c [gemini] fixed the gemini checkpoint io (#3934) 1 year ago
Frank Lee bd1ab98158 [gemini] fixed the gemini checkpoint io (#3934) 1 year ago