Commit Graph

35 Commits (ea088b5f75e9c9a79d67b370286da2a1508688c8)

Author SHA1 Message Date
Xuanlei Zhao 8ca8cf8ec3 update optim
11 months ago
Xuanlei Zhao c1c6af6368 update
11 months ago
Xuanlei Zhao ccad7014c6 update optim
11 months ago
Xuanlei Zhao 44014faa67 fix optim
11 months ago
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
1 year ago
littsk 1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago
Zhongkai Zhao a0684e7bd6
[feature] support no master weights option for low level zero plugin (#4816)
1 year ago
littsk 83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837)
1 year ago
littsk 11f1e426fe
[hotfix] Correct several erroneous code comments (#4794)
1 year ago
littsk 54b3ad8924
[hotfix] fix norm type error in zero optimizer (#4795)
1 year ago
Baizhou Zhang c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
1 year ago
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu 807e01a4ba
[zero] hotfix master param sync (#4618)
1 year ago
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu 63ecafb1fb
[checkpointio] optimize zero optim checkpoint io (#4591)
1 year ago
LuGY cbac782254
[zero]fix zero ckptIO with offload (#4529)
1 year ago
flybird11111 ec18fc7340
[shardformer] support pp+tp+zero1 tests (#4531)
1 year ago
Jianghai 376533a564
[shardformer] zero1+pp and the corresponding tests (#4517)
1 year ago
LuGY 839847b7d7
[zero]support zero2 with gradient accumulation (#4511)
1 year ago
LuGY d86ddd9b29
[hotfix] fix unsafe async comm in zero (#4404)
1 year ago
LuGY 45b08f08cb [zero] optimize the optimizer step time (#4221)
1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194)
1 year ago
LuGY dd7cc58299 [zero] add state dict for low level zero (#4179)
1 year ago
LuGY c668801d36 [zero] allow passing process group to zero12 (#4153)
1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138)
1 year ago
LuGY c6ab96983a [zero] refactor low level zero for shard evenly (#4030)
1 year ago
digger yu de0d7df33f
[nfc] fix typo colossalai/zero (#3923)
1 year ago
Hongxin Liu ae02d4e4f7
[bf16] add bf16 support (#3882)
1 year ago
YH a22407cc02
[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173)
2 years ago
Hongxin Liu 4b3240cb59
[booster] add low level zero plugin (#3594)
2 years ago
Hongxin Liu 173dad0562
[misc] add verbose arg for zero and op builder (#3552)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago