Commit Graph

43 Commits (dabc2e7430bb5d6bad1fed7629b534c10ae8c609)

Author | SHA1 | Message | Date
Guangyao Zhang | bdb125f83f | [doc] FP8 training and communication document (#6050) | 3 months ago
Hongxin Liu | 13946c4448 | [fp8] hotfix backward hook (#6053) | 3 months ago
Hongxin Liu | cc1b0efc17 | [plugin] hotfix zero plugin (#6036) | 3 months ago
pre-commit-ci[bot] | 80d24ae519 | [pre-commit.ci] auto fixes from pre-commit.com hooks | 3 months ago
Hongxin Liu | caab4a307f | Merge branch 'main' into feature/fp8_comm | 3 months ago
pre-commit-ci[bot] | a292554179 | [pre-commit.ci] auto fixes from pre-commit.com hooks | 3 months ago
wangbluo | 971b16a74f | fix | 3 months ago
Wang Binluo | eea37da6fa | [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) | 3 months ago
Hongxin Liu | 0d3b0bd864 | [plugin] add cast inputs option for zero (#6003) (#6022) | 3 months ago
Edenzzzz | dcc44aab8d | [misc] Use dist logger in plugins (#6011) | 3 months ago
flybird11111 | 0a51319113 | [fp8] zero support fp8 linear. (#6006) | 3 months ago
Hongxin Liu | 406f984063 | [plugin] add cast inputs option for zero (#6003) | 4 months ago
flybird11111 | 0c10afd372 | [FP8] rebase main (#5963) | 4 months ago
ver217 | ae486ce005 | [fp8] add fp8 comm for low level zero | 4 months ago
hxwang | 70c9924d0d | [chore] solve moe ckpt test failure and some other arg pass failure | 4 months ago
Hongxin Liu | e86127925a | [plugin] support all-gather overlap for hybrid parallel (#5919) | 4 months ago
Hongxin Liu | c068ef0fa0 | [zero] support all-gather overlap (#5898) | 5 months ago
Edenzzzz | 5f8c0a0ac3 | [Feature] auto-cast optimizers to distributed version (#5746) | 6 months ago
Edenzzzz | 43995ee436 | [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) | 7 months ago
linsj20 | 91fa553775 | [Feature] qlora support (#5586) | 7 months ago
flybird11111 | 8954a0c2e2 | [LowLevelZero] low level zero support lora (#5153) | 7 months ago
Baizhou Zhang | 14b0d4c7e5 | [lora] add lora APIs for booster, support lora for TorchDDP (#4981) | 7 months ago
Hongxin Liu | d202cc28c0 | [npu] change device to accelerator api (#5239) | 11 months ago
Hongxin Liu | e5ce4c8ea6 | [npu] add npu support for gemini and zero (#5067) | 1 year ago
Baizhou Zhang | c040d70aa0 | [hotfix] fix the bug of repeatedly storing param group (#4951) | 1 year ago
Baizhou Zhang | 21ba89cab6 | [gemini] support gradient accumulation (#4869) | 1 year ago
Zhongkai Zhao | a0684e7bd6 | [feature] support no master weights option for low level zero plugin (#4816) | 1 year ago
shaoyuw | c97a3523db | fix: typo in comment of low_level_zero plugin | 1 year ago
Baizhou Zhang | a2db75546d | [doc] polish shardformer doc (#4779) | 1 year ago
Baizhou Zhang | c0a033700c | [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) | 1 year ago
Hongxin Liu | 079bf3cb26 | [misc] update pre-commit and run all files (#4752) | 1 year ago
Hongxin Liu | 807e01a4ba | [zero] hotfix master param sync (#4618) | 1 year ago
Hongxin Liu | 63ecafb1fb | [checkpointio] optimize zero optim checkpoint io (#4591) | 1 year ago
LuGY | 1a49a5ea00 | [zero] support shard optimizer state dict of zero (#4194) | 1 year ago
LuGY | 79cf1b5f33 | [zero]support no_sync method for zero1 plugin (#4138) | 1 year ago
梁爽 | abe4f971e0 | [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) | 1 year ago
Wenhao Chen | 725af3eeeb | [booster] make optimizer argument optional for boost (#3993) | 1 year ago
Hongxin Liu | ae02d4e4f7 | [bf16] add bf16 support (#3882) | 2 years ago
Hongxin Liu | 3c07a2846e | [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) | 2 years ago
Hongxin Liu | 6552cbf8e1 | [booster] fix no_sync method (#3709) | 2 years ago
Hongxin Liu | 3bf09efe74 | [booster] update prepare dataloader method for plugin (#3706) | 2 years ago
Hongxin Liu | d0915f54f4 | [booster] refactor all dp fashion plugins (#3684) | 2 years ago
Hongxin Liu | 4b3240cb59 | [booster] add low level zero plugin (#3594) | 2 years ago