Commit Graph

317 Commits (ckpt)

Author SHA1 Message Date
Hongxin Liu b9d646fe9e
[misc] fix dist logger (#5782)
6 months ago
Edenzzzz 5f8c0a0ac3
[Feature] auto-cast optimizers to distributed version (#5746)
6 months ago
Edenzzzz 43995ee436
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)
7 months ago
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
7 months ago
Edenzzzz 15055f9a36
[hotfix] quick fixes to make legacy tutorials runnable (#5559)
8 months ago
Hongxin Liu 19e1a5cf16
[shardformer] update colo attention to support custom mask (#5510)
8 months ago
digger yu 385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. (#5429)
9 months ago
Hongxin Liu 070df689e6
[devops] fix extention building (#5427)
9 months ago
Hongxin Liu c53ddda88f
[lr-scheduler] fix load state dict and add test (#5369)
10 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
10 months ago
Xuanlei Zhao dd2c28a323
[npu] use extension for op builder (#5172)
11 months ago
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago
Zhongkai Zhao c7aa319ba0
[test] add no master test for low level zero plugin (#4934)
1 year ago
Hongxin Liu 4f68b3f10c
[kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)
1 year ago
Baizhou Zhang 39f2582e98
[hotfix] fix lr scheduler bug in torch 2.0 (#4864)
1 year ago
Hongxin Liu df63564184
[gemini] support amp o3 for gemini (#4872)
1 year ago
ppt0011 1dcaf249bd
[doc] add reminder for issue encountered with hybrid adam
1 year ago
binmakeswell 822051d888
[doc] update slack link (#4823)
1 year ago
Yan haixu a22706337a
[misc] add last_epoch in CosineAnnealingWarmupLR (#4778)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
1 year ago
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
1 year ago
Hongxin Liu ac178ca5c1
[legacy] move builder and registry to legacy (#4603)
1 year ago
binmakeswell 089c365fa0
[doc] add Series A Funding and NeurIPS news (#4377)
1 year ago
Frank Lee 015af592f8
[shardformer] integrated linear 1D with dtensor (#3996)
1 year ago
FoolPlayer ab8a47f830
[shardformer] add Dropout layer support different dropout pattern (#3856)
1 year ago
FoolPlayer 8cc11235c0
[shardformer]: Feature/shardformer, add some docstring and readme (#3816)
1 year ago
github-actions[bot] a52f62082d
[format] applied code formatting on changed files in pull request 4021 (#4022)
1 year ago
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
1 year ago
FoolPlayer 21a3915c98
[shardformer] add Dropout layer support different dropout pattern (#3856)
1 year ago
FoolPlayer 58f6432416
[shardformer]: Feature/shardformer, add some docstring and readme (#3816)
1 year ago
digger yu 0e484e6201
[nfc]fix typo colossalai/pipeline tensor nn (#3899)
1 year ago
digger yu 1878749753
[nfc] fix typo colossalai/nn (#3887)
1 year ago
Hongxin Liu ae02d4e4f7
[bf16] add bf16 support (#3882)
1 year ago
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago
HELSON 1a1d68b053
[moe] add checkpoint for moe models (#3354)
2 years ago
Tong Li 196d4696d0
[NFC] polish colossalai/nn/_ops/addmm.py code style (#3274)
2 years ago
Yuanchen d58fa705b2
[NFC] polish code style (#3268)
2 years ago
github-actions[bot] 82503a96f2
[format] applied code formatting on changed files in pull request 2997 (#3008)
2 years ago
binmakeswell 52a5078988
[doc] add ISC tutorial (#2997)
2 years ago
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996)
2 years ago
zbian 61e687831d
fixed using zero with tp cannot access weight correctly
2 years ago
Jiatong (Julius) Han 8c8a39be95
[hotfix]: Remove math.prod dependency (#2837)
2 years ago
junxu c52edcf0eb
Rename class method of ZeroDDP (#2692)
2 years ago
HELSON 56ddc9ca7a
[hotfix] add correct device for fake_param (#2796)
2 years ago
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671)
2 years ago