Commit Graph

33 Commits (a4489384d5239074c541820ef583fb5e8371b9ae)

Author SHA1 Message Date
Baizhou Zhang 21ba89cab6
[gemini] support gradient accumulation (#4869)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
digger yu 9c2feb2f0b
fix some typo with colossalai/device colossalai/tensor/ etc. (#4171)
1 year ago
flybird11111 eedaa3e1ef
[shardformer]fix gpt2 double head (#4663)
1 year ago
Hongxin Liu 8accecd55b [legacy] move engine to legacy (#4560)
1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
1 year ago
flybird1111 906426cb44 [Shardformer] Merge flash attention branch to pipeline branch (#4362)
1 year ago
FoolPlayer c3ca53cf05 [test] skip some not compatible models
1 year ago
Hongxin Liu 411cf1d2db [hotfix] fix gemini and zero test (#4333)
1 year ago
Hongxin Liu 261eab02fb [plugin] add 3d parallel plugin (#4295)
1 year ago
LuGY c6ab96983a [zero] refactor low level zero for shard evenly (#4030)
1 year ago
Frank Lee 58df720570 [shardformer] adapted T5 and LLaMa test to use kit (#4049)
1 year ago
Hongxin Liu dbb32692d2
[lazy] refactor lazy init (#3891)
1 year ago
wukong1992 6b305a99d6
[booster] torch fsdp fix ckpt (#3788)
2 years ago
Hongxin Liu 5452df63c5
[plugin] torch ddp plugin supports sharded model checkpoint (#3775)
2 years ago
wukong1992 6050f37776
[booster] removed models that don't support fsdp (#3744)
2 years ago
Hongxin Liu afb239bbf8
[devops] update torch version of CI (#3725)
2 years ago
wukong1992 b37797ed3d
[booster] support torch fsdp plugin in booster (#3697)
2 years ago
digger-yu 1f73609adb
[CI] fix typo with tests/ etc. (#3727)
2 years ago
Hongxin Liu 6552cbf8e1
[booster] fix no_sync method (#3709)
2 years ago
Hongxin Liu 3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
2 years ago
Hongxin Liu d0915f54f4
[booster] refactor all dp fashion plugins (#3684)
2 years ago
Hongxin Liu 4b3240cb59
[booster] add low level zero plugin (#3594)
2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379)
2 years ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
2 years ago
Frank Lee 1beb85cc25
[checkpoint] refactored the API and added safetensors support (#3427)
2 years ago
Frank Lee 638a07a7f9
[test] fixed gemini plugin test (#3411)
2 years ago
ver217 5f2e34e6c9
[booster] implement Gemini plugin (#3352)
2 years ago
Frank Lee 73d3e4d309
[booster] implemented the torch ddd + resnet example (#3232)
2 years ago
Frank Lee e7f3bed2d3
[booster] added the plugin base and torch ddp plugin (#3180)
2 years ago
Frank Lee a9b8402d93
[booster] added the accelerator implementation (#3159)
2 years ago
Frank Lee 1ad3a636b1
[test] fixed torchrec model test (#3167)
2 years ago
Frank Lee ed19290560
[booster] implemented mixed precision class (#3151)
2 years ago