955 Commits (39f2582e987871c198f2f2526cd4435cbd569741)

Author SHA1 Message Date
Frank Lee c4b1b65931 [test] fixed tests failed due to dtensor change (#4082) 1 year ago
FoolPlayer 92f6791095 [shardformer] Add layernorm (#4072) 1 year ago
Frank Lee 70c58cfd4f [shardformer] supported fused qkv checkpoint (#4073) 1 year ago
FoolPlayer 0803a61412 [shardformer] add linearconv1d test (#4067) 1 year ago
Frank Lee 8eb09a4c69 [shardformer] support module saving and loading (#4062) 1 year ago
FoolPlayer 7740c55c55 support kit use for bert/gpt test (#4055) 1 year ago
Frank Lee f22ddacef0 [shardformer] refactored the shardformer layer structure (#4053) 1 year ago
Frank Lee 58df720570 [shardformer] adapted T5 and LLaMa test to use kit (#4049) 1 year ago
FoolPlayer 4021b9a8a2 [shardformer] add gpt2 test and layer class refactor (#4041) 1 year ago
Frank Lee d857f3dbba [shardformer] supported T5 and its variants (#4045) 1 year ago
Frank Lee c1d5453e9f [shardformer] adapted llama to the new API (#4036) 1 year ago
FoolPlayer 74d176c8d8 [shardformer] fix bert and gpt downstream with new api (#4024) 1 year ago
FoolPlayer 507c0ad368 add vocabembedding layer 1 year ago
Frank Lee 3893fa1a8d [shardformer] refactored embedding and dropout to parallel module (#4013) 1 year ago
FoolPlayer dfca9678fa integrate with dist layer (#4011) 1 year ago
Frank Lee 015af592f8 [shardformer] integrated linear 1D with dtensor (#3996) 1 year ago
Frank Lee 611971248c [device] support init device mesh from process group (#3990) 1 year ago
FoolPlayer f7774ec0f3 [Shardformer] Downstream bert (#3979) 1 year ago
wukong1992 c1c672d0f0 [shardformer] shardformer support t5 model (#3994) 1 year ago
wukong1992 6b30dfb7ce [shardformer] support llama model using shardformer (#3969) 1 year ago
FoolPlayer a73130482d [shardformer] Unit test (#3928) 1 year ago
FoolPlayer f1cb5ac6bf [shardformer] Align bert value (#3907) 1 year ago
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching 1 year ago
github-actions[bot] a52f62082d [format] applied code formatting on changed files in pull request 4021 (#4022) 1 year ago
Frank Lee a5883aa790 [test] fixed codefactor format report (#4026) 1 year ago
Baizhou Zhang 822c3d4d66 [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) 1 year ago
Wenhao Chen 725af3eeeb [booster] make optimizer argument optional for boost (#3993) 1 year ago
Baizhou Zhang c9cff7e7fa [checkpointio] General Checkpointing of Sharded Optimizers (#3984) 1 year ago
digger yu e61ffc77c6 fix typo tests/ (#3936) 1 year ago
Frank Lee ddcf58cacf Revert "[sync] sync feature/shardformer with develop" 1 year ago
Frank Lee eb39154d40 [dtensor] updated api and doc (#3845) 1 year ago
Hongxin Liu ae02d4e4f7 [bf16] add bf16 support (#3882) 1 year ago
Hongxin Liu dbb32692d2 [lazy] refactor lazy init (#3891) 1 year ago
wukong1992 6b305a99d6 [booster] torch fsdp fix ckpt (#3788) 2 years ago
Frank Lee 615e2e5fc1 [test] fixed lazy init test import error (#3799) 2 years ago
Hongxin Liu 3c07a2846e [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) 2 years ago
Hongxin Liu 5452df63c5 [plugin] torch ddp plugin supports sharded model checkpoint (#3775) 2 years ago
wukong1992 6050f37776 [booster] removed models that don't support fsdp (#3744) 2 years ago
Hongxin Liu afb239bbf8 [devops] update torch version of CI (#3725) 2 years ago
wukong1992 b37797ed3d [booster] support torch fsdp plugin in booster (#3697) 2 years ago
digger-yu 1f73609adb [CI] fix typo with tests/ etc. (#3727) 2 years ago
digger-yu b7141c36dd [CI] fix some spelling errors (#3707) 2 years ago
jiangmingyan 20068ba188 [booster] add tests for ddp and low level zero's checkpointio (#3715) 2 years ago
Hongxin Liu 6552cbf8e1 [booster] fix no_sync method (#3709) 2 years ago
Hongxin Liu 3bf09efe74 [booster] update prepare dataloader method for plugin (#3706) 2 years ago
Hongxin Liu d0915f54f4 [booster] refactor all dp fashion plugins (#3684) 2 years ago
digger-yu b49020c1b1 [CI] Update test_sharded_optim_with_sync_bn.py (#3688) 2 years ago
jiangmingyan 307894f74d [booster] gemini plugin support shard checkpoint (#3610) 2 years ago
Hongxin Liu 50793b35f4 [gemini] accelerate inference (#3641) 2 years ago
Hongxin Liu 4b3240cb59 [booster] add low level zero plugin (#3594) 2 years ago