Commit Graph

1395 Commits (feature/elixir)

Author SHA1 Message Date
Frank Lee c173a69b3e
[elixir] refactored the chunk module (#3956)
1 year ago
Frank Lee 86ff5c152b
[elixir] moved simulator build to op_builder (#3939)
1 year ago
Frank Lee 3b58ff5c73
[elixir] updated readme (#3944)
1 year ago
Haichen Huang 1ee247a51c
update buffer size calculation (#3871)
2 years ago
Haichen Huang dbb9659099
[elixir] add elixir plugin and its unit test (#3865)
2 years ago
Haichen Huang 206280408a
[elixir] add elixir and its unit tests (#3835)
2 years ago
digger yu 7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808)
2 years ago
wukong1992 6b305a99d6
[booster] torch fsdp fix ckpt (#3788)
2 years ago
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
2 years ago
jiangmingyan e871e342b3
[API] add docstrings and initialization to apex amp, naive amp (#3783)
2 years ago
Frank Lee f5c425c898
fixed the example docstring for booster (#3795)
2 years ago
Hongxin Liu 72688adb2f
[doc] add booster docstring and fix autodoc (#3789)
2 years ago
Hongxin Liu 3c07a2846e
[plugin] a workaround for zero plugins' optimizer checkpoint (#3780)
2 years ago
Hongxin Liu 60e6a154bc
[doc] add tutorial for booster checkpoint (#3785)
2 years ago
digger yu 32f81f14d4
[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756)
2 years ago
Hongxin Liu 5452df63c5
[plugin] torch ddp plugin supports sharded model checkpoint (#3775)
2 years ago
jiangmingyan 2703a37ac9
[amp] Add naive amp demo (#3774)
2 years ago
digger yu 1baeb39c72
[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)
2 years ago
wukong1992 b37797ed3d
[booster] support torch fsdp plugin in booster (#3697)
2 years ago
digger-yu ad6460cf2c
[NFC] fix typo applications/ and colossalai/ (#3735)
2 years ago
digger-yu b7141c36dd
[CI] fix some spelling errors (#3707)
2 years ago
jiangmingyan 20068ba188
[booster] add tests for ddp and low level zero's checkpointio (#3715)
2 years ago
Hongxin Liu 6552cbf8e1
[booster] fix no_sync method (#3709)
2 years ago
Hongxin Liu 3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
2 years ago
Hongxin Liu f83ea813f5
[example] add train resnet/vit with booster example (#3694)
2 years ago
YH 2629f9717d
[tensor] Refactor handle_trans_spec in DistSpecManager
2 years ago
Hongxin Liu d0915f54f4
[booster] refactor all dp fashion plugins (#3684)
2 years ago
jiangmingyan 307894f74d
[booster] gemini plugin support shard checkpoint (#3610)
2 years ago
YH a22407cc02
[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173)
2 years ago
Hongxin Liu 50793b35f4
[gemini] accelerate inference (#3641)
2 years ago
Hongxin Liu 4b3240cb59
[booster] add low level zero plugin (#3594)
2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
2 years ago
Hongxin Liu 12eff9eb4c
[gemini] state dict supports fp16 (#3590)
2 years ago
Hongxin Liu dac127d0ee
[fx] fix meta tensor registration (#3589)
2 years ago
Hongxin Liu f313babd11
[gemini] support save state dict in shards (#3581)
2 years ago
YH d329c294ec
Add docstr for zero3 chunk search utils (#3572)
2 years ago
Hongxin Liu 173dad0562
[misc] add verbose arg for zero and op builder (#3552)
2 years ago
Hongxin Liu 4341f5e8e6
[lazyinit] fix clone and deepcopy (#3553)
2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379)
2 years ago
jiangmingyan 366a035552
[checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479)
2 years ago
YH bcf0cbcbe7
[doc] Add docs for clip args in zero optim (#3504)
2 years ago
jiangmingyan 52a933e175
[checkpoint] support huggingface style sharded checkpoint (#3461)
2 years ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
2 years ago
Frank Lee 7d8d825681
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442)
2 years ago
YH 8f740deb53
Fix typo (#3448)
2 years ago
Hakjin Lee 46c009dba4
[format] Run lint on colossalai.engine (#3367)
2 years ago
YuliangLiu0306 ffcdbf0f65
[autoparallel]integrate auto parallel feature with new tracer (#3408)
2 years ago
ver217 573af84184
[example] update examples related to zero/gemini (#3431)
2 years ago
Frank Lee 1beb85cc25
[checkpoint] refactored the API and added safetensors support (#3427)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago