Gao, Ruiyuan
e9032fb0b2
[colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg ( #6020 )
...
* fix bug in load_state_dict_into_model; format error msg
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update utils.py
to support checking missing_keys
* Update general_checkpoint_io.py
fix bug in missing_keys error message
* retrigger tests
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-02 16:56:35 +08:00
Baizhou Zhang
14b0d4c7e5
[lora] add lora APIs for booster, support lora for TorchDDP ( #4981 )
...
* add apis and peft requirement
* add license and implement apis
* add checkpointio apis
* add torchddp fwd_bwd test
* add support_lora methods
* add checkpointio test and debug
* delete unneeded code
* remove peft from LICENSE
* add concrete methods for enable_lora
* simplify enable_lora api
* fix requirements
2024-04-28 10:51:27 +08:00
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic ( #4758 )
...
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
2023-09-20 18:29:37 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu
a39a5c66fe
Merge branch 'main' into feature/shardformer
2023-09-04 23:43:13 +08:00
Baizhou Zhang
e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins ( #4606 )
2023-09-04 23:25:01 +08:00
Hongxin Liu
63ecafb1fb
[checkpointio] optimize zero optim checkpoint io ( #4591 )
...
* [zero] update checkpoint io to save memory
* [checkpointio] add device map to save memory
2023-09-04 11:26:45 +08:00
Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
...
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gathered is true under GeminiPlugin
2023-07-21 14:39:01 +08:00
Baizhou Zhang
58913441a1
[checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin ( #4141 )
...
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
2023-07-07 16:33:06 +08:00
Baizhou Zhang
822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin ( #4002 )
2023-06-16 14:14:05 +08:00
Baizhou Zhang
c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers ( #3984 )
2023-06-15 15:21:26 +08:00
wukong1992
6b305a99d6
[booster] torch fsdp fix ckpt ( #3788 )
2023-05-23 16:58:45 +08:00
jiangmingyan
307894f74d
[booster] gemini plugin support shard checkpoint ( #3610 )
...
* gemini plugin add shard checkpoint save/load
* gemini plugin support shard checkpoint
* [API Refactoring] gemini plugin support shard checkpoint
---------
Co-authored-by: luchen <luchen@luchendeMBP.lan>
Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
2023-05-05 14:37:21 +08:00
jiangmingyan
366a035552
[checkpoint] Sharded checkpoints need to be compatible with the naming format of hf checkpoint files ( #3479 )
...
* [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format
* [checkpoint] Shard saved checkpoint add 'variant' field to customize filename
---------
Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
Co-authored-by: luchen <luchen@luchendeMBP.lan>
2023-04-12 16:02:17 +08:00
jiangmingyan
52a933e175
[checkpoint] support huggingface style sharded checkpoint ( #3461 )
...
* [checkpoint] support huggingface style sharded checkpoint
---------
Co-authored-by: luchen <luchen@luchendeMBP.lan>
2023-04-06 16:23:39 +08:00
Frank Lee
1beb85cc25
[checkpoint] refactored the API and added safetensors support ( #3427 )
...
* [checkpoint] refactored the API and added safetensors support
* polish code
2023-04-04 15:23:01 +08:00
Frank Lee
73d3e4d309
[booster] implemented the torch ddp + resnet example ( #3232 )
...
* [booster] implemented the torch ddp + resnet example
* polish code
2023-03-27 10:24:14 +08:00
Frank Lee
cd142fbefa
[api] implemented the checkpoint io module ( #3205 )
...
* [api] implemented the checkpoint io module
* polish code
2023-03-23 10:53:17 +08:00