ColossalAI/colossalai
Hongxin Liu 5452df63c5
[plugin] torch ddp plugin supports sharded model checkpoint (#3775)
* [plugin] torch ddp plugin add save sharded model
* [test] fix torch ddp ckpt io test
* [test] fix torch ddp ckpt io test
* [test] fix low level zero plugin test
* [test] fix low level zero plugin test
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] fix low level zero plugin test
* [test] fix low level zero plugin test
* [test] remove debug info
2023-05-18 20:05:59 +08:00
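The headline change above lets the torch DDP plugin save a model checkpoint as several size-bounded shard files plus an index, rather than one monolithic file. The core sharding idea can be sketched in plain Python; note that `shard_state_dict`, the byte-size budget, and the index layout below are illustrative assumptions for this sketch, not ColossalAI's actual API:

```python
import json
from typing import Dict, List, Tuple

def shard_state_dict(sizes: Dict[str, int],
                     max_shard_bytes: int) -> Tuple[List[List[str]], Dict[str, int]]:
    """Greedily pack state-dict keys into shards under a byte budget.

    ``sizes`` maps each tensor name to its size in bytes. Returns the list
    of shards (each a list of keys) and an index mapping every key to the
    number of the shard that holds it.
    """
    shards: List[List[str]] = [[]]
    used = 0
    index: Dict[str, int] = {}
    for key, nbytes in sizes.items():
        # Start a new shard when the current one would overflow;
        # a shard always accepts at least one tensor.
        if shards[-1] and used + nbytes > max_shard_bytes:
            shards.append([])
            used = 0
        shards[-1].append(key)
        index[key] = len(shards) - 1
        used += nbytes
    return shards, index

if __name__ == "__main__":
    # Hypothetical tensor sizes, just to exercise the packing logic.
    sizes = {"embed.weight": 400, "layer1.weight": 300,
             "layer1.bias": 50, "head.weight": 500}
    shards, index = shard_state_dict(sizes, max_shard_bytes=512)
    print(json.dumps({"shards": shards, "index": index}, indent=2))
```

In a real checkpoint each shard would then be serialized to its own file, with the index persisted alongside so loading can locate any tensor without reading every shard.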
_C                   [setup] support pre-build and jit-build of cuda kernels (#2374)  2023-01-06 20:50:26 +08:00
_analyzer            [example] add train resnet/vit with booster example (#3694)  2023-05-08 10:42:30 +08:00
amp                  [NFC] polish colossalai/amp/__init__.py code style (#3272)  2023-03-29 15:22:21 +08:00
auto_parallel        [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)  2023-05-17 11:13:23 +08:00
autochunk            [NFC] fix typo applications/ and colossalai/ (#3735)  2023-05-15 11:46:25 +08:00
booster              [plugin] torch ddp plugin supports sharded model checkpoint (#3775)  2023-05-18 20:05:59 +08:00
builder              [NFC] polish colossalai/builder/__init__.py code style (#1560)  2022-09-08 22:11:04 +08:00
checkpoint_io        [plugin] torch ddp plugin supports sharded model checkpoint (#3775)  2023-05-18 20:05:59 +08:00
cli                  [NFC] fix typo applications/ and colossalai/ (#3735)  2023-05-15 11:46:25 +08:00
cluster              [booster] implemented the torch ddp + resnet example (#3232)  2023-03-27 10:24:14 +08:00
communication        [CI] fix some spelling errors (#3707)  2023-05-10 17:12:03 +08:00
context              [CI] fix some spelling errors (#3707)  2023-05-10 17:12:03 +08:00
device               [hotfix] add copyright for solver and device mesh (#2803)  2023-02-18 21:14:38 +08:00
engine               [format] Run lint on colossalai.engine (#3367)  2023-04-05 23:24:43 +08:00
fx                   [doc] Fix typo under colossalai and doc (#3618)  2023-04-26 11:38:43 +08:00
interface            [booster] implemented the torch ddp + resnet example (#3232)  2023-03-27 10:24:14 +08:00
kernel               [doc] Fix typo under colossalai and doc (#3618)  2023-04-26 11:38:43 +08:00
logging              [logger] hotfix, missing _FORMAT (#2231)  2022-12-29 22:59:39 +08:00
nn                   [doc] Fix typo under colossalai and doc (#3618)  2023-04-26 11:38:43 +08:00
pipeline             [pipeline] Add Simplified Alpa DP Partition (#2507)  2023-03-07 10:34:31 +08:00
registry             Remove duplication registry (#1078)  2022-06-08 07:47:24 +08:00
tensor               [tensor] Refactor handle_trans_spec in DistSpecManager  2023-05-06 17:55:37 +08:00
testing              [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)  2023-05-17 11:13:23 +08:00
trainer              [polish] remove useless file _mem_tracer_hook.py (#1963)  2022-11-16 15:55:10 +08:00
utils                [doc] Fix typo under colossalai and doc (#3618)  2023-04-26 11:38:43 +08:00
zero                 [booster] gemini plugin support shard checkpoint (#3610)  2023-05-05 14:37:21 +08:00
__init__.py          [setup] supported conda-installed torch (#2048)  2022-11-30 16:45:15 +08:00
constants.py         updated tp layers  2022-11-02 12:19:38 +08:00
core.py              [Tensor] distributed view supports inter-process hybrid parallel (#1169)  2022-06-27 09:45:26 +08:00
global_variables.py  [NFC] polish colossalai/global_variables.py code style (#3259)  2023-03-29 15:22:21 +08:00
initialize.py        [zero] reorganize zero/gemini folder structure (#3424)  2023-04-04 13:48:16 +08:00