Commit Graph

16 Commits (08815f0e72b7ff4ef67d73f813c94bb09c92cec4)

Author SHA1 Message Date
HELSON 527758b2ae
[hotfix] fix a running error in test_colo_checkpoint.py (#1387) 2022-07-29 15:58:06 +08:00
HELSON b6fd165f66
[checkpoint] add kwargs for load_state_dict (#1374) 2022-07-28 15:56:52 +08:00
Frank Lee 0c1a16ea5b
[util] standard checkpoint function naming (#1377) 2022-07-28 09:29:30 +08:00
Super Daniel be229217ce
[fx] add torchaudio test (#1369)
* [fx]add torchaudio test

* [fx]add torchaudio test

* [fx] add torchaudio test

* [fx] add torchaudio test

* [fx] add torchaudio test

* [fx] add torchaudio test

* [fx] add torchaudio test

* [fx] add torchaudio test and test patches

* Delete ~

* [fx] add patches and patches test

* [fx] add patches and patches test

* [fx] fix patches

* [fx] fix rnn patches

* [fx] fix rnn patches

* [fx] fix rnn patches

* [fx] fix rnn patches

* [fx] merge upstream

* [fx] fix import errors
2022-07-27 11:03:14 +08:00
HELSON 8463290642
[checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) 2022-07-26 14:41:53 +08:00
HELSON 87775a0682
[colotensor] use cpu memory to store state_dict (#1367) 2022-07-26 14:13:38 +08:00
HELSON 943a96323e
[hotfix] fix no optimizer in save/load (#1363) 2022-07-26 10:53:53 +08:00
HELSON 7a8702c06d
[colotensor] add Tensor.view op and its unit test (#1343)
[colotensor] add megatron initialization for gpt2
2022-07-21 10:53:15 +08:00
HELSON f92c100ddd
[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) 2022-07-19 14:15:28 +08:00
Jiarui Fang 9e4c6449b0
[checkpoint] add ColoOptimizer checkpointing (#1316) 2022-07-15 09:52:55 +08:00
Jiarui Fang 3ef3791a3b
[checkpoint] add test for bert and hotfix save bugs (#1297) 2022-07-14 15:38:18 +08:00
Jiarui Fang c92f84fcdb
[tensor] distributed checkpointing for parameters (#1240) 2022-07-12 15:51:06 +08:00
Jiarui Fang 20da6e48c8
[checkpoint] save sharded optimizer states (#1237) 2022-07-08 16:33:13 +08:00
Yi Zhao 04537bf83e
[checkpoint]support generalized scheduler (#1222) 2022-07-07 18:16:38 +08:00
Jiarui Fang 52736205d9
[checkpoint] make unitest faster (#1217) 2022-07-06 17:39:46 +08:00
Jiarui Fang f38006ea83
[checkpoint] checkpoint for ColoTensor Model (#1196) 2022-07-06 17:22:03 +08:00