HELSON
|
4e98e938ce
|
[zero] alleviate memory usage in ZeRODDP state_dict (#1398)
|
2022-08-02 15:49:13 +08:00 |
ver217
|
7d5d628e07
|
[DDP] test ddp state dict uses more strict threshold (#1382)
|
2022-07-28 17:29:04 +08:00 |
HELSON
|
87775a0682
|
[colotensor] use cpu memory to store state_dict (#1367)
|
2022-07-26 14:13:38 +08:00 |
ver217
|
0c51ff2c13
|
[hotfix] ZeroDDP use new process group (#1333)
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
|
2022-07-18 14:14:52 +08:00 |
Jiarui Fang
|
3b500984b1
|
[tensor] fix some unittests (#1234)
|
2022-07-08 14:18:30 +08:00 |
Jiarui Fang
|
060b917daf
|
[refactor] remove gpc dependency in colotensor's _ops (#1189)
|
2022-07-04 18:54:37 +08:00 |
Jiarui Fang
|
372f791444
|
[refactor] move chunk and chunkmgr to directory gemini (#1182)
|
2022-06-29 13:31:02 +08:00 |
ver217
|
6b2f2ab9bb
|
[ddp] ColoDDP uses bucket all-reduce (#1177)
* add reducer
* update colo ddp with reducer
* polish unit test
* polish unit test
|
2022-06-29 10:34:13 +08:00 |
ver217
|
8106d7b8c7
|
[ddp] refactor ColoDDP and ZeroDDP (#1146)
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
|
2022-06-21 16:35:23 +08:00 |
ver217
|
d26902645e
|
[ddp] add save/load state dict for ColoDDP (#1127)
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
|
2022-06-20 10:51:47 +08:00 |