Frank Lee
|
80eba05b0a
|
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
|
2023-04-06 14:51:35 +08:00 |
ver217
|
26b7aac0be
|
[zero] reorganize zero/gemini folder structure (#3424)
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
|
2023-04-04 13:48:16 +08:00 |
HELSON
|
707b11d4a0
|
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
|
2023-01-28 14:35:25 +08:00 |
HELSON
|
f69f9bf223
|
[zero] add chunk init function for users (#1729)
* add chunk manager init function
* fix unit tests
* add comment
* add flush=True
|
2022-10-18 16:31:22 +08:00 |
HELSON
|
b28991dd0a
|
[feature] A new ZeRO implementation (#1644)
|
2022-10-09 09:18:51 +08:00 |
Jiarui Fang
|
c5d39215f6
|
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
|
2022-09-26 10:06:03 +08:00 |
HELSON
|
5be118f405
|
[feature] new zero implementation (#1623)
|
2022-09-24 19:58:18 +08:00 |
HELSON
|
4e98e938ce
|
[zero] alleviate memory usage in ZeRODDP state_dict (#1398)
|
2022-08-02 15:49:13 +08:00 |
ver217
|
7d5d628e07
|
[DDP] test ddp state dict uses more strict threshold (#1382)
|
2022-07-28 17:29:04 +08:00 |
HELSON
|
87775a0682
|
[colotensor] use cpu memory to store state_dict (#1367)
|
2022-07-26 14:13:38 +08:00 |
ver217
|
0c51ff2c13
|
[hotfix] ZeroDDP use new process group (#1333)
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
|
2022-07-18 14:14:52 +08:00 |
Jiarui Fang
|
3b500984b1
|
[tensor] fix some unittests (#1234)
|
2022-07-08 14:18:30 +08:00 |
Jiarui Fang
|
060b917daf
|
[refactor] remove gpc dependency in colotensor's _ops (#1189)
|
2022-07-04 18:54:37 +08:00 |
Jiarui Fang
|
372f791444
|
[refactor] move chunk and chunkmgr to directory gemini (#1182)
|
2022-06-29 13:31:02 +08:00 |
ver217
|
6b2f2ab9bb
|
[ddp] ColoDDP uses bucket all-reduce (#1177)
* add reducer
* update colo ddp with reducer
* polish unit test
* polish unit test
|
2022-06-29 10:34:13 +08:00 |
ver217
|
8106d7b8c7
|
[ddp] refactor ColoDDP and ZeroDDP (#1146)
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
|
2022-06-21 16:35:23 +08:00 |
ver217
|
d26902645e
|
[ddp] add save/load state dict for ColoDDP (#1127)
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
|
2022-06-20 10:51:47 +08:00 |