Frank Lee
|
80eba05b0a
|
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
|
2 years ago |
ver217
|
26b7aac0be
|
[zero] reorganize zero/gemini folder structure (#3424)
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
|
2 years ago |
HELSON
|
707b11d4a0
|
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
|
2 years ago |
HELSON
|
f69f9bf223
|
[zero] add chunk init function for users (#1729)
* add chunk manager init function
* fix unit tests
* add comment
* add flush=True
|
2 years ago |
HELSON
|
b28991dd0a
|
[feature] A new ZeRO implementation (#1644)
|
2 years ago |
Jiarui Fang
|
c5d39215f6
|
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
|
2 years ago |
HELSON
|
5be118f405
|
[feature] new zero implementation (#1623)
|
2 years ago |
HELSON
|
4e98e938ce
|
[zero] alleviate memory usage in ZeRODDP state_dict (#1398)
|
2 years ago |
ver217
|
7d5d628e07
|
[DDP] test ddp state dict uses more strict threshold (#1382)
|
2 years ago |
HELSON
|
87775a0682
|
[colotensor] use cpu memory to store state_dict (#1367)
|
2 years ago |
ver217
|
0c51ff2c13
|
[hotfix] ZeroDDP use new process group (#1333)
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
|
2 years ago |
Jiarui Fang
|
3b500984b1
|
[tensor] fix some unittests (#1234)
|
2 years ago |
Jiarui Fang
|
060b917daf
|
[refactor] remove gpc dependency in colotensor's _ops (#1189)
|
2 years ago |
Jiarui Fang
|
372f791444
|
[refactor] move chunk and chunkmgr to directory gemini (#1182)
|
2 years ago |
ver217
|
6b2f2ab9bb
|
[ddp] ColoDDP uses bucket all-reduce (#1177)
* add reducer
* update colo ddp with reducer
* polish unit test
* polish unit test
|
2 years ago |
ver217
|
8106d7b8c7
|
[ddp] refactor ColoDDP and ZeroDDP (#1146)
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
|
2 years ago |
ver217
|
d26902645e
|
[ddp] add save/load state dict for ColoDDP (#1127)
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
|
2 years ago |