ver217
04c9a86af8
[zero] ZeroDDP supports controlling outputs' dtype ( #1399 )
2 years ago
HELSON
4e98e938ce
[zero] alleviate memory usage in ZeRODDP state_dict ( #1398 )
2 years ago
ver217
83328329dd
[hotfix] fix zero ddp buffer cast ( #1376 )
* fix zero ddp buffer cast
* fix zero ddp ignore params
2 years ago
ver217
5d5031e946
fix zero ddp state dict ( #1378 )
2 years ago
HELSON
87775a0682
[colotensor] use cpu memory to store state_dict ( #1367 )
2 years ago
ver217
d068af81a3
[doc] update rst and docstring ( #1351 )
* update rst
* add zero docstr
* fix docstr
* remove fx.tracer.meta_patch
* fix docstr
* fix docstr
* update fx rst
* fix fx docstr
* remove useless rst
2 years ago
ver217
0c51ff2c13
[hotfix] ZeroDDP use new process group ( #1333 )
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
2 years ago
Jiarui Fang
b5f25eb32a
[Tensor] add cpu group to ddp ( #1200 )
2 years ago
Jiarui Fang
060b917daf
[refactor] remove gpc dependency in colotensor's _ops ( #1189 )
2 years ago
Jiarui Fang
372f791444
[refactor] move chunk and chunkmgr to directory gemini ( #1182 )
2 years ago
ver217
6b2f2ab9bb
[ddp] ColoDDP uses bucket all-reduce ( #1177 )
* add reducer
* update colo ddp with reducer
* polish unit test
* polish unit test
2 years ago
ver217
54aabb8da4
[gemini] refactor gemini mgr ( #1151 )
* refactor gemini mgr
* update __init__
2 years ago
ver217
8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP ( #1146 )
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
2 years ago
Frank Lee
15aab1476e
[zero] avoid zero hook spam by changing log to debug level ( #1137 )
2 years ago
ver217
d26902645e
[ddp] add save/load state dict for ColoDDP ( #1127 )
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
2 years ago
ver217
f0a954f16d
[ddp] add set_params_to_ignore for ColoDDP ( #1122 )
* add set_params_to_ignore for ColoDDP
* polish code
* fix zero hook v2
* add unit test
* polish docstr
2 years ago
ver217
e127b4375b
cast colo ddp v2 inputs/outputs ( #1120 )
2 years ago
ver217
7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy ( #1118 )
* update gemini mgr
* update chunk
* add docstr
* polish placement policy
* update test chunk
* update test zero
* polish unit test
* remove useless unit test
2 years ago
ver217
895c1c5ee7
[tensor] refactor param op hook ( #1097 )
* refactor param op hook
* add docstr
* fix bug
3 years ago
Frank Lee
cb18922c47
[doc] added documentation to chunk and chunk manager ( #1094 )
* [doc] added documentation to chunk and chunk manager
* polish code
* polish code
* polish code
3 years ago
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
3 years ago
ver217
be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 ( #1077 )
* polish chunk manager
* polish unit test
* impl add_extern_static_tensor for chunk mgr
* add mem stats collector v2
* polish code
* polish unit test
* polish code
* polish get chunks
3 years ago
Ziyue Jiang
4fc748f69b
[Tensor] fix optimizer for CPU parallel ( #1069 )
3 years ago
Jiarui Fang
49832b2344
[refactor] add nn.parallel module ( #1068 )
3 years ago