ver217
821c6172e2
[utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer ( #1442 )
2022-08-11 22:58:58 +08:00
ver217
6df3e19be9
[hotfix] zero optim prevents calling inner optim.zero_grad ( #1422 )
2022-08-09 16:08:12 +08:00
ver217
8dced41ad0
[zero] zero optim state_dict takes only_rank_0 ( #1384 )
* zero optim state_dict takes only_rank_0
* fix unit test
2022-07-29 13:22:50 +08:00
ver217
828b9e5e0d
[hotfix] fix zero optim save/load state dict ( #1381 )
2022-07-28 17:19:39 +08:00
ver217
6b43c789fd
fix zero optim backward_by_grad and save/load ( #1353 )
2022-07-21 16:43:58 +08:00
ver217
d068af81a3
[doc] update rst and docstring ( #1351 )
* update rst
* add zero docstr
* fix docstr
* remove fx.tracer.meta_patch
* fix docstr
* fix docstr
* update fx rst
* fix fx docstr
* remove useless rst
2022-07-21 15:54:53 +08:00
ver217
561e90493f
[zero] zero optim supports loading local state dict ( #1171 )
* zero optim supports loading local state dict
* polish code
* add unit test
2022-06-24 17:25:57 +08:00
ver217
8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP ( #1146 )
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
2022-06-21 16:35:23 +08:00
Frank Lee
14e5b11d7f
[zero] fixed api consistency ( #1098 )
2022-06-10 16:59:59 +08:00
Frank Lee
cb18922c47
[doc] added documentation to chunk and chunk manager ( #1094 )
* [doc] added documentation to chunk and chunk manager
* polish code
* polish code
* polish code
2022-06-10 15:33:06 +08:00
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
2022-06-10 14:48:28 +08:00
ver217
be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 ( #1077 )
* polish chunk manager
* polish unit test
* impl add_extern_static_tensor for chunk mgr
* add mem stats collector v2
* polish code
* polish unit test
* polish code
* polish get chunks
2022-06-09 20:56:34 +08:00
ver217
c5cd3b0f35
[zero] zero optim copy chunk rather than copy tensor ( #1070 )
2022-06-07 10:30:46 +08:00
Jiarui Fang
49832b2344
[refactory] add nn.parallel module ( #1068 )
2022-06-06 15:34:41 +08:00
ver217
51b9a49655
[zero] add zero optimizer for ColoTensor ( #1046 )
* add zero optimizer
* torch ok
* unit test ok
* polish code
* fix bugs
* polish unit test
* polish zero optim
* polish colo ddp v2
* refactor folder structure
* add comment
* polish unit test
* polish zero optim
* polish unit test
2022-06-02 12:13:15 +08:00