Commit Graph

11 Commits (7080a8edb08400d97ba4c31458f532e4ceeacf4b)

Author SHA1 Message Date
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269) 2023-01-03 15:55:35 +08:00
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962) 2022-11-16 15:45:57 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes memory leak when paramter is in fp16 in ZeroDDP init.
* bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
ver217 dba7e0cfb4
make AutoPlacementPolicy configurable (#1191) 2022-06-30 15:18:30 +08:00
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182) 2022-06-29 13:31:02 +08:00
ver217 54aabb8da4
[gemini] refactor gemini mgr (#1151)
* refactor gemini mgr

* udpate __init__
2022-06-22 11:54:36 +08:00
ver217 7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy (#1118)
* update gemini mgr

* update chunk

* add docstr

* polish placement policy

* update test chunk

* update test zero

* polish unit test

* remove useless unit test
2022-06-15 15:05:19 +08:00
ver217 1f894e033f
[gemini] zero supports gemini (#1093)
* add placement policy

* add gemini mgr

* update mem stats collector

* update zero

* update zero optim

* fix bugs

* zero optim monitor os

* polish unit test

* polish unit test

* add assert
2022-06-10 14:48:28 +08:00