Commit Graph

18 Commits (7d49e7b2dbdb4b966496475654a4154b92aeaa7b)

Author SHA1 Message Date
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
* refactor StatefulTensor, tensor utilities

* add unitest for GeminiMemoryManager
2022-04-24 13:08:48 +08:00
HELSON a65cbb7e4e
[zero] refactor shard and gather operation (#773) 2022-04-15 14:41:31 +08:00
Jiarui Fang 4d90a7b513
[refactor] zero directory (#724) 2022-04-11 23:13:02 +08:00
Jiarui Fang 193dc8dacb
[refactor] refactor the memory utils (#715) 2022-04-11 16:47:57 +08:00
Jiarui Fang e956d93ac2
[refactor] memory utils (#577) 2022-04-01 09:22:33 +08:00
Jiarui Fang 53b1b6e340
[zero] non model data tracing (#545) 2022-03-29 15:45:48 +08:00
Jiarui Fang 705f56107c
[zero] refactor model data tracing (#537) 2022-03-28 16:38:18 +08:00
Jiarui Fang 8d8c5407c0
[zero] refactor model data tracing (#522) 2022-03-25 18:03:32 +08:00
Jiarui Fang 4d322b79da
[refactor] remove old zero code (#517) 2022-03-25 14:54:39 +08:00
Jiarui Fang 0bebda6ea5
[zero] fix init device bug in zero init context unittest (#516) 2022-03-25 12:24:18 +08:00
ver217 fc8e6db005
[doc] Update docstring for ZeRO (#459)
* polish sharded model docstr

* polish sharded optim docstr

* polish zero docstr

* polish shard strategy docstr
2022-03-18 16:48:20 +08:00
ver217 a241f61b34
[zero] Update initialize for ZeRO (#458)
* polish code

* shard strategy receive pg in shard() / gather()

* update zero engine

* polish code
2022-03-18 16:18:31 +08:00
Jiarui Fang 0fcfb1e00d
[test] make zero engine test really work (#447) 2022-03-17 17:24:25 +08:00
HELSON 7c079d9c33
[hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394) 2022-03-11 18:12:46 +08:00
Jiarui Fang 44e4891f57 [zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context

* polish code
2022-03-11 15:50:28 +08:00
ver217 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00
Jiarui Fang c9e7d9582d [zero] polish shard strategy (#310)
* init shard param from shape tuple

* add more unitest for shard param

* add set_payload method for ShardedParam

* [zero] add shareded tensor class

* polish code

* add shard stratgy

* move shard and gather logic to shard strategy from shard tensor.

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang 74f77e314b [zero] a shard strategy in granularity of tensor (#307) 2022-03-11 15:50:28 +08:00