Commit Graph

14 Commits (f28c0213769dbb2037cd08123ecbf1c8a3f5114b)

Author SHA1 Message Date
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
* refactor StatefulTensor, tensor utilities

* add unitest for GeminiMemoryManager
2022-04-24 13:08:48 +08:00
Jiarui Fang 595bedf767
revert zero tensors back (#829) 2022-04-22 12:12:35 +08:00
Jiarui Fang 294a6060d0
[tensor] ZeRO use ColoTensor as the base class. (#828)
* [refactor] moving InsertPostInitMethodToModuleSubClasses to utils.

* [tensor] ZeRO use ColoTensor as the base class.

* polish
2022-04-22 12:00:48 +08:00
HELSON 22c4b88d56
[zero] refactor ShardedParamV2 for convenience (#742) 2022-04-13 14:54:26 +08:00
Jiarui Fang f552b11294
[zero] label state for param fp16 and grad (#551) 2022-03-30 15:57:46 +08:00
Jiarui Fang 214da761d4
[zero] add stateful tensor (#549) 2022-03-30 13:51:37 +08:00
Jiarui Fang 8d8c5407c0
[zero] refactor model data tracing (#522) 2022-03-25 18:03:32 +08:00
Jiarui Fang b5f43acee3 [zero] find miss code (#378) 2022-03-11 15:50:28 +08:00
jiaruifang d9217e1960 Revert "[zero] bucketized tensor cpu gpu copy (#368)"
This reverts commit bef05489b6.
2022-03-11 15:50:28 +08:00
Jiarui Fang 00670c870e [zero] bucketized tensor cpu gpu copy (#368) 2022-03-11 15:50:28 +08:00
Jiarui Fang 44e4891f57 [zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang ea2872073f [zero] global model data memory tracer (#360) 2022-03-11 15:50:28 +08:00
Jiarui Fang c9e7d9582d [zero] polish shard strategy (#310)
* init shard param from shape tuple

* add more unitest for shard param

* add set_payload method for ShardedParam

* [zero] add shareded tensor class

* polish code

* add shard stratgy

* move shard and gather logic to shard strategy from shard tensor.

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang 80364c7686 [zero] sharded tensor (#305)
* init shard param from shape tuple

* add more unitest for shard param

* add set_payload method for ShardedParam

* [zero] add shareded tensor class

* polish code
2022-03-11 15:50:28 +08:00