Commit Graph

15 Commits (49832b23441ca66bbb6064f6e9df45777e72564f)

Author SHA1 Message Date
ver217 51b9a49655
[zero] add zero optimizer for ColoTensor (#1046)
* add zero optimizer

* torch ok

* unit test ok

* polish code

* fix bugs

* polish unit test

* polish zero optim

* polish colo ddp v2

* refactor folder structure

* add comment

* polish unit test

* polish zero optim

* polish unit test
2022-06-02 12:13:15 +08:00
ver217 9492a561c3
[tensor] ColoTensor supports ZeRo (#1015)
* impl chunk manager

* impl param op hook

* add reduce_chunk

* add zero hook v2

* add zero dp

* fix TensorInfo

* impl load balancing when using zero without chunk

* fix zero hook

* polish chunk

* fix bugs

* ddp ok

* zero ok

* polish code

* fix bugs about load balancing

* polish code

* polish code

* add ene-to-end test

* polish code

* polish code

* polish code

* fix typo

* add test_chunk

* fix bugs

* fix bugs

* polish code
2022-05-31 12:00:12 +08:00
ver217 c4d903e64a
[gemini] accelerate adjust_layout() (#878)
* add lru cache

* polish code

* update unit test

* fix sharded optim
2022-04-26 18:08:31 +08:00
HELSON 425b4a96b8
[gemini] polish stateful_tensor_mgr (#876) 2022-04-26 15:05:03 +08:00
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
* refactor StatefulTensor, tensor utilities

* add unitest for GeminiMemoryManager
2022-04-24 13:08:48 +08:00
Jiarui Fang 3ddbd1bce1
[gemini] collect cpu-gpu moving volume in each iteration (#813) 2022-04-20 11:29:48 +08:00
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801) 2022-04-19 10:13:08 +08:00
Jiarui Fang 10ef8afdd2
[gemini] init genimi individual directory (#754) 2022-04-14 16:40:26 +08:00
ver217 dcca614eee
[hotfix] fix test_stateful_tensor_mgr (#762) 2022-04-14 15:50:09 +08:00
ver217 8f7ce94b8e
[hotfix] fix auto tensor placement policy (#753) 2022-04-14 12:04:45 +08:00
HELSON 84c6700b2a
[zero] refactor memstats_collector (#746) 2022-04-14 12:01:12 +08:00
Jiarui Fang 3d7dc46d33
[zero] use factory pattern for tensor_placement_policy (#752) 2022-04-14 11:07:29 +08:00
ver217 e396bb71f2
[zero] add tensor placement policies (#743)
* add tensor placement policies

* polish comments

* polish comments

* update moe unit tests
2022-04-13 15:00:48 +08:00
HELSON 22c4b88d56
[zero] refactor ShardedParamV2 for convenience (#742) 2022-04-13 14:54:26 +08:00
Jiarui Fang 4d90a7b513
[refactor] zero directory (#724) 2022-04-11 23:13:02 +08:00