HELSON
|
e5ea3fdeef
|
[gemini] add GeminiMemoryManager (#832)
* refactor StatefulTensor, tensor utilities
* add unit test for GeminiMemoryManager
|
2022-04-24 13:08:48 +08:00 |
Jiarui Fang
|
e761ad2cd7
|
Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806)
|
2022-04-19 14:40:02 +08:00 |
HELSON
|
88759e289e
|
[zero] add ZeroTensorShardStrategy (#793)
|
2022-04-19 14:32:45 +08:00 |
HELSON
|
a65cbb7e4e
|
[zero] refactor shard and gather operation (#773)
|
2022-04-15 14:41:31 +08:00 |
Jiarui Fang
|
4d90a7b513
|
[refactor] zero directory (#724)
|
2022-04-11 23:13:02 +08:00 |
Jiarui Fang
|
193dc8dacb
|
[refactor] refactor the memory utils (#715)
|
2022-04-11 16:47:57 +08:00 |
ver217
|
715b86eadd
|
[hotfix] fix stm cuda model data size (#710)
|
2022-04-11 15:10:39 +08:00 |
ver217
|
ab8c6b4a0e
|
[zero] refactor memstats collector (#706)
* refactor memstats collector
* fix disposable
* polish code
|
2022-04-11 10:46:08 +08:00 |
ver217
|
3c9cd5bb5e
|
[zero] stateful tensor manager (#687)
* [WIP] stateful tensor manager
* add eviction strategy
* polish code
* polish code
* polish comment
* add unit test
* fix sampler bug
* polish code
* fix max sampling cnt resetting bug
* fix sampler bug
* polish code
* fix bug
* fix unit test
Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
|
2022-04-08 17:51:34 +08:00 |
Jiarui Fang
|
59bf2dc590
|
[zero] initialize a stateful tensor manager (#614)
|
2022-04-06 16:18:49 +08:00 |
Jiarui Fang
|
e956d93ac2
|
[refactor] memory utils (#577)
|
2022-04-01 09:22:33 +08:00 |
Jiarui Fang
|
53b1b6e340
|
[zero] non model data tracing (#545)
|
2022-03-29 15:45:48 +08:00 |
Jiarui Fang
|
705f56107c
|
[zero] refactor model data tracing (#537)
|
2022-03-28 16:38:18 +08:00 |
Jiarui Fang
|
8d8c5407c0
|
[zero] refactor model data tracing (#522)
|
2022-03-25 18:03:32 +08:00 |
Jiarui Fang
|
4d322b79da
|
[refactor] remove old zero code (#517)
|
2022-03-25 14:54:39 +08:00 |
Jiarui Fang
|
0bebda6ea5
|
[zero] fix init device bug in zero init context unittest (#516)
|
2022-03-25 12:24:18 +08:00 |
ver217
|
fc8e6db005
|
[doc] Update docstring for ZeRO (#459)
* polish sharded model docstr
* polish sharded optim docstr
* polish zero docstr
* polish shard strategy docstr
|
2022-03-18 16:48:20 +08:00 |
ver217
|
a241f61b34
|
[zero] Update initialize for ZeRO (#458)
* polish code
* shard strategy receive pg in shard() / gather()
* update zero engine
* polish code
|
2022-03-18 16:18:31 +08:00 |
Jiarui Fang
|
0fcfb1e00d
|
[test] make zero engine test really work (#447)
|
2022-03-17 17:24:25 +08:00 |
ver217
|
63469c0f91
|
polish code
|
2022-03-14 15:48:55 +08:00 |
ver217
|
88804aee49
|
add bucket tensor shard strategy
|
2022-03-14 14:48:32 +08:00 |
HELSON
|
7c079d9c33
|
[hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394)
|
2022-03-11 18:12:46 +08:00 |
Jiarui Fang
|
44e4891f57
|
[zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context
* polish code
|
2022-03-11 15:50:28 +08:00 |
ver217
|
1388671699
|
[zero] Update sharded model v2 using sharded param v2 (#323)
|
2022-03-11 15:50:28 +08:00 |
Jiarui Fang
|
11bddb6e55
|
[zero] update zero context init with the updated test utils (#327)
|
2022-03-11 15:50:28 +08:00 |
Jiarui Fang
|
c9e7d9582d
|
[zero] polish shard strategy (#310)
* init shard param from shape tuple
* add more unit tests for shard param
* add set_payload method for ShardedParam
* [zero] add sharded tensor class
* polish code
* add shard strategy
* move shard and gather logic to shard strategy from shard tensor.
* polish code
|
2022-03-11 15:50:28 +08:00 |
Jiarui Fang
|
74f77e314b
|
[zero] a shard strategy in granularity of tensor (#307)
|
2022-03-11 15:50:28 +08:00 |