7a05367101 | ver217      | 2 years ago | [hotfix] shared model returns cpu state_dict (#1328)
4165eabb1e | Jiarui Fang | 2 years ago | [hotfix] remove potiential circle import (#1307)
    * make it faster
    * [hotfix] remove circle import
a444633d13 | Jiarui Fang | 2 years ago | warmup ratio configration (#1192)
6690a61b4d | ver217      | 2 years ago | [hotfix] prevent nested ZeRO (#1140)
e3fde4ee6b | ver217      | 3 years ago | fix import error in sharded model v2 (#1053)
7cfd6c827e | ver217      | 3 years ago | [zero] add load_state_dict for sharded model (#894)
    * add load_state_dict for sharded model
    * fix bug
    * fix bug
    * fix ckpt dtype and device
    * support load state dict in zero init ctx
    * fix bugs
425b4a96b8 | HELSON      | 3 years ago | [gemini] polish stateful_tensor_mgr (#876)
d7e0303d1e | ver217      | 3 years ago | [zero] use GeminiMemoryManager when sampling model data (#850)
e5ea3fdeef | HELSON      | 3 years ago | [gemini] add GeminiMemoryManger (#832)
    * refactor StatefulTensor, tensor utilities
    * add unitest for GeminiMemoryManager
4d9332b4c5 | Jiarui Fang | 3 years ago | [refactor] moving memtracer to gemini (#801)
4c4388c46e | HELSON      | 3 years ago | [hotfix] fix memory leak in zero (#781)
10ef8afdd2 | Jiarui Fang | 3 years ago | [gemini] init genimi individual directory (#754)
a93a7d7364 | ver217      | 3 years ago | [hotfix] fix reuse_fp16_shard of sharded model (#756)
    * fix reuse_fp16_shard
    * disable test stm
    * polish code
8f7ce94b8e | ver217      | 3 years ago | [hotfix] fix auto tensor placement policy (#753)
3d7dc46d33 | Jiarui Fang | 3 years ago | [zero] use factory pattern for tensor_placement_policy (#752)
e396bb71f2 | ver217      | 3 years ago | [zero] add tensor placement policies (#743)
    * add tensor placement policies
    * polish comments
    * polish comments
    * update moe unit tests
22c4b88d56 | HELSON      | 3 years ago | [zero] refactor ShardedParamV2 for convenience (#742)
e6212f56cd | ver217      | 3 years ago | [hotfix] fix memory leak in backward of sharded model (#741)
7db3ccc79b | Jiarui Fang | 3 years ago | [hotfix] remove duplicated param register to stateful tensor manager (#728)
4d90a7b513 | Jiarui Fang | 3 years ago | [refactor] zero directory (#724)
193dc8dacb | Jiarui Fang | 3 years ago | [refactor] refactor the memory utils (#715)
dbd96fe90a | HELSON      | 3 years ago | [zero] check whether gradients have inf and nan in gpu (#712)
a9b8300d54 | HELSON      | 3 years ago | [zero] improve adaptability for not-shard parameters (#708)
    * adapt post grad hooks for not-shard parameters
    * adapt optimizer for not-shard parameters
    * offload gradients for not-replicated parameters
ab8c6b4a0e | ver217      | 3 years ago | [zero] refactor memstats collector (#706)
    * refactor memstats collector
    * fix disposable
    * polish code
ee112fe1da | HELSON      | 3 years ago | [zero] adapt zero hooks for unsharded module (#699)
3c9cd5bb5e | ver217      | 3 years ago | [zero] stateful tensor manager (#687)
    * [WIP] stateful tensor manager
    * add eviction strategy
    * polish code
    * polish code
    * polish comment
    * add unit test
    * fix sampler bug
    * polish code
    * fix max sampling cnt resetting bug
    * fix sampler bug
    * polish code
    * fix bug
    * fix unit test
    Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
0ef8819c67 | ver217      | 3 years ago | polish docstring of zero (#612)
e956d93ac2 | Jiarui Fang | 3 years ago | [refactor] memory utils (#577)
e6d50ec107 | HELSON      | 3 years ago | [zero] adapt zero for unsharded parameters (#561)
    * support existing sharded and unsharded parameters in zero
    * add unitest for moe-zero model init
    * polish moe gradient handler
7c6c427db1 | ver217      | 3 years ago | [zero] trace states of fp16/32 grad and fp32 param (#571)
7675366fce | Jiarui Fang | 3 years ago | [polish] rename col_attr -> colo_attr (#558)
014bac0c49 | ver217      | 3 years ago | [zero] hijack p.grad in sharded model (#554)
    * hijack p.grad in sharded model
    * polish comments
    * polish comments
f552b11294 | Jiarui Fang | 3 years ago | [zero] label state for param fp16 and grad (#551)
214da761d4 | Jiarui Fang | 3 years ago | [zero] add stateful tensor (#549)
107b99ddb1 | Jiarui Fang | 3 years ago | [zero] dump memory stats for sharded model (#548)
53b1b6e340 | Jiarui Fang | 3 years ago | [zero] non model data tracing (#545)
fb841dd5c5 | ver217      | 3 years ago | [zero] optimize grad offload (#539)
    * optimize grad offload
    * polish code
    * polish code
705f56107c | Jiarui Fang | 3 years ago | [zero] refactor model data tracing (#537)
05e33b2578 | Jiarui Fang | 3 years ago | [zero] fix grad offload (#528)
    * [zero] fix grad offload
    * polish code
4d322b79da | Jiarui Fang | 3 years ago | [refactor] remove old zero code (#517)
7ef3507ace | Jiarui Fang | 3 years ago | [zero] show model data cuda memory usage after zero context init. (#515)
0035b7be07 | Jiarui Fang | 3 years ago | [memory] add model data tensor moving api (#503)
9ec1ce6ab1 | ver217      | 3 years ago | [zero] sharded model support the reuse of fp16 shard (#495)
    * sharded model supports reuse fp16 shard
    * rename variable
    * polish code
    * polish code
    * polish code
c4c02424f3 | ver217      | 3 years ago | [zero] sharded model manages ophooks individually (#492)
62b0a8d644 | ver217      | 3 years ago | [zero] sharded optim support hybrid cpu adam (#486)
    * sharded optim support hybrid cpu adam
    * update unit test
    * polish docstring
b334822163 | Jiarui Fang | 3 years ago | [zero] polish sharded param name (#484)
    * [zero] polish sharded param name
    * polish code
    * polish
    * polish code
    * polish
    * polsih
    * polish
8d3250d74b | ver217      | 3 years ago | [zero] ZeRO supports pipeline parallel (#477)
fc8e6db005 | ver217      | 3 years ago | [doc] Update docstring for ZeRO (#459)
    * polish sharded model docstr
    * polish sharded optim docstr
    * polish zero docstr
    * polish shard strategy docstr
a241f61b34 | ver217      | 3 years ago | [zero] Update initialize for ZeRO (#458)
    * polish code
    * shard strategy receive pg in shard() / gather()
    * update zero engine
    * polish code
642846d6f9 | ver217      | 3 years ago | update sharded optim and fix zero init ctx (#457)