HELSON
|
b528eea0f0
|
[zero] add zero wrappers (#2523)
* [zero] add zero wrappers
* change names
* add wrapper functions to init
|
2 years ago |
HELSON
|
077a5cdde4
|
[zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism
* [testing] change model name to avoid pytest warning
* [hotfix] fix unit testing
|
2 years ago |
HELSON
|
d565a24849
|
[zero] add unit testings for hybrid parallelism (#2486)
|
2 years ago |
HELSON
|
a5dc4253c6
|
[zero] polish low level optimizer (#2473)
|
2 years ago |
Jiarui Fang
|
867c8c2d3a
|
[zero] low level optim supports ProcessGroup (#2464)
|
2 years ago |
HELSON
|
62c38e3330
|
[zero] polish low level zero optimizer (#2275)
|
2 years ago |
HELSON
|
a7d95b7024
|
[example] add zero1, zero2 example in GPT examples (#2146)
* [example] add zero1 and zero2 for GPT
* update readme in gpt example
* polish code
* change init value
* update readme
|
2 years ago |
HELSON
|
a1ce02d740
|
[zero] test gradient accumulation (#1964)
* [zero] fix memory leak for zero2
* [zero] test gradient accumulation
* [zero] remove grad clip test
|
2 years ago |
HELSON
|
7066dfbf82
|
[zero] fix memory leak for zero2 (#1955)
|
2 years ago |
HELSON
|
6e51d296f0
|
[zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer
* rename test directory
* rename test files
* change tolerance in test
|
2 years ago |
ver217
|
c9e8ce67b8
|
fix move fp32 shards (#1604)
|
2 years ago |
ver217
|
ce470ba37e
|
[checkpoint] sharded optim save/load grad scaler (#1350)
|
2 years ago |
ver217
|
a45ddf2d5f
|
[hotfix] fix sharded optim step and clip_grad_norm (#1226)
|
2 years ago |
ver217
|
9e1daa63d2
|
[zero] sharded optim supports loading local state dict (#1170)
* sharded optim supports loading local state dict
* polish code
* add unit test
|
2 years ago |
ver217
|
6690a61b4d
|
[hotfix] prevent nested ZeRO (#1140)
|
2 years ago |
ver217
|
c4d903e64a
|
[gemini] accelerate adjust_layout() (#878)
* add lru cache
* polish code
* update unit test
* fix sharded optim
|
3 years ago |
ver217
|
d7e0303d1e
|
[zero] use GeminiMemoryManager when sampling model data (#850)
|
3 years ago |
HELSON
|
e5ea3fdeef
|
[gemini] add GeminiMemoryManager (#832)
* refactor StatefulTensor, tensor utilities
* add unit test for GeminiMemoryManager
|
3 years ago |
Jiarui Fang
|
61c20b44bc
|
[log] local throughput metrics (#811)
* Revert "[zero] add ZeroTensorShardStrategy (#793)"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
|
3 years ago |
Jiarui Fang
|
4d9332b4c5
|
[refactor] moving memtracer to gemini (#801)
|
3 years ago |
Jiarui Fang
|
8711c706f4
|
[hotfix] fix grad offload when enabling reuse_fp16_shard
|
3 years ago |
ver217
|
f1fa1a675f
|
fix grad offload when enabling reuse_fp16_shard
|
3 years ago |
HELSON
|
4c4388c46e
|
[hotfix] fix memory leak in zero (#781)
|
3 years ago |
ver217
|
6e553748a7
|
polish sharded optim docstr and warning (#770)
|
3 years ago |
Jiarui Fang
|
10ef8afdd2
|
[gemini] init gemini individual directory (#754)
|
3 years ago |
ver217
|
4b048a8728
|
fix prepare grads in sharded optim (#749)
|
3 years ago |
ver217
|
e396bb71f2
|
[zero] add tensor placement policies (#743)
* add tensor placement policies
* polish comments
* polish comments
* update moe unit tests
|
3 years ago |
HELSON
|
22c4b88d56
|
[zero] refactor ShardedParamV2 for convenience (#742)
|
3 years ago |
Jiarui Fang
|
4d90a7b513
|
[refactor] zero directory (#724)
|
3 years ago |
HELSON
|
dbd96fe90a
|
[zero] check whether gradients have inf and nan in gpu (#712)
|
3 years ago |
HELSON
|
a9b8300d54
|
[zero] improve adaptability for not-shard parameters (#708)
* adapt post grad hooks for not-shard parameters
* adapt optimizer for not-shard parameters
* offload gradients for not-replicated parameters
|
3 years ago |
HELSON
|
ee112fe1da
|
[zero] adapt zero hooks for unsharded module (#699)
|
3 years ago |
ver217
|
3c9cd5bb5e
|
[zero] stateful tensor manager (#687)
* [WIP] stateful tensor manager
* add eviction strategy
* polish code
* polish code
* polish comment
* add unit test
* fix sampler bug
* polish code
* fix max sampling cnt resetting bug
* fix sampler bug
* polish code
* fix bug
* fix unit test
Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
|
3 years ago |
HELSON
|
17e73e62cc
|
[hotfix] fix bugs for unsharded parameters when restore data (#664)
|
3 years ago |
Jiarui Fang
|
0aab52301e
|
[hotfix] fix a bug in model data stats tracing (#655)
|
3 years ago |
HELSON
|
055fbf5be6
|
[zero] adapt zero for unsharded parameters (Optimizer part) (#601)
|
3 years ago |
ver217
|
0ef8819c67
|
polish docstring of zero (#612)
|
3 years ago |
ver217
|
9bee119104
|
[hotfix] fix sharded optim zero grad (#604)
* fix sharded optim zero grad
* polish comments
|
3 years ago |
Jiarui Fang
|
e956d93ac2
|
[refactor] memory utils (#577)
|
3 years ago |
ver217
|
7c6c427db1
|
[zero] trace states of fp16/32 grad and fp32 param (#571)
|
3 years ago |
Jiarui Fang
|
7675366fce
|
[polish] rename col_attr -> colo_attr (#558)
|
3 years ago |
ver217
|
014bac0c49
|
[zero] hijack p.grad in sharded model (#554)
* hijack p.grad in sharded model
* polish comments
* polish comments
|
3 years ago |
Jiarui Fang
|
f552b11294
|
[zero] label state for param fp16 and grad (#551)
|
3 years ago |
Jiarui Fang
|
107b99ddb1
|
[zero] dump memory stats for sharded model (#548)
|
3 years ago |
Jiarui Fang
|
53b1b6e340
|
[zero] non model data tracing (#545)
|
3 years ago |
ver217
|
fb841dd5c5
|
[zero] optimize grad offload (#539)
* optimize grad offload
* polish code
* polish code
|
3 years ago |
Jiarui Fang
|
c11ff81b15
|
[zero] get memory usage of sharded optim v2. (#542)
|
3 years ago |
Jiarui Fang
|
705f56107c
|
[zero] refactor model data tracing (#537)
|
3 years ago |
Jiarui Fang
|
05e33b2578
|
[zero] fix grad offload (#528)
* [zero] fix grad offload
* polish code
|
3 years ago |
Jiarui Fang
|
4d322b79da
|
[refactor] remove old zero code (#517)
|
3 years ago |