Commit Graph

12 Commits (92f4224867581e741d5a673b6b807b644127fbf7)

Author SHA1 Message Date
Jiarui Fang 214da761d4
[zero] add stateful tensor (#549) 2022-03-30 13:51:37 +08:00
HELSON a30e2b4c24
[zero] adapt for no-leaf module in zero (#535)
only process module's own parameters in Zero context

add zero hooks for all modules that contrain parameters

gather parameters only belonging to module itself
2022-03-28 17:42:18 +08:00
Jiarui Fang 705f56107c
[zero] refactor model data tracing (#537) 2022-03-28 16:38:18 +08:00
Jiarui Fang 920c5889a7
[zero] add colo move inline (#521) 2022-03-25 14:02:55 +08:00
Jiarui Fang b334822163
[zero] polish sharded param name (#484)
* [zero] polish sharded param name

* polish code

* polish

* polish code

* polish

* polsih

* polish
2022-03-22 14:36:16 +08:00
ver217 a241f61b34
[zero] Update initialize for ZeRO (#458)
* polish code

* shard strategy receive pg in shard() / gather()

* update zero engine

* polish code
2022-03-18 16:18:31 +08:00
ver217 9506a8beb2 use double buffer to handle grad 2022-03-16 14:24:09 +08:00
Jiarui Fang 56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER (#417) 2022-03-15 11:29:46 +08:00
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00
ver217 88804aee49 add bucket tensor shard strategy 2022-03-14 14:48:32 +08:00
Jiarui Fang 44e4891f57 [zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context

* polish code
2022-03-11 15:50:28 +08:00
ver217 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00