Commit Graph

16 Commits (05e33b257886d78322b2d94badb632a172f9cd9b)

Author SHA1 Message Date
Jiarui Fang 8d8c5407c0
[zero] refactor model data tracing (#522) 2022-03-25 18:03:32 +08:00
Frank Lee 3601b2bad0
[test] fixed rerun_on_exception and adapted test cases (#487) 2022-03-25 17:25:12 +08:00
Jiarui Fang 0bebda6ea5
[zero] fix init device bug in zero init context unittest (#516) 2022-03-25 12:24:18 +08:00
Jiarui Fang b334822163
[zero] polish sharded param name (#484)
* [zero] polish sharded param name

* polish code

* polish

* polish code

* polish

* polsih

* polish
2022-03-22 14:36:16 +08:00
ver217 a241f61b34
[zero] Update initialize for ZeRO (#458)
* polish code

* shard strategy receive pg in shard() / gather()

* update zero engine

* polish code
2022-03-18 16:18:31 +08:00
Frank Lee f27d801a13
[test] optimized zero data parallel test (#452) 2022-03-18 11:35:54 +08:00
Jiarui Fang 56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER (#417) 2022-03-15 11:29:46 +08:00
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00
ver217 54fd37f0e0 polish unit test 2022-03-14 15:06:02 +08:00
Jiarui Fang 6b6002962a [zero] zero init context collect numel of model (#375) 2022-03-11 15:50:28 +08:00
Jiarui Fang 44e4891f57 [zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang ea2872073f [zero] global model data memory tracer (#360) 2022-03-11 15:50:28 +08:00
ver217 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00
jiaruifang dec24561cf show pytest parameterize 2022-03-11 15:50:28 +08:00
Jiarui Fang 11bddb6e55 [zero] update zero context init with the updated test utils (#327) 2022-03-11 15:50:28 +08:00
Jiarui Fang de0468c7a8 [zero] zero init context (#321)
* add zero init context

* add more flags for zero init context
fix bug of repeated converting param to ShardedParamV2

* polish code
2022-03-11 15:50:28 +08:00