23 Commits (6f7d1362c901748ca9f005dc96388605aa195af9)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Jiarui Fang | 193dc8dacb | [refactor] refactor the memory utils (#715) | 3 years ago |
| HELSON | d7ecaf362b | [zero] fix init bugs in zero context (#686) | 3 years ago |
| Jiarui Fang | e956d93ac2 | [refactor] memory utils (#577) | 3 years ago |
| Jiarui Fang | 7675366fce | [polish] rename col_attr -> colo_attr (#558) | 3 years ago |
| Jiarui Fang | 53b1b6e340 | [zero] non model data tracing (#545) | 3 years ago |
| ver217 | 1f90a3b129 | [zero] polish ZeroInitContext (#540) | 3 years ago |
| Jiarui Fang | 705f56107c | [zero] refactor model data tracing (#537) | 3 years ago |
| Jiarui Fang | 8d8c5407c0 | [zero] refactor model data tracing (#522) | 3 years ago |
| Frank Lee | 3601b2bad0 | [test] fixed rerun_on_exception and adapted test cases (#487) | 3 years ago |
| Jiarui Fang | 0bebda6ea5 | [zero] fix init device bug in zero init context unittest (#516) | 3 years ago |
| Jiarui Fang | b334822163 | [zero] polish sharded param name (#484) | 3 years ago |
| ver217 | a241f61b34 | [zero] Update initialize for ZeRO (#458) | 3 years ago |
| Frank Lee | f27d801a13 | [test] optimized zero data parallel test (#452) | 3 years ago |
| Jiarui Fang | 56bb412e72 | [polish] use GLOBAL_MODEL_DATA_TRACER (#417) | 3 years ago |
| Jiarui Fang | 21dc54e019 | [zero] memtracer to record cuda memory usage of model data and overall system (#395) | 3 years ago |
| ver217 | 54fd37f0e0 | polish unit test | 3 years ago |
| Jiarui Fang | 6b6002962a | [zero] zero init context collect numel of model (#375) | 3 years ago |
| Jiarui Fang | 44e4891f57 | [zero] able to place params on cpu after zero init context (#365) | 3 years ago |
| Jiarui Fang | ea2872073f | [zero] global model data memory tracer (#360) | 3 years ago |
| ver217 | 1388671699 | [zero] Update sharded model v2 using sharded param v2 (#323) | 3 years ago |
| jiaruifang | dec24561cf | show pytest parameterize | 3 years ago |
| Jiarui Fang | 11bddb6e55 | [zero] update zero context init with the updated test utils (#327) | 3 years ago |
| Jiarui Fang | de0468c7a8 | [zero] zero init context (#321) | 3 years ago |