Commit Graph

12 Commits (e619a651fb27dc9db9ca0b97a10e00b3516243b0)

Author SHA1 Message Date
HELSON e6d50ec107
[zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero

* add unitest for moe-zero model init

* polish moe gradient handler
2022-03-31 18:34:11 +08:00
Jiarui Fang 7675366fce
[polish] rename col_attr -> colo_attr (#558) 2022-03-31 12:25:45 +08:00
Jiarui Fang 53b1b6e340
[zero] non model data tracing (#545) 2022-03-29 15:45:48 +08:00
Jiarui Fang c11ff81b15
[zero] get memory usage of sharded optim v2. (#542) 2022-03-29 09:08:18 +08:00
Jiarui Fang 705f56107c
[zero] refactor model data tracing (#537) 2022-03-28 16:38:18 +08:00
Jiarui Fang 8d8c5407c0
[zero] refactor model data tracing (#522) 2022-03-25 18:03:32 +08:00
Jiarui Fang 0bebda6ea5
[zero] fix init device bug in zero init context unittest (#516) 2022-03-25 12:24:18 +08:00
Jiarui Fang 7ef3507ace
[zero] show model data cuda memory usage after zero context init. (#515) 2022-03-25 11:23:35 +08:00
Jiarui Fang 0035b7be07
[memory] add model data tensor moving api (#503) 2022-03-24 14:29:41 +08:00
Jiarui Fang a445e118cf
[polish] polish singleton and global context (#500) 2022-03-23 18:03:39 +08:00
Jiarui Fang 56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER (#417) 2022-03-15 11:29:46 +08:00
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00