Commit Graph

15 Commits (81145208d1ee778904c5e0ee7dfb25cde52ce913)

Author SHA1 Message Date
Jiarui Fang b334822163
[zero] polish sharded param name (#484)
* [zero] polish sharded param name

* polish code

* polish

* polish code

* polish

* polsih

* polish
2022-03-22 14:36:16 +08:00
ver217 3cb3fc275e
zero init ctx receives a dp process group (#471) 2022-03-21 11:18:55 +08:00
ver217 642846d6f9
update sharded optim and fix zero init ctx (#457) 2022-03-18 15:44:47 +08:00
Jiarui Fang e2e9f82588
Revert "[zero] update sharded optim and fix zero init ctx" (#456)
* Revert "polish code"

This reverts commit 8cf7ff08cf.

* Revert "rename variables"

This reverts commit e99af94ab8.

* Revert "remove surplus imports"

This reverts commit 46add4a5c5.

* Revert "update sharded optim and fix zero init ctx"

This reverts commit 57567ee768.
2022-03-18 15:22:43 +08:00
ver217 57567ee768 update sharded optim and fix zero init ctx 2022-03-18 14:25:25 +08:00
ver217 9506a8beb2 use double buffer to handle grad 2022-03-16 14:24:09 +08:00
Jiarui Fang 56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER (#417) 2022-03-15 11:29:46 +08:00
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00
Jiarui Fang 272ebfb57d [bug] shard param during initializing the ShardedModelV2 (#381) 2022-03-11 15:50:28 +08:00
Jiarui Fang 6b6002962a [zero] zero init context collect numel of model (#375) 2022-03-11 15:50:28 +08:00
Jiarui Fang 44e4891f57 [zero] able to place params on cpu after zero init context (#365)
* place params on cpu after zero init context

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang ea2872073f [zero] global model data memory tracer (#360) 2022-03-11 15:50:28 +08:00
ver217 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00
Jiarui Fang 11bddb6e55 [zero] update zero context init with the updated test utils (#327) 2022-03-11 15:50:28 +08:00
Jiarui Fang de0468c7a8 [zero] zero init context (#321)
* add zero init context

* add more flags for zero init context
fix bug of repeated converting param to ShardedParamV2

* polish code
2022-03-11 15:50:28 +08:00