Commit Graph

18 Commits (9caa8b6481b3ed8d5555502c7671dcd2493c0da6)

Author SHA1 Message Date
Frank Lee af185b5519
[test] fixed amp convergence comparison test (#454) 2022-03-18 16:28:16 +08:00
ver217 642846d6f9
update sharded optim and fix zero init ctx (#457) 2022-03-18 15:44:47 +08:00
Jiarui Fang e2e9f82588
Revert "[zero] update sharded optim and fix zero init ctx" (#456)
* Revert "polish code"

This reverts commit 8cf7ff08cf.

* Revert "rename variables"

This reverts commit e99af94ab8.

* Revert "remove surplus imports"

This reverts commit 46add4a5c5.

* Revert "update sharded optim and fix zero init ctx"

This reverts commit 57567ee768.
2022-03-18 15:22:43 +08:00
ver217 8cf7ff08cf polish code 2022-03-18 14:25:25 +08:00
ver217 57567ee768 update sharded optim and fix zero init ctx 2022-03-18 14:25:25 +08:00
Frank Lee f27d801a13
[test] optimized zero data parallel test (#452) 2022-03-18 11:35:54 +08:00
Jiarui Fang 0fcfb1e00d
[test] make zero engine test really work (#447) 2022-03-17 17:24:25 +08:00
Jiarui Fang f9c762df85
[test] merge zero optim tests (#428) 2022-03-16 12:22:45 +08:00
Jiarui Fang adebb3e041
[zero] cuda margin space for OS (#418) 2022-03-15 12:02:19 +08:00
Jiarui Fang 23ba3fc450
[zero] refactory ShardedOptimV2 init method (#416) 2022-03-15 10:45:55 +08:00
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00
Jiarui Fang 370f567e7d
[zero] new interface for ShardedOptimv2 (#406) 2022-03-14 20:48:41 +08:00
ver217 54fd37f0e0 polish unit test 2022-03-14 15:06:02 +08:00
Jiarui Fang 3af13a2c3e [zero] polish ShardedOptimV2 unittest (#385)
* place params on cpu after zero init context

* polish code

* bucketzed cpu gpu tensor transter

* find a bug in sharded optim unittest

* add offload unittest for ShardedOptimV2.

* polish code and make it more robust
2022-03-11 15:50:28 +08:00
ver217 d0ae0f2215 [zero] update sharded optim v2 (#334) 2022-03-11 15:50:28 +08:00
ver217 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00
ver217 36f9a74ab2 fix sharded param hook and unit test 2022-03-11 15:50:28 +08:00
ver217 001ca624dd impl shard optim v2 and add unit test 2022-03-11 15:50:28 +08:00