Commit Graph

15 Commits (cc236916c66b4af3ddec1f10e083b5bba7ea6c86)

Author SHA1 Message Date
Jiarui Fang 0aab52301e
[hotfix] fix a bug in model data stats tracing (#655) 2022-04-03 21:48:06 +08:00
HELSON e5d615aeee
[hotfix] fix bugs in testing (#659)
* remove hybrid adam in test_moe_zero_optim

* fix activation checkpointing and its unitest
2022-04-02 21:58:47 +08:00
HELSON b31daed4cf
fix bugs in CPU adam (#633)
* add cpu adam counter for all cpu adam

* fixed updating error in adam kernel
2022-04-02 17:04:05 +08:00
HELSON 055fbf5be6
[zero] adapt zero for unsharded paramters (Optimizer part) (#601) 2022-04-01 20:10:47 +08:00
HELSON e6d50ec107
[zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero

* add unitest for moe-zero model init

* polish moe gradient handler
2022-03-31 18:34:11 +08:00
Jiarui Fang 7675366fce
[polish] rename col_attr -> colo_attr (#558) 2022-03-31 12:25:45 +08:00
HELSON 8c90d4df54
[zero] add zero context manager to change config during initialization (#546) 2022-03-29 17:57:59 +08:00
Liang Bowen ec5086c49c Refactored docstring to google style 2022-03-29 17:17:47 +08:00
Frank Lee 3601b2bad0
[test] fixed rerun_on_exception and adapted test cases (#487) 2022-03-25 17:25:12 +08:00
Jiarui Fang a445e118cf
[polish] polish singleton and global context (#500) 2022-03-23 18:03:39 +08:00
Jiarui Fang 65c0f380c2
[format] polish name format for MOE (#481) 2022-03-21 23:19:47 +08:00
HELSON 7544347145
[MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 2022-03-21 13:35:04 +08:00
HELSON 84fd7c1d4d
add moe context, moe utilities and refactor gradient handler (#455) 2022-03-18 16:38:32 +08:00
1SAA 82023779bb Added TPExpert for special situation 2022-03-11 15:50:28 +08:00
1SAA 219df6e685 Optimized MoE layer and fixed some bugs;
Decreased moe tests;

Added FFNExperts and ViTMoE model
2022-03-11 15:50:28 +08:00