23 Commits (c7d68b2c2ca3f7fd32056ea952fae4fe239f75ea)

Author SHA1 Message Date
Xuanlei Zhao f71e63b0f3
[moe] support optimizer checkpoint (#5015) 1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978) 1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752) 1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743) 1 year ago
Hongxin Liu 8accecd55b
[legacy] move engine to legacy (#4560) 1 year ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452) 2 years ago
ver217 933048ad3e
[test] reorganize zero/gemini tests (#3445) 2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424) 2 years ago
Jiarui Fang 1e885329f4
[test] align model name with the file name. (#2045) 2 years ago
HELSON a088022efc
[moe] fix moe bugs (#1633) 2 years ago
HELSON f7f2248771
[moe] fix MoE bugs (#1628) 2 years ago
Frank Lee 5a1a095b92
[test] refactored with the new rerun decorator (#763) 3 years ago
ver217 e396bb71f2
[zero] add tensor placement policies (#743) 3 years ago
HELSON 22c4b88d56
[zero] refactor ShardedParamV2 for convenience (#742) 3 years ago
Jiarui Fang 53cb584808
[utils] correct cpu memory used and capacity in the context of multi-process (#726) 3 years ago
HELSON b9b469ea50
[moe] add checkpoint for moe zero test (#729) 3 years ago
Jiarui Fang 193dc8dacb
[refactor] refactor the memory utils (#715) 3 years ago
HELSON a9b8300d54
[zero] improve adaptability for not-shard parameters (#708) 3 years ago
HELSON ee112fe1da
[zero] adapt zero hooks for unsharded module (#699) 3 years ago
HELSON d7ecaf362b
[zero] fix init bugs in zero context (#686) 3 years ago
HELSON e5d615aeee
[hotfix] fix bugs in testing (#659) 3 years ago
HELSON b31daed4cf
fix bugs in CPU adam (#633) 3 years ago
HELSON 055fbf5be6
[zero] adapt zero for unsharded paramters (Optimizer part) (#601) 3 years ago