24 Commits (5898ccf38b1065ab871c2bdacdfe764b2e896073)

Author SHA1 Message Date
LuGY 2883040286 [example] change qkv processing (#870) 3 years ago
LuGY 13ed4b6441 [model zoo] add activation offload for gpt model (#582) 3 years ago
HELSON 0f2d219162 [MOE] add MOEGPT model (#510) 3 years ago
Jiarui Fang a445e118cf [polish] polish singleton and global context (#500) 3 years ago
HELSON c9023d4078 [MOE] support PR-MOE (#488) 3 years ago
ver217 d70f43dd7a embedding remove attn mask (#474) 3 years ago
HELSON 7544347145 [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 3 years ago
ver217 1559c0df41 fix attn mask shape of gpt (#472) 3 years ago
ver217 304263c2ce fix gpt attention mask (#461) 3 years ago
HELSON dbdc9a7783 added Multiply Jitter and capacity factor eval for MOE (#434) 3 years ago
Frank Lee 0f5f5dd556 fixed gpt attention mask in pipeline (#430) 3 years ago
lucasliunju ce886a9062 fix format (#374) 3 years ago
Ziheng Qin 0db43fa995 fix format (#364) 3 years ago
1SAA 82023779bb Added TPExpert for special situation 3 years ago
1SAA 219df6e685 Optimized MoE layer and fixed some bugs; 3 years ago
アマデウス 9ee197d0e9 moved env variables to global variables; (#215) 3 years ago
HELSON 1ff5be36c2 Added moe parallel example (#140) 3 years ago
HELSON dceae85195 Added MoE parallel (#127) 3 years ago
ver217 7904baf6e1 fix layers/schedule for hybrid parallelization (#111) (#112) 3 years ago
アマデウス e5b9f9a08d added gpt model & benchmark (#95) 3 years ago
アマデウス 01a80cd86d Hotfix/Colossalai layers (#92) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago
Frank Lee da01c234e1 Develop/experiments (#59) 3 years ago
zbian 404ecbdcc6 Migrated project 3 years ago