Hongxin Liu | 65e5d6baa5 | [moe] fix mixtral optim checkpoint (#5344) | 2024-02-07 19:21:02 +08:00

Hongxin Liu | 956b561b54 | [moe] fix mixtral forward default value (#5329) | 2024-02-07 19:21:02 +08:00

Hongxin Liu | b60be18dcc | [moe] fix mixtral checkpoint io (#5314) | 2024-02-07 19:21:02 +08:00
Hongxin Liu | da39d21b71 | [moe] support mixtral (#5309) | 2024-02-07 19:21:02 +08:00
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plugin
* [moe] fix optimizer load
* [moe] fix mixtral layer
Hongxin Liu | c904d2ae99 | [moe] update capacity computing (#5253) | 2024-02-07 19:21:02 +08:00
* [moe] top2 allow uneven input
* [moe] update capacity computing
* [moe] remove debug info
* [moe] update capacity computing
* [moe] update capacity computing

Xuanlei Zhao | 7d8e0338a4 | [moe] init mixtral impl | 2024-02-07 19:21:02 +08:00