Commit Graph

12 Commits (e31d2ebcf7fd427154f8fcb7e3d1437d0b65721f)

hxwang c67e553fd3
[moe] remove ops 2024-07-22 04:00:42 +00:00
hxwang 783aafa327
[moe] full test for deepseek and mixtral (pp + sp to fix) 2024-07-19 07:32:56 +00:00
hxwang c8bf2681e3
[moe] clean legacy code 2024-07-19 07:32:01 +00:00
botbw 335ad3c6fb
[moe] implement tp 2024-07-19 07:30:17 +00:00
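
The "implement tp" commit above refers to tensor parallelism inside the expert MLPs. A minimal sketch of the usual Megatron-style column-then-row split, assuming a plain two-layer expert; the class and argument names are illustrative, not the repository's API:

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class TPExpert(nn.Module):
    """One expert FFN with its weights sharded across a tensor-parallel group."""
    def __init__(self, dim: int, hidden: int, tp_group):
        super().__init__()
        tp_size = dist.get_world_size(tp_group)
        assert hidden % tp_size == 0, "hidden dim must divide across TP ranks"
        self.w1 = nn.Linear(dim, hidden // tp_size, bias=False)  # column-parallel
        self.w2 = nn.Linear(hidden // tp_size, dim, bias=False)  # row-parallel
        self.tp_group = tp_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        partial = self.w2(torch.relu(self.w1(x)))
        # Each rank holds a partial sum of the output; one all-reduce restores it.
        dist.all_reduce(partial, group=self.tp_group)
        return partial
```
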
botbw 1b15cc97f5
[moe] add mixtral dp grad scaling when not all experts are activated 2024-07-19 07:30:14 +00:00
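
The grad-scaling commit above addresses a subtlety of mixing expert parallelism with data parallelism: if some DP ranks route no tokens to an expert, averaging its gradient over the full DP world under-weights the ranks that did use it. A minimal sketch of one way to re-normalize, assuming an initialized process group and grads already averaged by the DP all-reduce; function and argument names are assumptions, not the repository's API:

```python
import torch
import torch.distributed as dist

def scale_moe_grads(experts, tokens_per_expert, dp_group=None):
    """experts: list of nn.Module, one per local expert.
    tokens_per_expert: LongTensor [num_experts], tokens this rank routed to each."""
    world_size = dist.get_world_size(dp_group)
    # Count, per expert, how many DP ranks activated it at least once.
    active = (tokens_per_expert > 0).float()
    dist.all_reduce(active, group=dp_group)
    for i, expert in enumerate(experts):
        if active[i] == 0:
            continue  # no rank used this expert; its grad is zero anyway
        # Grads were averaged over world_size by the DP all-reduce; re-normalize
        # so each expert averages only over the ranks that contributed.
        factor = world_size / active[i]
        for p in expert.parameters():
            if p.grad is not None:
                p.grad.mul_(factor)
```
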
botbw 2431694564
[moe] implement transit between non moe tp and ep 2024-07-19 07:29:35 +00:00
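
The "transit" commit above concerns moving activations from the dense, tensor-parallel part of the model into the expert-parallel layout, where each EP rank must receive exactly the tokens its local experts own. The usual primitive is an all-to-all; a minimal sketch under the simplifying assumption that every rank sends equal-sized buckets (names are illustrative):

```python
import torch
import torch.distributed as dist

def tp_to_ep_transit(hidden, ep_group):
    """hidden: [ep_size, t, dim] — tokens already sorted into one bucket per
    destination EP rank. Returns the buckets this rank's local experts own."""
    out = torch.empty_like(hidden)
    # all_to_all_single splits dim 0 across the group: bucket i goes to rank i,
    # and bucket i of every peer lands in row i of `out`.
    dist.all_to_all_single(out, hidden, group=ep_group)
    return out
```
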
hxwang 61109c7843
[zero] solve hang 2024-07-19 07:29:07 +00:00
Hongxin Liu da39d21b71
[moe] support mixtral (#5309)
* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [moe] support save sharded mixtral

* [moe] support load sharded mixtral

* [moe] support save sharded optim

* [moe] integrate moe manager into plugin

* [moe] fix optimizer load

* [moe] fix mixtral layer
2024-02-07 19:21:02 +08:00
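
Two bullets in the commit above ("mixtral block fwd/bwd support uneven ep") point at the case where experts, and hence token buckets, are not evenly divided across EP ranks. A minimal sketch of uneven dispatch with explicit split sizes, assuming an initialized EP group; names are illustrative, not the repository's code:

```python
import torch
import torch.distributed as dist

def dispatch_uneven(tokens, send_counts, ep_group):
    """tokens: [n, dim], sorted by destination EP rank; send_counts: list[int],
    tokens headed to each EP rank. Returns this rank's tokens plus recv counts."""
    # Exchange the counts first so each rank can size its receive buffer.
    counts = torch.tensor(send_counts, device=tokens.device)
    recv = torch.empty_like(counts)
    dist.all_to_all_single(recv, counts, group=ep_group)
    recv_counts = recv.tolist()
    out = tokens.new_empty(sum(recv_counts), tokens.size(1))
    # The actual payload exchange, with per-rank split sizes on both sides.
    dist.all_to_all_single(out, tokens,
                           output_split_sizes=recv_counts,
                           input_split_sizes=send_counts,
                           group=ep_group)
    return out, recv_counts
```
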
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
* [feat] refactored extension module

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish
2024-01-25 17:01:48 +08:00
Wenhao Chen 3c08f17348
[hotfix]: modify create_ep_hierarchical_group and add test (#5032)
* feat: modify create_ep_hierarchical_group args

* test: add ep tests

* fix: remove get_process_group_ranks

* fix: fix src_rank
2023-11-17 10:53:00 +08:00
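
The `create_ep_hierarchical_group` function touched by the commit above builds the process groups that hierarchical all-to-all relies on. What follows is a guess at its general shape, not the actual implementation: one intra-node group per node plus an inter-node group of per-node leaders; the 8-GPU node size and return convention are assumptions:

```python
import torch.distributed as dist

def create_ep_hierarchical_group(ep_ranks, gpus_per_node=8):
    """Return (intra_node_group, inter_node_group or None) for this rank."""
    rank = dist.get_rank()
    intra_group_of = {}
    leader_ranks = []
    # new_group is collective: every process must create every group, even
    # the ones it does not belong to.
    for start in range(0, len(ep_ranks), gpus_per_node):
        node_ranks = ep_ranks[start:start + gpus_per_node]
        group = dist.new_group(node_ranks)
        for r in node_ranks:
            intra_group_of[r] = group
        leader_ranks.append(node_ranks[0])  # one leader per node
    inter_group = dist.new_group(leader_ranks)
    return intra_group_of.get(rank), (inter_group if rank in leader_ranks else None)
```
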
Wenhao Chen 724441279b
[moe]: fix ep/tp tests, add hierarchical all2all (#4982)
* fix: add warning for EP different behavior

* fix: use shard_data in ep & tp model

* to: add used_capacity

* fix: fix router test

* feat: add create_ep_node_group

* feat: add create_ep_hierarchical_group fn

* feat: add HierarchicalAllToAll

* test: add hierarchical all2all test

* fix: fix test errors

* fix: simplify create_ep_hierarchical_group

* fix: add hierarchical_alltoall arg

* fix: fix environ typo

* revert: revert process mesh order

* to: add todo mark

* fix: skip hierarchical_comm if torch < 1.13.1
2023-11-09 06:31:00 +00:00
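
The `HierarchicalAllToAll` bullets above describe replacing one flat all-to-all with a node-aware pipeline, so that only a single collective crosses the slow inter-node links. A rough sketch of the communication pattern only, with packing and reordering details omitted; `get_global_rank` needs a recent torch (note the commit's own torch < 1.13.1 guard), and all names are illustrative:

```python
import torch
import torch.distributed as dist

def hierarchical_all_to_all(x, intra_group, inter_group):
    """x: [k, dim] token buckets; intra_group spans this node's EP ranks,
    inter_group spans the node leaders (None on non-leader ranks)."""
    n_local = dist.get_world_size(intra_group)
    leader = dist.get_global_rank(intra_group, 0)  # global rank of the local leader
    is_leader = dist.get_rank() == leader

    # 1) Intra-node gather onto the leader (fast NVLink/PCIe traffic).
    bufs = [torch.empty_like(x) for _ in range(n_local)] if is_leader else None
    dist.gather(x, bufs, dst=leader, group=intra_group)

    if is_leader:
        # 2) One inter-node all-to-all among the leaders (slow network traffic).
        packed = torch.cat(bufs)
        swapped = torch.empty_like(packed)
        dist.all_to_all_single(swapped, packed, group=inter_group)
        chunks = list(swapped.chunk(n_local))
    else:
        chunks = None

    # 3) Intra-node scatter of the exchanged tokens back to every rank.
    out = torch.empty_like(x)
    dist.scatter(out, chunks, src=leader, group=intra_group)
    return out
```
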
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
* update moe module
* support openmoe
2023-11-02 02:21:24 +00:00