ColossalAI/colossalai/moe
Wenhao Chen 724441279b
[moe]: fix ep/tp tests, add hierarchical all2all (#4982)
* fix: add warning for EP different behavior

* fix: use shard_data in ep & tp model

* to: add used_capacity

* fix: fix router test

* feat: add create_ep_node_group

* feat: add create_ep_hierarchical_group fn

* feat: add HierarchicalAllToAll

* test: add hierarchical all2all test

* fix: fix test errors

* fix: simplify create_ep_hierarchical_group

* fix: add hierarchical_alltoall arg

* fix: fix environ typo

* revert: revert process mesh order

* to: add todo mark

* fix: skip hierarchical_comm if torch < 1.13.1
2023-11-09 06:31:00 +00:00
__init__.py [moe] support optimizer checkpoint (#5015) 2023-11-08 15:07:03 +00:00
_operation.py [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 2023-11-09 06:31:00 +00:00
checkpoint.py [moe] support optimizer checkpoint (#5015) 2023-11-08 15:07:03 +00:00
experts.py [moe] support optimizer checkpoint (#5015) 2023-11-08 15:07:03 +00:00
layers.py [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 2023-11-09 06:31:00 +00:00
load_balance.py [moe] merge moe into main (#4978) 2023-11-02 02:21:24 +00:00
loss.py [moe] merge moe into main (#4978) 2023-11-02 02:21:24 +00:00
manager.py [moe] support optimizer checkpoint (#5015) 2023-11-08 15:07:03 +00:00
routers.py [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 2023-11-09 06:31:00 +00:00
utils.py [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 2023-11-09 06:31:00 +00:00