ColossalAI/examples/language/openmoe/benchmark
Wenhao Chen 724441279b
[moe]: fix ep/tp tests, add hierarchical all2all (#4982)
* fix: add warning for EP different behavior

* fix: use shard_data in ep & tp model

* to: add used_capacity

* fix: fix router test

* feat: add create_ep_node_group

* feat: add create_ep_hierarchical_group fn

* feat: add HierarchicalAllToAll

* test: add hierarchical all2all test

* fix: fix test errors

* fix: simplify create_ep_hierarchical_group

* fix: add hierarchical_alltoall arg

* fix: fix environ typo

* revert: revert process mesh order

* to: add todo mark

* fix: skip hierarchical_comm if torch < 1.13.1
2023-11-09 06:31:00 +00:00
benchmark_cai.py       [moe]: fix ep/tp tests, add hierarchical all2all (#4982)  2023-11-09 06:31:00 +00:00
benchmark_cai.sh       [moe] merge moe into main (#4978)                         2023-11-02 02:21:24 +00:00
benchmark_cai_dist.sh  [moe] merge moe into main (#4978)                         2023-11-02 02:21:24 +00:00
benchmark_fsdp.py      [moe] support optimizer checkpoint (#5015)                2023-11-08 15:07:03 +00:00
benchmark_fsdp.sh      [moe] merge moe into main (#4978)                         2023-11-02 02:21:24 +00:00
hostfile.txt           [moe] merge moe into main (#4978)                         2023-11-02 02:21:24 +00:00
utils.py               [moe] merge moe into main (#4978)                         2023-11-02 02:21:24 +00:00