[Docs] layout converting management (#2665)

pull/2666/head
YuliangLiu0306 2023-02-10 18:38:32 +08:00 committed by GitHub
parent 0385b26ebf
commit 8de85051b3
4 changed files with 25 additions and 0 deletions


@ -0,0 +1,13 @@
When the upstream and downstream operators of a tensor require it to have different sharding specs, we need to perform layout conversion, also called redistribution. There are currently two mainstream approaches: enumeration conversion and dimension-by-dimension conversion. Enumeration conversion enumerates all possible cases in advance and, whenever a conversion is required, looks up the corresponding conversion scheme in the table. However, it has a serious problem: as the number of device mesh dimensions increases, the problem grows so large that it can no longer be solved with an enumerated table. Dimension-by-dimension conversion takes the sharding spec $X_0X_1\dots X_{n-1}$ of an $n$-dimensional tensor and converts it one dimension at a time, from dimension $0$ to dimension $n-1$. No matter how many dimensions the device mesh and the tensor have, a single scan yields a feasible sequence of conversion operations, but the resulting conversions are very inefficient.
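The following sketch gives a rough sense of how an enumeration table grows. It counts the sharding specs of a tensor under two assumptions: each device mesh axis shards at most one tensor dimension, and the order of axes within a dimension matters ($S_{01} \ne S_{10}$). Real systems may rule out further specs, for instance when a dimension is not divisible by the mesh size. `count_specs` and `enumerate_specs` are hypothetical helpers. The brute-force version exists only to cross-check the closed form.

```python
from itertools import permutations, product
from math import comb, factorial


def count_specs(tensor_ndim: int, mesh_ndim: int) -> int:
    """Closed form: choose k of the mesh axes, then arrange them as ordered lists
    over the tensor dimensions (n * (n + 1) * ... * (n + k - 1) ways)."""
    n, d = tensor_ndim, mesh_ndim
    return sum(comb(d, k) * factorial(n + k - 1) // factorial(n - 1) for k in range(d + 1))


def enumerate_specs(tensor_ndim: int, mesh_ndim: int) -> set:
    """Brute force: build every spec explicitly, as an enumeration table would."""
    specs = set()
    # owner[a] is the tensor dimension sharded along mesh axis a, or -1 if a is unused.
    for owner in product(range(-1, tensor_ndim), repeat=mesh_ndim):
        used = [a for a in range(mesh_ndim) if owner[a] >= 0]
        # The order in which axes are appended fixes each dimension's axis order.
        for order in permutations(used):
            dims = [[] for _ in range(tensor_ndim)]
            for a in order:
                dims[owner[a]].append(a)
            specs.add(tuple(tuple(axes) for axes in dims))
    return specs


for mesh_ndim in range(1, 5):
    n_specs = count_specs(3, mesh_ndim)
    # A lookup table keyed by (source, target) has n_specs ** 2 entries.
    print(mesh_ndim, n_specs, n_specs ** 2)
# 1 4 16
# 2 19 361
# 3 106 11236
# 4 685 469225
```

Under these assumptions, a 3-D tensor has 361 (source, target) pairs on a 2-D mesh and 469,225 pairs on a 4-D mesh. That is before storing any conversion scheme per pair.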
Therefore, we propose a novel algorithm that uses heuristic search to solve the sharding spec conversion problem. It can be described as follows:
1. Generate all one-step transform sharding specs from the source spec, i.e., every spec reachable from it with a single all-gather, shard, or all-to-all operation.
2. Among the one-step transform sharding specs, use a similarity function to select the one with the "least difference" from the target sharding spec as the next source spec, and record it in the transform path. If one of the one-step transform sharding specs is identical to the target sharding spec, the algorithm terminates.
3. Repeat steps 1 and 2 until the algorithm terminates.

For example, the table below converts $S_{01}RR$ to $RS_{01}R$ on a 2-D device mesh. Here $R$ denotes a replicated tensor dimension and $S_i$ a dimension sharded along device mesh axis $i$; $S_{01}$ is sharded along both axes 0 and 1.

| Source/target sharding spec pair | All-gather | Shard | All-to-all | One-step transforms | Best sharding spec | Transform path |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| $S_{01}RR$, $RS_{01}R$ | $S_0RR$ | - | $S_0RS_1$, $S_0S_1R$ | $S_0RR$, $S_0RS_1$, $S_0S_1R$ | $S_0RR$ | $S_0RR$ |
| $S_0RR$, $RS_{01}R$ | $RRR$ | $S_0S_1R$, $S_0RS_1$ | $RS_0R$, $RRS_0$ | $RRR$, $S_0S_1R$, $S_0RS_1$, $RS_0R$, $RRS_0$ | $RS_0R$ | $S_0RR$ -> $RS_0R$ |
| $RS_0R$, $RS_{01}R$ | $RRR$ | $RS_{01}R$, $S_1S_0R$, $RS_0S_1$ | $S_0RR$, $RRS_0$ | $RRR$, $RS_{01}R$, $S_1S_0R$, $RS_0S_1$, $S_0RR$, $RRS_0$ | $RS_{01}R$ | $S_0RR$ -> $RS_0R$ -> $RS_{01}R$ |
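Below is a minimal Python sketch of this search. It rests on three assumptions, and all names in it are illustrative:

- A spec is encoded as one tuple of mesh axes per tensor dimension.
- The three one-step transforms follow the rules the table suggests. All-gather drops a dimension's last mesh axis, shard appends an unused axis, and all-to-all moves a dimension's last axis to another dimension.
- `spec_difference` is a hypothetical prefix-based similarity function, not necessarily the one the authors use.

```python
from typing import List, Tuple

# Hypothetical encoding: one tuple of device mesh axes per tensor dimension.
# () is R (replicated), (0,) is S_0, (0, 1) is S_01.
Spec = Tuple[Tuple[int, ...], ...]


def _set_dim(spec: Spec, i: int, axes: Tuple[int, ...]) -> Spec:
    """Return a copy of `spec` with tensor dimension i replaced by `axes`."""
    return spec[:i] + (axes,) + spec[i + 1:]


def one_step_transforms(spec: Spec, mesh_ndim: int) -> List[Spec]:
    """Specs reachable with a single all-gather, shard or all-to-all (assumed rules)."""
    used = {axis for dim in spec for axis in dim}
    free = [axis for axis in range(mesh_ndim) if axis not in used]
    out: List[Spec] = []
    # All-gather: drop the last mesh axis of a sharded dimension.
    for i, dim in enumerate(spec):
        if dim:
            out.append(_set_dim(spec, i, dim[:-1]))
    # Shard: append an unused mesh axis to any dimension.
    for i, dim in enumerate(spec):
        for axis in free:
            out.append(_set_dim(spec, i, dim + (axis,)))
    # All-to-all: move the last mesh axis of dimension i onto dimension j.
    for i, dim in enumerate(spec):
        if not dim:
            continue
        gathered = _set_dim(spec, i, dim[:-1])
        for j in range(len(spec)):
            if j != i:
                out.append(_set_dim(gathered, j, gathered[j] + (dim[-1],)))
    return out


def spec_difference(a: Spec, b: Spec) -> int:
    """Hypothetical similarity function: mesh axes outside each dim's common prefix."""
    total = 0
    for x, y in zip(a, b):
        p = 0
        while p < min(len(x), len(y)) and x[p] == y[p]:
            p += 1
        total += (len(x) - p) + (len(y) - p)
    return total


def find_transform_path(source: Spec, target: Spec, mesh_ndim: int,
                        max_steps: int = 32) -> List[Spec]:
    """Greedy heuristic search; returns the specs visited after `source`, ending at `target`."""
    path: List[Spec] = []
    visited = {source}
    current = source
    while current != target:
        if len(path) >= max_steps:
            raise RuntimeError("no transform path found within max_steps")
        candidates = one_step_transforms(current, mesh_ndim)  # step 1
        if target in candidates:
            path.append(target)  # a one-step transform hits the target: done
            break
        fresh = [c for c in candidates if c not in visited]
        if not fresh:
            raise RuntimeError("heuristic search got stuck")
        # Step 2: continue from the candidate with the least difference from the target.
        current = min(fresh, key=lambda c: spec_difference(c, target))
        visited.add(current)
        path.append(current)
    return path


def fmt(spec: Spec) -> str:
    """Render a spec in the document's notation, e.g. ((0, 1), (), ()) -> S_01RR."""
    return "".join("S_" + "".join(map(str, d)) if d else "R" for d in spec)


S01RR = ((0, 1), (), ())
RS01R = ((), (0, 1), ())
print(" -> ".join(fmt(s) for s in find_transform_path(S01RR, RS01R, mesh_ndim=2)))
# S_0RR -> RS_0R -> RS_01R
```

On the example above, this sketch reproduces the transform path in the table. A greedy search like this is not guaranteed to find the shortest path in general. That is why the sketch keeps a `visited` set and a step limit instead of looping indefinitely.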


@ -0,0 +1,12 @@
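When the upstream and downstream operators of a tensor require different sharding specs, we need to perform layout conversion. There are currently two mainstream approaches: table-lookup conversion and dimension-by-dimension conversion. Table-lookup conversion enumerates all possible cases and, whenever a conversion is needed, looks up the corresponding conversion scheme in the table. However, it has a major problem: as the number of device mesh dimensions increases, the problem grows so rapidly that it can no longer be solved by enumerating a lookup table. Dimension-by-dimension conversion takes the sharding spec $X_0X_1\dots X_{n-1}$ of an $n$-dimensional tensor and converts it one dimension at a time, letting $i$ run from $0$ to $n-1$. This way, no matter how many dimensions the device mesh and the tensor have, a single scan yields a feasible sequence of conversion operations; its problem, however, is that the resulting conversions are very inefficient. To address this, we propose a novel approach that uses a heuristic algorithm to solve the sharding spec conversion problem. The algorithm can be described as follows: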
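1. Generate all one-step transform sharding specs from the source spec, i.e., every spec reachable from it with a single all-gather, shard, or all-to-all operation.
2. Among the one-step transform sharding specs, use a similarity function to pick the one with the "least difference" from the target sharding spec as the next source sharding spec, and record it in the transform path. If one of the one-step transform sharding specs is identical to the target sharding spec, the algorithm terminates.
3. Repeat steps 1 and 2 until the algorithm terminates.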

For example, the table below converts $S_{01}RR$ to $RS_{01}R$ on a 2-D device mesh. Here $R$ denotes a replicated tensor dimension and $S_i$ a dimension sharded along device mesh axis $i$; $S_{01}$ is sharded along both axes 0 and 1.

| Source/target sharding spec pair | All-gather | Shard | All-to-all | One-step transforms | Best sharding spec | Transform path |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| $S_{01}RR$, $RS_{01}R$ | $S_0RR$ | - | $S_0RS_1$, $S_0S_1R$ | $S_0RR$, $S_0RS_1$, $S_0S_1R$ | $S_0RR$ | $S_0RR$ |
| $S_0RR$, $RS_{01}R$ | $RRR$ | $S_0S_1R$, $S_0RS_1$ | $RS_0R$, $RRS_0$ | $RRR$, $S_0S_1R$, $S_0RS_1$, $RS_0R$, $RRS_0$ | $RS_0R$ | $S_0RR$ -> $RS_0R$ |
| $RS_0R$, $RS_{01}R$ | $RRR$ | $RS_{01}R$, $S_1S_0R$, $RS_0S_1$ | $S_0RR$, $RRS_0$ | $RRR$, $RS_{01}R$, $S_1S_0R$, $RS_0S_1$, $S_0RR$, $RRS_0$ | $RS_{01}R$ | $S_0RR$ -> $RS_0R$ -> $RS_{01}R$ |