# ColossalAI/colossalai/tensor/distspec.py

from enum import Enum
from typing import List

__all__ = ['ReplicaSpec', 'ShardSpec']


class DistPlacementPattern(Enum):
    REPLICATE = 'r'
    SHARD = 's'


class _DistSpec:
    """_DistSpec
    
    A class representing a distributed specification (DistSpec).
    A DistSpec only describes placement within the tensor parallel process group,
    because the dist spec for the data parallel process group can be deduced automatically.
    This is an internal data structure; users should create instances through `ShardSpec` and `ReplicaSpec`.

    Args:
        dist_placement_pattern (DistPlacementPattern): the pattern describing how tensors are distributed among processes.
            It is chosen from a limited set, currently two patterns: replicate and shard.
        **meta_info: pattern-specific metadata (e.g. `dims` and `num_partitions` for sharding),
            each entry stored as an attribute of the instance.
    """

    def __init__(self, dist_placement_pattern: DistPlacementPattern, **meta_info):

        self.placement = dist_placement_pattern
        for k, v in meta_info.items():
            setattr(self, k, v)

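    def __eq__(self, other: "_DistSpec") -> bool:
        # Two specs are equal when they expose the same attribute names and every
        # non-dunder attribute (placement, dims, num_partitions, ...) has an equal value.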
        if dir(self) != dir(other):
            return False
        for attr in dir(self):
            if not attr.startswith('__') and getattr(self, attr) != getattr(other, attr):
                return False
        return True

    def __repr__(self) -> str:
        res_list = ["DistSpec:"]
        for attr in dir(self):
            if not attr.startswith('__'):
                res_list.append(f'\n\t{attr}: {str(getattr(self, attr))}')
        return ''.join(res_list)


def ReplicaSpec() -> _DistSpec:
    """ReplicaSpec

    A distributed specification indicating that the tensor is replicated across the tensor parallel process group.

    Returns:
        _DistSpec: a replicated dist spec instance.
    """
    return _DistSpec(DistPlacementPattern.REPLICATE)


def ShardSpec(dims: List[int], num_partitions: List[int]) -> _DistSpec:
    """ShardSpec

    A distributed specification indicating that the tensor is sharded across the tensor parallel process group.

    Note:
        Currently, only sharding along a single dimension is supported. In other words, `dims` should have length 1.

    Args:
        dims (List[int]): the dimensions along which the tensor is sharded.
        num_partitions (List[int]): the number of partitions for each dimension in `dims`.

    Returns:
        _DistSpec: a shard dist spec instance.
    """
    assert isinstance(dims, list) and isinstance(num_partitions, list)
    assert len(dims) == len(num_partitions)
    return _DistSpec(DistPlacementPattern.SHARD, dims=tuple(dims), num_partitions=tuple(num_partitions))
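

# Illustrative sketch only: `_example_local_shard_shape` is a hypothetical helper
# added for documentation purposes and is not part of ColossalAI's API. It shows how
# the (dims, num_partitions) pair stored by `ShardSpec` could map to the shape of the
# shard held by each rank, ASSUMING every sharded dimension divides evenly by its
# partition count. This module only records the specification; it does not shard
# tensors itself.
def _example_local_shard_shape(global_shape, dims, num_partitions):
    """Return the per-rank shard shape under an even split (hypothetical helper).

    For instance, an (8, 6) tensor described by `ShardSpec([0], [4])` would leave
    each of the 4 ranks with a (2, 6) shard under this assumption.
    """
    shape = list(global_shape)
    for dim, parts in zip(dims, num_partitions):
        # Assumption: the dimension size is an exact multiple of the partition count.
        if shape[dim] % parts != 0:
            raise ValueError(f"dimension {dim} of size {shape[dim]} is not divisible by {parts}")
        shape[dim] //= parts
    return tuple(shape)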