ColossalAI/colossalai/shardformer/shard/shard_config.py

from dataclasses import dataclass

import torch.distributed as dist
from torch.distributed import ProcessGroup

from colossalai.cluster.dist_coordinator import DistCoordinator

__all__ = ['ShardConfig']


@dataclass
class ShardConfig:
    r"""
    The config for sharding the Hugging Face model.

    Args:
        tensor_parallel_process_group (ProcessGroup): The process group for tensor parallelism. Defaults to None, which means the global (default) process group.
        enable_fused_normalization (bool): Whether to use fused layernorm. Defaults to False.
        enable_all_optimization (bool): Whether to turn on all supported optimizations. Defaults to False.
    """
    tensor_parallel_process_group: ProcessGroup = None
    enable_fused_normalization: bool = False
    enable_all_optimization: bool = False

    # TODO: add support for tensor parallel
    # pipeline_parallel_size: int
    # data_parallel_size: int
    # tensor_parallel_mode: Literal['1d', '2d', '2.5d', '3d']
    # inference_only: bool = True
    # gather_output: bool = True

    @property
    def tensor_parallel_size(self):
        return self._tensor_parallel_size

    def __post_init__(self):
        # get the parallel size
        self._tensor_parallel_size = dist.get_world_size(self.tensor_parallel_process_group)

        # turn on all optimizations if enable_all_optimization is set to True
        if self.enable_all_optimization:
            self._turn_on_all_optimization()

    def _turn_on_all_optimization(self):
        """
        Turn on all optimizations.
        """
        # add every optimization flag here; each must match a declared field name
        self.enable_fused_normalization = True
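
The `__post_init__` pattern above can be tried without a distributed runtime. The following is a minimal sketch, not ColossalAI code: `DemoShardConfig` and its `world_size` field are hypothetical stand-ins (the real class takes a process group and queries `dist.get_world_size`). It only illustrates how `enable_all_optimization` is meant to switch on the individual flags.

```python
from dataclasses import dataclass, field


@dataclass
class DemoShardConfig:
    """Hypothetical stand-in for ShardConfig; not part of ColossalAI."""

    # Assumption: a plain rank count replaces the real process group.
    world_size: int = 1
    enable_fused_normalization: bool = False
    enable_all_optimization: bool = False
    # Derived in __post_init__, so it is excluded from the generated __init__.
    _tensor_parallel_size: int = field(init=False, repr=False, default=1)

    @property
    def tensor_parallel_size(self) -> int:
        return self._tensor_parallel_size

    def __post_init__(self):
        # The real class calls dist.get_world_size(process_group) here.
        self._tensor_parallel_size = self.world_size
        if self.enable_all_optimization:
            self._turn_on_all_optimization()

    def _turn_on_all_optimization(self):
        # Each flag set here must be a declared field; otherwise the
        # assignment silently creates an unrelated attribute.
        self.enable_fused_normalization = True


default_cfg = DemoShardConfig(world_size=4)
optimized_cfg = DemoShardConfig(world_size=4, enable_all_optimization=True)

print(default_cfg.enable_fused_normalization)    # False
print(optimized_cfg.enable_fused_normalization)  # True
print(optimized_cfg.tensor_parallel_size)        # 4
```

Keeping the derived size in a non-init field and exposing it through a read-only property mirrors the original design: callers set flags, while the parallel size is always computed rather than passed in.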