ColossalAI/colossalai/shardformer/shard/shard_config.py

from dataclasses import dataclass

import torch.distributed as dist
from torch.distributed import ProcessGroup

__all__ = ['ShardConfig']


@dataclass
class ShardConfig:
    r"""
    Configuration for sharding a Hugging Face model.

    Args:
        tensor_parallel_process_group (ProcessGroup): The process group for tensor parallelism. Defaults to None, which selects the global (default) process group.
        enable_tensor_parallelism (bool): Whether to turn on tensor parallelism. Defaults to True.
        enable_fused_normalization (bool): Whether to use fused layernorm. Defaults to False.
        enable_all_optimization (bool): Whether to turn on all optimizations. Defaults to False.
    """
    tensor_parallel_process_group: ProcessGroup = None
    enable_tensor_parallelism: bool = True
    enable_fused_normalization: bool = False
    enable_all_optimization: bool = False

    # TODO: add support for tensor parallel
    # pipeline_parallel_size: int
    # data_parallel_size: int
    # tensor_parallel_mode: Literal['1d', '2d', '2.5d', '3d']
    # inference_only: bool = True
    # gather_output: bool = True

    @property
    def tensor_parallel_size(self):
        return self._tensor_parallel_size

    def __post_init__(self):
        if not self.enable_tensor_parallelism:
            self._tensor_parallel_size = 1
        else:
            # get the parallel size
            self._tensor_parallel_size = dist.get_world_size(self.tensor_parallel_process_group)

        # turn on all optimizations if enable_all_optimization is set to True
        if self.enable_all_optimization:
            self._turn_on_all_optimization()

    def _turn_on_all_optimization(self):
        """
        Turn on all optimization.
        """
        # add any further optimization flags here
        self.enable_fused_normalization = True
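
The way `__post_init__` derives `tensor_parallel_size` and applies `enable_all_optimization` can be sketched without a distributed setup. This is a minimal illustration, not the ColossalAI class: `MiniShardConfig`, `fake_world_size`, and the `world_size_fn` field are hypothetical names introduced here, and the stub stands in for `dist.get_world_size` so the snippet runs in a single process.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def fake_world_size(group: Optional[object] = None) -> int:
    """Hypothetical stand-in for torch.distributed.get_world_size."""
    return 4


@dataclass
class MiniShardConfig:
    """Illustrative copy of the ShardConfig pattern; not the real class."""
    tensor_parallel_process_group: Optional[object] = None
    enable_tensor_parallelism: bool = True
    enable_fused_normalization: bool = False
    enable_all_optimization: bool = False
    # Assumption: injected so the sketch needs no initialized process group.
    world_size_fn: Callable[[Optional[object]], int] = fake_world_size

    @property
    def tensor_parallel_size(self) -> int:
        return self._tensor_parallel_size

    def __post_init__(self):
        if not self.enable_tensor_parallelism:
            # Tensor parallelism disabled: behave as a single shard.
            self._tensor_parallel_size = 1
        else:
            # Otherwise the size is the world size of the chosen group.
            self._tensor_parallel_size = self.world_size_fn(
                self.tensor_parallel_process_group)
        if self.enable_all_optimization:
            # Mirrors _turn_on_all_optimization: flip every optimization flag.
            self.enable_fused_normalization = True


default_cfg = MiniShardConfig()
no_tp_cfg = MiniShardConfig(enable_tensor_parallelism=False)
all_opt_cfg = MiniShardConfig(enable_all_optimization=True)
```

In the real class the size comes from `dist.get_world_size(...)`, so constructing a `ShardConfig` with tensor parallelism enabled presumably requires the process group to be initialized first.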