ColossalAI/colossalai/shardformer/shard/shardformer.py

from typing import Dict, List, Optional, Tuple

import torch.distributed as dist
import torch.nn as nn
from torch import Tensor

from colossalai.cluster import DistCoordinator

from ..policies.base_policy import Policy
from .shard_config import ShardConfig
from .sharder import ModelSharder


class ShardFormer:
"""
Parallelize model based on the given config and policy
Example:
```python
from colossalai.shardformer import ShardFormer, ShardConfig
from transformers import BertForMaskedLM
import colossalai
import torch
colossalai.launch_from_torch()
org_model = BertForMaskedLM.from_pretrained('bert-base-uncased')
shard_config = ShardConfig()
shard_former = ShardFormer(shard_config=shard_config)
model, shared_params = shard_former.optimize(org_model)
```
"""

    def __init__(self, shard_config: ShardConfig):
        self.is_distributed = dist.is_initialized()
        if self.is_distributed:
            self.coordinator = DistCoordinator()
        else:
            # torch.distributed has not been initialized; run without a coordinator
            self.coordinator = None
        self.shard_config = shard_config

    def optimize(self, model: nn.Module, policy: Optional[Policy] = None) -> Tuple[nn.Module, List[Dict[int, Tensor]]]:
        r"""
        Optimize (shard) the model in place based on the shard config and the given policy.

        Args:
            model (`torch.nn.Module`): the original HuggingFace model
            policy (`Policy`): the custom policy for sharding; if ``None``, an
                appropriate built-in policy is selected automatically

        Returns: the sharded model and the shared parameters
        """
        # Delegate the actual parameter sharding to ModelSharder; the model is
        # modified in place and returned together with its shared parameters.
        sharder = ModelSharder(model=model, shard_config=self.shard_config, policy=policy)
        shared_params = sharder.shard()
        return model, shared_params
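
# A minimal usage sketch with a custom policy (not part of the original file;
# `MyBertPolicy` is hypothetical, and the override names are only examples --
# see `colossalai.shardformer.policies.base_policy.Policy` for the real hooks):
#
#     class MyBertPolicy(Policy):
#         ...  # e.g. override module_policy() / postprocess()
#
#     shard_former = ShardFormer(shard_config=ShardConfig())
#     model, shared_params = shard_former.optimize(org_model, policy=MyBertPolicy())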