ColossalAI/colossalai/legacy/nn/layer/wrapper/pipeline_wrapper.py

from typing import List, Tuple, Union

import torch.distributed as dist
import torch.nn as nn

from colossalai.legacy.context import ParallelMode
from colossalai.legacy.core import global_context as gpc


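class PipelineSharedModuleWrapper:
    """Keep a module or parameter identical across several pipeline stages.

    Args:
        pipeline_ranks: indices of the pipeline stages that hold a copy of the
            shared module, e.g. the first and the last stage. At least two
            stages are required.
    """

    def __init__(self, pipeline_ranks: Union[List[int], Tuple[int]]) -> None:
        assert len(pipeline_ranks) > 1, f"Expect len(pipeline_ranks) > 1, got {len(pipeline_ranks)}"
        self.pipeline_ranks = pipeline_ranks
        self.group = None
        self.ranks_in_group = None
        self._init_group()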

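    def _init_group(self):
        world_size = gpc.get_world_size(ParallelMode.GLOBAL)
        dp_size = gpc.get_world_size(ParallelMode.DATA)
        pp_size = gpc.get_world_size(ParallelMode.PIPELINE)
        rank = gpc.get_global_rank()
        # Despite its name, this is the number of ranks inside one data-parallel replica.
        num_dp_groups = world_size // dp_size
        # Despite its name, this is the number of ranks inside one pipeline stage.
        num_pp_stages = num_dp_groups // pp_size
        # dist.new_group must be called by every process, so each rank creates
        # all groups and keeps only the one it belongs to.
        for i in range(dp_size):
            for j in range(num_pp_stages):
                # Global ranks of one full pipeline, ordered by stage.
                pipeline_ranks = list(range(i * num_dp_groups + j, (i + 1) * num_dp_groups, num_pp_stages))
                # Keep only the stages that share the module.
                sub_ranks = [pipeline_ranks[idx] for idx in self.pipeline_ranks]
                group = dist.new_group(sub_ranks)
                if rank in sub_ranks:
                    self.group = group
                    self.ranks_in_group = sub_ranks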

    def register_module(self, module: nn.Module):
        assert (
            self.ranks_in_group is not None
        ), f"Rank {gpc.get_local_rank(ParallelMode.PIPELINE)} is not in pipeline_ranks {self.pipeline_ranks}"
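        # ranks_in_group is already ordered like pipeline_ranks, so its first entry is the
        # global rank of the first shared stage. Indexing it with a stage index could pick
        # the wrong rank or go out of range.
        src = self.ranks_in_group[0]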
        for p in module.parameters():
            setattr(p, "pipeline_shared_module_pg", self.group)
            dist.broadcast(p, src, group=self.group)

    def register_parameter(self, param: nn.Parameter):
        assert (
            self.ranks_in_group is not None
        ), f"Rank {gpc.get_local_rank(ParallelMode.PIPELINE)} is not in pipeline_ranks {self.pipeline_ranks}"
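        # ranks_in_group is already ordered like pipeline_ranks, so its first entry is the
        # global rank of the first shared stage. Indexing it with a stage index could pick
        # the wrong rank or go out of range.
        src = self.ranks_in_group[0]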
        setattr(param, "pipeline_shared_module_pg", self.group)
        dist.broadcast(param, src, group=self.group)
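
The rank arithmetic in `_init_group` can be checked without launching any processes. The following is a minimal sketch, assuming the rank layout implied by the loop above (data-parallel replicas outermost, then pipeline stages, then the ranks inside a stage). `shared_module_groups`, `ranks_per_replica`, `ranks_per_stage`, and `stage_indices` are hypothetical names introduced for illustration and are not part of ColossalAI; the function only reproduces the list of `sub_ranks` that would be passed to `dist.new_group`.

```python
from typing import List, Sequence


def shared_module_groups(
    world_size: int, dp_size: int, pp_size: int, stage_indices: Sequence[int]
) -> List[List[int]]:
    """Hypothetical helper: list the global ranks of every shared-module group.

    Mirrors the nested loop in `_init_group`, but returns the rank lists
    instead of creating process groups.
    """
    ranks_per_replica = world_size // dp_size      # called num_dp_groups above
    ranks_per_stage = ranks_per_replica // pp_size  # called num_pp_stages above
    groups = []
    for i in range(dp_size):
        for j in range(ranks_per_stage):
            # Global ranks of one full pipeline, ordered by stage.
            pipeline = list(
                range(i * ranks_per_replica + j, (i + 1) * ranks_per_replica, ranks_per_stage)
            )
            # Keep only the stages that share the module.
            groups.append([pipeline[idx] for idx in stage_indices])
    return groups


if __name__ == "__main__":
    # 8 ranks, 2 data-parallel replicas, 2 pipeline stages, both stages share.
    print(shared_module_groups(8, 2, 2, [0, 1]))  # [[0, 2], [1, 3], [4, 6], [5, 7]]
    # 4 ranks, pure pipeline parallelism, first and last stage share.
    print(shared_module_groups(4, 1, 4, [0, 3]))  # [[0, 3]]
```

In each returned list the first entry corresponds to the first stage index passed in, which is the rank a broadcast would naturally use as its source.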