from colossalai.utils import get_current_device
from torch import nn

from colossalai import kernel

from ... import init as init
from ..parallel_1d import *
from ..parallel_2d import *
from ..parallel_2p5d import *
from ..parallel_3d import *
from ..utils import get_tensor_parallel_mode
from ..vanilla import *

# Maps each tensor parallel mode to its layer norm implementation.
_parallel_layernorm = {
    '1d': kernel.LayerNorm,
    '2d': LayerNorm2D,
    '2.5d': LayerNorm2p5D,
    '3d': LayerNorm3D
}


class LayerNorm(nn.Module):
    r"""
    Layer Normalization for colossalai.

    :param normalized_shape: input shape from an expected input of size
        :math:`[* \times \text{normalized_shape}[0] \times \text{normalized_shape}[1]
        \times \ldots \times \text{normalized_shape}[-1]]`.
        If a single integer is used, it is treated as a singleton list, and this module will
        normalize over the last dimension which is expected to be of that specific size.
    :type normalized_shape: int
    :param eps: a value added to the denominator for numerical stability, defaults to 1e-05
    :type eps: float, optional
    :param dtype: the dtype of parameters, defaults to None
    :type dtype: torch.dtype, optional
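
    Example (a minimal usage sketch, assuming a ColossalAI context has already
    been launched so that ``get_tensor_parallel_mode`` reflects the active mode;
    with no tensor parallelism this falls back to ``torch.nn.LayerNorm``)::

        >>> import torch
        >>> norm = LayerNorm(256)
        >>> x = torch.randn(4, 128, 256, device=get_current_device())
        >>> y = norm(x)  # output shape matches the input: [4, 128, 256]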
    """

    def __init__(self, normalized_shape: int, eps=1e-05, dtype=None) -> None:
        super().__init__()
        tensor_parallel = get_tensor_parallel_mode()
        if tensor_parallel is None:
            # No tensor parallelism: use the native PyTorch layer norm,
            # moved to the requested dtype and the current device.
            self.norm = nn.LayerNorm(normalized_shape, eps=eps).to(dtype).to(get_current_device())
        else:
            # Dispatch to the implementation registered for the active tensor
            # parallel mode ('1d', '2d', '2.5d' or '3d').
            self.norm = _parallel_layernorm[tensor_parallel](normalized_shape, eps=eps, dtype=dtype)

    @property
    def weight(self):
        return self.norm.weight

    @property
    def bias(self):
        return self.norm.bias

    def forward(self, *args):
        return self.norm(*args)