ColossalAI/colossalai/shardformer/layer/normalization.py

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import torch
import torch.nn as nn
from colossalai.lazy import LazyInitContext
__all__ = ['FusedLayerNorm', 'FusedRMSNorm']
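# Hidden sizes for which apex's FastLayerNorm kernel (apex.contrib.layer_norm) is expected to be
# available; any other size falls back to the generic apex FusedLayerNorm in from_native_module below.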
FAST_LAYERNORM_SUPPORTED_SIZE = [
1024, 1536, 2048, 2304, 3072, 3840, 4096, 5120, 6144, 8192, 10240, 12288, 12800, 15360, 16384, 18432, 20480, 24576,
25600, 30720, 32768, 40960, 49152, 65536
]
class FusedLayerNorm():
    r"""
    This is a wrapper around the apex fused layernorm implementation. It is meant to be used only with the from_native_module interface.
    """

    def __init__(self) -> None:
        raise NotImplementedError(
            'FusedLayerNorm is not implemented as a physical class. '
            'It is meant to be used only with the from_native_module interface to wrap the fused layernorm implementation provided by apex.'
        )

    @staticmethod
    def from_native_module(module: nn.LayerNorm, *args, **kwargs) -> nn.Module:
        r"""
        Convert a native PyTorch LayerNorm module to an apex fused LayerNorm module,
        reusing the weight and bias of the original module.
        """
        # check if apex is installed
        try:
            import apex
        except ImportError:
            raise ImportError(
                'Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel')

        LazyInitContext.materialize(module)

        # get the attributes of the module
        normalized_shape = module.normalized_shape
        eps = module.eps
        elementwise_affine = module.elementwise_affine
        dtype = module.weight.dtype
        device = module.weight.device

        # pick the suitable layernorm implementation
        use_fast_ln = normalized_shape in FAST_LAYERNORM_SUPPORTED_SIZE

        if use_fast_ln:
            try:
                from apex.contrib.layer_norm.layer_norm import FastLayerNorm as ApexFusedLayerNorm
            except ImportError:
                # fall back to the normal fused layernorm if the fast kernel is not built
                from apex.normalization import FusedLayerNorm as ApexFusedLayerNorm
        else:
            from apex.normalization import FusedLayerNorm as ApexFusedLayerNorm

        layernorm = ApexFusedLayerNorm(normalized_shape, eps=eps,
                                       elementwise_affine=elementwise_affine).to(dtype).to(device)
        layernorm.weight = module.weight
        layernorm.bias = module.bias
        return layernorm
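
# A minimal usage sketch (not part of the original module), assuming apex is installed
# and a CUDA device is available:
#
#     norm = nn.LayerNorm(1024, eps=1e-5).cuda()
#     fused_norm = FusedLayerNorm.from_native_module(norm)
#     out = fused_norm(torch.randn(8, 1024, device='cuda'))
#
# The returned module shares the original weight and bias, so it is a drop-in replacement.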

class FusedRMSNorm():
    """
    This is a wrapper around the apex fused rms norm implementation. It is meant to be used only with the from_native_module interface.
    """

    def __init__(self) -> None:
        raise NotImplementedError(
            'FusedRMSNorm is not implemented as a physical class. '
            'It is meant to be used only with the from_native_module interface to wrap the fused rms norm implementation provided by apex.'
        )

    @staticmethod
    def from_native_module(module: nn.Module, *args, **kwargs) -> nn.Module:
        r"""
        Convert a native RMSNorm module (e.g. HuggingFace LlamaRMSNorm) to an apex fused RMSNorm module,
        reusing the weight of the original module.
        """
        try:
            from apex.normalization import FusedRMSNorm as ApexFusedRMSNorm
        except ImportError:
            raise ImportError(
                'Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMS normalization kernel'
            )

        LazyInitContext.materialize(module)

        # check if it is a huggingface LlamaRMSNorm, which stores its hyper-parameters differently
        if module.__class__.__name__ == "LlamaRMSNorm":
            normalized_shape = module.weight.shape[0]
            eps = module.variance_epsilon
            elementwise_affine = True
        else:
            # get the attributes of the module
            normalized_shape = module.normalized_shape
            eps = module.eps
            elementwise_affine = module.elementwise_affine

        rmsnorm = ApexFusedRMSNorm(normalized_shape=normalized_shape, eps=eps, elementwise_affine=elementwise_affine)
        rmsnorm.weight = module.weight
        return rmsnorm
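
# The smoke test below is an illustrative sketch and not part of the original file. It assumes apex is
# installed and a CUDA device is available; `_NaiveRMSNorm` is a hypothetical stand-in for a native
# RMSNorm implementation that exposes the `normalized_shape`, `eps` and `elementwise_affine`
# attributes read by the generic branch of `FusedRMSNorm.from_native_module`.
if __name__ == "__main__":

    class _NaiveRMSNorm(nn.Module):
        # reference RMSNorm: scale the input by the reciprocal of its root mean square
        def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
            super().__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))
            self.normalized_shape = (hidden_size,)
            self.eps = eps
            self.elementwise_affine = True

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            variance = x.pow(2).mean(-1, keepdim=True)
            return self.weight * x * torch.rsqrt(variance + self.eps)

    native = _NaiveRMSNorm(1024).cuda()
    fused = FusedRMSNorm.from_native_module(native)

    x = torch.randn(4, 1024, device='cuda')
    # the fused module shares the native module's weight, so the outputs should match closely
    print(torch.allclose(native(x), fused(x), atol=1e-5))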