ColossalAI/colossalai/tensor/colo_tensor.py


import torch
from math import prod

from .op_wrapper import _COLOSSAL_OPS
from typing import Tuple


class ColoTensor(object):
    """Data structure for a tensor in Colossal-AI.

    1. It contains a torch.Tensor as an attribute.
    2. It supports lazy initialization of the tensor's payload.
    3. It hijacks torch functions that take ColoTensors as arguments and redirects them to our customized functions.
    4. It supports distributing the tensor's payload into shards across processes. (TODO)
    """

    def __new__(cls, *args, **kwargs):
        return super(ColoTensor, cls).__new__(cls)

    def __init__(
        self,
        *size: Tuple[int],
        dtype=None,
        requires_grad=False,
        pin_memory=False,
        device=None,
        torch_tensor=torch.empty(0),
    ):
        self._size = size
        self._dtype = dtype
        self._requires_grad = requires_grad
        self._pin_memory = pin_memory
        self._device = device
        self._torch_tensor = torch_tensor

    def numel(self):
        # The element count is the product of the dimensions, not their sum.
        return prod(self._size)

    @staticmethod
    def init_from_torch_tensor(tensor: torch.Tensor, save_payload=True) -> 'ColoTensor':
        colo_t = ColoTensor(*tensor.size(),
                            dtype=tensor.dtype,
                            requires_grad=tensor.requires_grad,
                            pin_memory=tensor.is_pinned(),
                            device=tensor.device,
                            torch_tensor=tensor if save_payload else torch.empty(0))
        return colo_t

    def del_torch_tensor(self) -> None:
        # Release the payload by resetting the recorded size and replacing the tensor with an empty one.
        self._size = (0,)
        self._torch_tensor = torch.empty(self._size)

    def torch_tensor(self) -> torch.Tensor:
        # Lazy initialization: materialize the payload on first access.
        if self._torch_tensor.numel() == 0:
            self._torch_tensor = torch.empty(*self._size,
                                             dtype=self._dtype,
                                             pin_memory=self._pin_memory,
                                             requires_grad=self._requires_grad,
                                             device=self._device)
        return self._torch_tensor

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        global _COLOSSAL_OPS
        if kwargs is None:
            kwargs = {}
        if func in _COLOSSAL_OPS:
            # Dispatch to the customized op as soon as any positional or keyword
            # argument turns out to be a ColoTensor.
            for arg in args:
                if isinstance(arg, ColoTensor):
                    return _COLOSSAL_OPS[func](types, args, kwargs, None)
            for kwarg in kwargs.values():
                if isinstance(kwarg, ColoTensor):
                    return _COLOSSAL_OPS[func](types, args, kwargs, None)
        else:
            # If we have not hijacked the function, convert the ColoTensors in args and kwargs to torch tensors.
            args = [arg.torch_tensor() if isinstance(arg, ColoTensor) else arg for arg in args]
            kwargs = {k: v.torch_tensor() if isinstance(v, ColoTensor) else v for k, v in kwargs.items()}
            return func(*args, **kwargs)
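

if __name__ == "__main__":
    # A minimal usage sketch, not part of the original module. It assumes PyTorch is
    # installed and that this file is importable as part of colossalai.tensor (so the
    # relative import of _COLOSSAL_OPS resolves). It exercises payload wrapping, lazy
    # initialization, and the __torch_function__ fallback for un-hijacked functions.
    src = torch.randn(2, 3)
    colo_t = ColoTensor.init_from_torch_tensor(src)
    print(colo_t.numel())                # 6, the product of the dimensions

    lazy_t = ColoTensor(4, 5, dtype=torch.float32)
    print(lazy_t.torch_tensor().shape)   # the payload is allocated on this first access

    # torch functions that are not registered in _COLOSSAL_OPS fall through the
    # else-branch of __torch_function__ and run on the underlying torch.Tensor.
    out = torch.add(colo_t, colo_t)
    print(type(out), out.shape)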