ColossalAI/colossalai/legacy/communication/ring.py

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import torch

from colossalai.accelerator import get_accelerator
from colossalai.legacy.context.parallel_mode import ParallelMode
from colossalai.legacy.core import global_context as gpc


def ring_forward(tensor_send_next: torch.Tensor, parallel_mode: ParallelMode) -> torch.Tensor:
    """Sends a tensor to the next member and receives a tensor from the previous member.
    This function returns the received tensor from the previous member.

    Args:
        tensor_send_next (:class:`torch.Tensor`): Tensor sent to next member
        parallel_mode (ParallelMode): Parallel group mode used in this communication

    Returns:
        :class:`torch.Tensor`: The tensor received from the previous.

    Note:
        The parallel_mode should be concluded in ``ParallelMode``. More details about ``ParallelMode`` could be found
        in `parallel_mode <https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/context/parallel_mode.py>`_.
    """
    buffer_shape = tensor_send_next.size()

    ops = []
    current_rank = gpc.get_global_rank()

    tensor_recv_prev = torch.empty(
        buffer_shape, requires_grad=True, device=get_accelerator().get_current_device(), dtype=tensor_send_next.dtype
    )

    # send to next rank
    send_next_op = torch.distributed.P2POp(
        torch.distributed.isend, tensor_send_next, gpc.get_next_global_rank(parallel_mode)
    )
    ops.append(send_next_op)

    # receive from prev rank
    recv_prev_op = torch.distributed.P2POp(
        torch.distributed.irecv, tensor_recv_prev, gpc.get_prev_global_rank(parallel_mode)
    )
    ops.append(recv_prev_op)

    if current_rank % 2 == 0:
        ops = ops[::-1]

    reqs = torch.distributed.batch_isend_irecv(ops)
    for req in reqs:
        req.wait()

    # To protect against race condition when using batch_isend_irecv().
    get_accelerator().synchronize()

    return tensor_recv_prev
Migrated project 3 years ago			`#!/usr/bin/env python`
			`# -- encoding: utf-8 --`

			`import torch`

[npu] change device to accelerator api (#5239) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com> 11 months ago			`from colossalai.accelerator import get_accelerator`
[legacy] clean up legacy code (#4743) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci 1 year ago			`from colossalai.legacy.context.parallel_mode import ParallelMode`
			`from colossalai.legacy.core import global_context as gpc`
Migrated project 3 years ago

[doc] improved docstring in the communication module (#863) 3 years ago			`def ring_forward(tensor_send_next: torch.Tensor, parallel_mode: ParallelMode) -> torch.Tensor:`
Refactored docstring to google style 3 years ago			`"""Sends a tensor to the next member and receives a tensor from the previous member.`
			`This function returns the received tensor from the previous member.`

			`Args:`
[doc] improved docstring in the communication module (#863) 3 years ago			tensor_send_next (:class:`torch.Tensor`): Tensor sent to next member
			`parallel_mode (ParallelMode): Parallel group mode used in this communication`
Refactored docstring to google style 3 years ago
			`Returns:`
			:class:`torch.Tensor`: The tensor received from the previous.

			`Note:`
			The parallel_mode should be concluded in ``ParallelMode``. More details about ``ParallelMode`` could be found
			in `parallel_mode <https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/context/parallel_mode.py>`_.
Migrated project 3 years ago			`"""`
			`buffer_shape = tensor_send_next.size()`

			`ops = []`
			`current_rank = gpc.get_global_rank()`

[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format 1 year ago			`tensor_recv_prev = torch.empty(`
[npu] change device to accelerator api (#5239) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com> 11 months ago			`buffer_shape, requires_grad=True, device=get_accelerator().get_current_device(), dtype=tensor_send_next.dtype`
[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format 1 year ago			`)`
Migrated project 3 years ago
			`# send to next rank`
[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format 1 year ago			`send_next_op = torch.distributed.P2POp(`
			`torch.distributed.isend, tensor_send_next, gpc.get_next_global_rank(parallel_mode)`
			`)`
Migrated project 3 years ago			`ops.append(send_next_op)`

			`# receive from prev rank`
[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format 1 year ago			`recv_prev_op = torch.distributed.P2POp(`
			`torch.distributed.irecv, tensor_recv_prev, gpc.get_prev_global_rank(parallel_mode)`
			`)`
Migrated project 3 years ago			`ops.append(recv_prev_op)`

			`if current_rank % 2 == 0:`
			`ops = ops[::-1]`

			`reqs = torch.distributed.batch_isend_irecv(ops)`
			`for req in reqs:`
			`req.wait()`

			`# To protect against race condition when using batch_isend_irecv().`
[npu] change device to accelerator api (#5239) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com> 11 months ago			`get_accelerator().synchronize()`
Migrated project 3 years ago
			`return tensor_recv_prev`