# Add Your Own Parallel Mode

Author: Shenggui Li, Yongbin Li

**Prerequisite:**
- [Define Your Configuration](../basics/define_your_config.md)
- [Configure Parallelization](../basics/configure_parallelization.md)
## Introduction
To enable researchers and engineers to extend our system to other novel large-scale distributed training algorithms with less effort, we have decoupled the various components in the training lifecycle. You can implement your own parallelism by simply inheriting from the base class.

The main components are:

1. `ProcessGroupInitializer`
2. `GradientHandler`
3. `Schedule`

**This currently requires some changes to the source code, so we recommend that you install from source with the `-e` flag. The `-e` flag makes the installation editable, so your code changes will be reflected in your Python runtime. We will work on removing the need to modify the source code in future releases.**
## Process Group Initializer
Parallelism is often managed by process groups, where processes involved in the same parallel algorithm are placed in the same process group. Different parallel algorithms require different process groups to be created. Colossal-AI provides a global context for users to easily manage their process groups. If you wish to add a new process group, you can define a new class and set it in your configuration file. To define your own way of creating process groups, you can follow the steps below to create a new distributed initialization.

1. Add your parallel mode in `colossalai.legacy.context.parallel_mode.ParallelMode`.
```python
class ParallelMode(Enum):
    GLOBAL = 'global'
    DATA = 'data'
    PIPELINE = 'pipe'
    ...

    NEW_MODE = 'new_mode'  # define your mode here
```

2. Create a `ProcessGroupInitializer`. You can refer to the examples given in `colossalai.context.dist_group_initializer`. The first six arguments are fixed, and `ParallelContext` will pass them in for you. If you need to set other arguments, you can add them after these, like `arg1, arg2` in the example below. Lastly, register your initializer to the registry by adding the decorator `@DIST_GROUP_INITIALIZER.register_module`.
```python
# sample initializer class
@DIST_GROUP_INITIALIZER.register_module
class MyParallelInitializer(ProcessGroupInitializer):

    def __init__(self,
                 rank: int,
                 world_size: int,
                 config: Config,
                 data_parallel_size: int,
                 pipeline_parallel_size: int,
                 tensor_parallel_size: int,
                 arg1,
                 arg2):
        super().__init__(rank, world_size, config)
        self.arg1 = arg1
        self.arg2 = arg2
        # ... your variable init

    def init_parallel_groups(self):
        # initialize your process groups
        pass
```
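For a concrete idea of what `init_parallel_groups` may contain, here is a minimal sketch that continues the `MyParallelInitializer` above. It assumes the base class stores `rank` and `world_size` as attributes, treats the hypothetical `arg1` as the number of ranks per group, and follows the common pattern of calling `torch.distributed.new_group` on every rank for every group. The exact tuple that `ParallelContext` expects back (some versions also include a CPU group) differs between releases, so check the built-in initializers before copying it.

```python
import torch.distributed as dist

from colossalai.legacy.context.parallel_mode import ParallelMode


# continuing the MyParallelInitializer class sketched above;
# arg1 is a hypothetical argument meaning "number of ranks per group"
class MyParallelInitializer(ProcessGroupInitializer):
    ...

    def init_parallel_groups(self):
        local_rank = None
        group_world_size = None
        process_group = None
        ranks_in_group = None
        mode = ParallelMode.NEW_MODE

        num_groups = self.world_size // self.arg1
        for i in range(num_groups):
            # every rank must take part in every new_group call,
            # even if it does not belong to the group being created
            ranks = list(range(i * self.arg1, (i + 1) * self.arg1))
            group = dist.new_group(ranks)

            if self.rank in ranks:
                local_rank = ranks.index(self.rank)
                group_world_size = len(ranks)
                process_group = group
                ranks_in_group = ranks

        # the return format follows one common convention; adapt it to what
        # your version's ParallelContext expects
        return local_rank, group_world_size, process_group, ranks_in_group, mode
```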
Then, you can insert your new initializer into the current mode-to-initializer mapping in `colossalai.constants.INITIALIZER_MAPPING`. You can modify the file or insert the new key-value pair dynamically.

```python
colossalai.constants.INITIALIZER_MAPPING['new_mode'] = 'MyParallelInitializer'
```
3. Set your initializer in your config file. You can pass in your own arguments if there are any. This allows the `ParallelContext` to create your initializer and initialize your desired process groups.

```python
parallel = dict(
    pipeline=dict(size=1),
    tensor=dict(size=x, mode='new_mode')  # this is where you enable your new parallel mode
)
```
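Once the distributed environment is launched with this configuration, the new process group should be reachable through the global context. The sketch below shows one way to check it; the import paths and the `launch_from_torch` entry point are assumptions based on the legacy API and may need adjusting for your version, and `config.py` is a hypothetical file name for the configuration above.

```python
from colossalai.legacy import launch_from_torch
from colossalai.legacy.core import global_context as gpc
from colossalai.legacy.context.parallel_mode import ParallelMode

# hypothetical launch using the configuration file shown above (assumed to be named config.py)
launch_from_torch(config='config.py')

# query the group created by MyParallelInitializer for the current rank
if gpc.is_initialized(ParallelMode.NEW_MODE):
    group = gpc.get_group(ParallelMode.NEW_MODE)
    print(gpc.get_local_rank(ParallelMode.NEW_MODE), gpc.get_world_size(ParallelMode.NEW_MODE))
```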
## Gradient Handler

Gradient handlers are objects that execute all-reduce operations on parameters' gradients. As different all-reduce strategies may be used for different kinds of parallelism, users can inherit `colossalai.legacy.engine.gradient_handler.BaseGradientHandler` to implement their strategies. Currently, the library uses the normal data parallel gradient handler, which all-reduces the gradients across data parallel ranks. The data parallel gradient handler is added to the engine automatically if data parallelism is detected. You can add your own gradient handler as shown below:
```python
from colossalai.legacy.registry import GRADIENT_HANDLER
from colossalai.legacy.engine import BaseGradientHandler


@GRADIENT_HANDLER.register_module
class YourGradientHandler(BaseGradientHandler):

    def handle_gradient(self):
        do_something()
```
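To make this more concrete, below is a sketch of a handler that averages gradients over the data parallel group after the backward pass. It is not the library's built-in data parallel handler; it assumes the base class stores the model passed at construction time as `self._model` (as the built-in handlers do) and that the data parallel group has been initialized.

```python
import torch.distributed as dist

from colossalai.legacy.core import global_context as gpc
from colossalai.legacy.context.parallel_mode import ParallelMode
from colossalai.legacy.registry import GRADIENT_HANDLER
from colossalai.legacy.engine import BaseGradientHandler


@GRADIENT_HANDLER.register_module
class MyAllReduceGradientHandler(BaseGradientHandler):
    """A sketch: average gradients over the data parallel group after the backward pass."""

    def handle_gradient(self):
        world_size = gpc.get_world_size(ParallelMode.DATA)
        if world_size <= 1:
            return
        group = gpc.get_group(ParallelMode.DATA)
        # assumes the base class keeps the model as self._model, like the built-in handlers
        for param in self._model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM, group=group)
                param.grad.data.div_(world_size)
```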
Afterwards, you can specify the gradient handler you want to use in your configuration file.

```python
gradient_handlers = [
    dict(type='YourGradientHandler'),
]
```
## Schedule

A schedule entails how to execute a forward and backward pass. Currently, Colossal-AI provides pipeline and non-pipeline schedules. If you want to modify how the forward and backward passes are executed, you can inherit `colossalai.legacy.engine.schedule.BaseSchedule` and implement the `forward_backward_step` function.
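For reference, here is a minimal sketch of a custom non-pipeline schedule. The `forward_backward_step` signature and the `load_batch` helper mirror the legacy non-pipeline schedule as a working assumption; verify both against `colossalai.legacy.engine.schedule` in your version, since the exact arguments and return values may differ.

```python
from colossalai.legacy.engine.schedule import BaseSchedule


class MySchedule(BaseSchedule):
    """A sketch of a custom schedule: one plain forward/backward pass per step."""

    def forward_backward_step(self, engine, data_iter, forward_only=False,
                              return_loss=True, return_output_label=True):
        # assumes load_batch returns a (data, label) pair moved to the right device
        data, label = self.load_batch(data_iter)
        output = engine(data)                                     # forward pass through the wrapped model
        loss = engine.criterion(output, label) if return_loss else None

        if not forward_only:
            engine.backward(loss)                                 # backward pass handled by the engine

        if return_output_label:
            return output, label, loss
        return None, None, loss
```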
<!-- doc-test-command: echo -->