# Add Your Own Parallel Mode

Author: Shenggui Li, Yongbin Li

**Prerequisite:**
- [Define Your Configuration](../basics/define_your_config.md)
- [Configure Parallelization](../basics/configure_parallelization.md)
## Introduction
To enable researchers and engineers to extend our system to other novel large-scale distributed training algorithms with less effort, we have decoupled the various components in the training lifecycle. You can implement your own parallelism by simply inheriting from the base class.

The main components are:

1. `ProcessGroupInitializer`
2. `GradientHandler`
3. `Schedule`

**This currently requires some changes to the source code, so we recommend that you install from source with the `-e` flag. The `-e` flag makes the installation editable, so your code changes will be reflected in your Python runtime. We will work on removing the need to modify the source code in future releases.**
## Process Group Initializer
Parallelism is often managed by process groups, where processes involved in the same parallel algorithm are placed in the same process group. Different parallel algorithms require different process groups to be created. Colossal-AI provides a global context for users to easily manage their process groups. If you wish to add a new process group, you can define a new class and set it in your configuration file. To define your own way of creating process groups, you can follow the steps below to create a new distributed initialization.

1. Add your parallel mode in `colossalai.legacy.context.parallel_mode.ParallelMode`.
```python
class ParallelMode(Enum):
    GLOBAL = 'global'
    DATA = 'data'
    PIPELINE = 'pipe'
    ...

    NEW_MODE = 'new_mode'  # define your mode here
```

2. Create a `ProcessGroupInitializer`. You can refer to the examples given in `colossalai.context.dist_group_initializer`. The first six arguments are fixed, and `ParallelContext` will pass them in for you. If you need to set other arguments, you can add them after these, like `arg1, arg2` in the example below. Lastly, register your initializer to the registry by adding the decorator `@DIST_GROUP_INITIALIZER.register_module`.
```python
# sample initializer class
@DIST_GROUP_INITIALIZER.register_module
class MyParallelInitializer(ProcessGroupInitializer):

    def __init__(self,
                 rank: int,
                 world_size: int,
                 config: Config,
                 data_parallel_size: int,
                 pipeline_parallel_size: int,
                 tensor_parallel_size: int,
                 arg1,
                 arg2):
        super().__init__(rank, world_size, config)
        self.arg1 = arg1
        self.arg2 = arg2
        # ... your variable init

    def init_parallel_groups(self):
        # initialize your process groups
        pass
```
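For a concrete idea of what `init_parallel_groups` may contain, here is a minimal sketch that continues the `MyParallelInitializer` above. It assumes the base class stores `rank` and `world_size` as attributes, treats the hypothetical `arg1` as the number of ranks per group, and follows the common pattern of calling `torch.distributed.new_group` on every rank for every group. The exact tuple that `ParallelContext` expects back (some versions also include a CPU group) differs between releases, so check the built-in initializers before copying it.

```python
import torch.distributed as dist

from colossalai.legacy.context.parallel_mode import ParallelMode


# continuing the MyParallelInitializer class sketched above;
# arg1 is a hypothetical argument meaning "number of ranks per group"
class MyParallelInitializer(ProcessGroupInitializer):
    ...

    def init_parallel_groups(self):
        local_rank = None
        group_world_size = None
        process_group = None
        ranks_in_group = None
        mode = ParallelMode.NEW_MODE

        num_groups = self.world_size // self.arg1
        for i in range(num_groups):
            # every rank must take part in every new_group call,
            # even if it does not belong to the group being created
            ranks = list(range(i * self.arg1, (i + 1) * self.arg1))
            group = dist.new_group(ranks)

            if self.rank in ranks:
                local_rank = ranks.index(self.rank)
                group_world_size = len(ranks)
                process_group = group
                ranks_in_group = ranks

        # the return format follows one common convention; adapt it to what
        # your version's ParallelContext expects
        return local_rank, group_world_size, process_group, ranks_in_group, mode
```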
Then, you can insert your new initializer into the current mode-to-initializer mapping in `colossalai.constants.INITIALIZER_MAPPING`. You can modify the file or insert the new key-value pair dynamically.

```python
colossalai.constants.INITIALIZER_MAPPING['new_mode'] = 'MyParallelInitializer'
```
3. Set your initializer in your config file. You can pass in your own arguments if there are any. This allows the `ParallelContext` to create your initializer and initialize your desired process groups.

```python
parallel = dict(
    pipeline=dict(size=1),
    tensor=dict(size=x, mode='new_mode')  # this is where you enable your new parallel mode
)
```
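Once the distributed environment is launched with this configuration, the new process group should be reachable through the global context. The sketch below shows one way to check it; the import paths and the `launch_from_torch` entry point are assumptions based on the legacy API and may need adjusting for your version, and `config.py` is a hypothetical file name for the configuration above.

```python
from colossalai.legacy import launch_from_torch
from colossalai.legacy.core import global_context as gpc
from colossalai.legacy.context.parallel_mode import ParallelMode

# hypothetical launch using the configuration file shown above (assumed to be named config.py)
launch_from_torch(config='config.py')

# query the group created by MyParallelInitializer for the current rank
if gpc.is_initialized(ParallelMode.NEW_MODE):
    group = gpc.get_group(ParallelMode.NEW_MODE)
    print(gpc.get_local_rank(ParallelMode.NEW_MODE), gpc.get_world_size(ParallelMode.NEW_MODE))
```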
## Gradient Handler

Gradient handlers are objects that execute all-reduce operations on parameters' gradients. As different all-reduce strategies may be used for different kinds of parallelism, users can inherit `colossalai.legacy.engine.gradient_handler.BaseGradientHandler` to implement their strategies. Currently, the library uses the normal data parallel gradient handler, which all-reduces the gradients across data parallel ranks. The data parallel gradient handler is added to the engine automatically if data parallelism is detected. You can add your own gradient handler as shown below:
```python
from colossalai.legacy.registry import GRADIENT_HANDLER
from colossalai.legacy.engine import BaseGradientHandler


@GRADIENT_HANDLER.register_module
class YourGradientHandler(BaseGradientHandler):

    def handle_gradient(self):
        do_something()
```
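To make this more concrete, below is a sketch of a handler that averages gradients over the data parallel group after the backward pass. It is not the library's built-in data parallel handler; it assumes the base class stores the model passed at construction time as `self._model` (as the built-in handlers do) and that the data parallel group has been initialized.

```python
import torch.distributed as dist

from colossalai.legacy.core import global_context as gpc
from colossalai.legacy.context.parallel_mode import ParallelMode
from colossalai.legacy.registry import GRADIENT_HANDLER
from colossalai.legacy.engine import BaseGradientHandler


@GRADIENT_HANDLER.register_module
class MyAllReduceGradientHandler(BaseGradientHandler):
    """A sketch: average gradients over the data parallel group after the backward pass."""

    def handle_gradient(self):
        world_size = gpc.get_world_size(ParallelMode.DATA)
        if world_size <= 1:
            return
        group = gpc.get_group(ParallelMode.DATA)
        # assumes the base class keeps the model as self._model, like the built-in handlers
        for param in self._model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM, group=group)
                param.grad.data.div_(world_size)
```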
Afterwards, you can specify the gradient handler you want to use in your configuration file.

```python
gradient_handlers = [
    dict(type='YourGradientHandler'),
]
```
## Schedule

A schedule entails how to execute a forward and backward pass. Currently, Colossal-AI provides pipeline and non-pipeline schedules. If you want to modify how the forward and backward passes are executed, you can inherit `colossalai.legacy.engine.schedule.BaseSchedule` and implement the `forward_backward_step` function.
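For reference, here is a minimal sketch of a custom non-pipeline schedule. The `forward_backward_step` signature and the `load_batch` helper mirror the legacy non-pipeline schedule as a working assumption; verify both against `colossalai.legacy.engine.schedule` in your version, since the exact arguments and return values may differ.

```python
from colossalai.legacy.engine.schedule import BaseSchedule


class MySchedule(BaseSchedule):
    """A sketch of a custom schedule: one plain forward/backward pass per step."""

    def forward_backward_step(self, engine, data_iter, forward_only=False,
                              return_loss=True, return_output_label=True):
        # assumes load_batch returns a (data, label) pair moved to the right device
        data, label = self.load_batch(data_iter)
        output = engine(data)                                     # forward pass through the wrapped model
        loss = engine.criterion(output, label) if return_loss else None

        if not forward_only:
            engine.backward(loss)                                 # backward pass handled by the engine

        if return_output_label:
            return output, label, loss
        return None, None, loss
```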
<!-- doc-test-command: echo -->