# Sequence Parallelism
## Table of contents
- [Sequence Parallelism](#sequence-parallelism)
- [Table of contents](#table-of-contents)
- [📚 Overview](#-overview)
- [🚀 Quick Start](#-quick-start)
- [🏎 How to Train with Sequence Parallelism](#-how-to-train-with-sequence-parallelism)
- [Step 1. Configure your parameters](#step-1-configure-your-parameters)
- [Step 2. Invoke parallel training](#step-2-invoke-parallel-training)
## 📚 Overview
In this tutorial, we implement BERT with sequence parallelism. Sequence parallelism splits the input tensor and intermediate
activations along the sequence dimension. This method achieves better memory efficiency and allows us to train with larger batch sizes and longer sequence lengths.
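As a rough illustration (a sketch in plain PyTorch, not code from this example), splitting an activation of shape `(batch, seq_len, hidden)` across 4 sequence-parallel ranks looks like this:

```python
# Illustrative sketch only: how an activation is partitioned along the
# sequence dimension; the real implementation distributes the chunks across
# ranks rather than chunking locally.
import torch

batch, seq_len, hidden = 8, 1024, 768
world_size = 4                                 # number of sequence-parallel ranks

x = torch.randn(batch, seq_len, hidden)        # full activation
chunks = torch.chunk(x, world_size, dim=1)     # split along the sequence dimension

# each rank holds only its local chunk of the sequence
print(chunks[0].shape)                         # torch.Size([8, 256, 768])
```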
Paper: [Sequence Parallelism: Long Sequence Training from System Perspective](https://arxiv.org/abs/2105.13120)
## 🚀 Quick Start
1. Install PyTorch.
2. Install the dependencies.
```bash
pip install -r requirements.txt
```
3. Run the example with the following command:
```bash
export PYTHONPATH=$PWD
# run with synthetic dataset
colossalai run --nproc_per_node 4 train.py
```
> The default config uses sequence parallel size = 2 and pipeline size = 1. Try changing the pipeline size to 2 and running it again.
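> In the sketch below (the `parallel` field names are assumptions based on a common ColossalAI config convention, so check the provided `config.py` for the authoritative names), that change would look roughly like this:

```python
# Hypothetical excerpt from config.py -- not copied from this example's file.
parallel = dict(
    pipeline=2,                           # change pipeline size from 1 to 2
    tensor=dict(size=2, mode='sequence'), # keep sequence parallel size = 2
)
# With pipeline size 2 and sequence parallel size 2, the launch command above
# still matches: 2 x 2 = 4 processes (--nproc_per_node 4).
```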
## 🏎 How to Train with Sequence Parallelism
We provide `train.py` for you to execute training. Before invoking the script, there are several
steps to perform.
### Step 1. Configure your parameters
The provided `config.py` defines a set of parameters, including the training scheme, model configuration, etc.
You can also modify the ColossalAI settings. For example, if you wish to parallelize over the
sequence dimension on 8 GPUs, you can change `size=4` to `size=8`. If you wish to use pipeline parallelism, you can set `pipeline=<num_of_pipeline_stages>`.
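For reference, a minimal sketch of what such a config might look like is shown below; the field names follow a common ColossalAI config convention and are assumptions rather than the exact contents of the provided `config.py`:

```python
# Illustrative config.py sketch -- parameter names and values are assumptions;
# consult the provided config.py for the authoritative settings.

# training scheme (placeholder values)
BATCH_SIZE = 64
SEQ_LENGTH = 512
NUM_EPOCHS = 10

# parallelize over the sequence dimension on 8 GPUs, with no pipeline stages
parallel = dict(
    pipeline=1,                           # number of pipeline stages
    tensor=dict(size=8, mode='sequence'), # sequence parallel size
)
```

Whatever values you choose, the number of processes you launch must be divisible by the product of the pipeline size and the sequence parallel size.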
### Step 2. Invoke parallel training
Lastly, you can start training with sequence parallelism. How you invoke `train.py` depends on your
machine setup.
- If you are using a single machine with multiple GPUs, the `colossalai run` launch utility can easily
start your script for you. A sample command is shown below:
```bash
colossalai run --nproc_per_node <num_gpus_on_this_machine> --master_addr localhost --master_port 29500 train.py
```
- If you are using multiple machines with multiple GPUs, we suggest you use `colossalai.launch_from_slurm`
or `colossalai.launch_from_openmpi`, as it is easier to rely on SLURM and OpenMPI
to start multiple processes over multiple nodes (see the sketch below). If you have your own launcher, you can fall back
to the default `colossalai.launch` function.
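As a rough illustration (not code from this example's `train.py`, and the exact function signature may vary across ColossalAI versions), initializing from a SLURM allocation typically looks like this:

```python
# Hypothetical initialization snippet -- the provided train.py may already
# handle this; the host and port values below are placeholders.
import colossalai

colossalai.launch_from_slurm(
    config='./config.py',        # the config file described in Step 1
    host='<hostname_of_rank0>',  # placeholder: address of the master node
    port=29500,                  # placeholder: a free TCP port on the master
)
# After launch, the distributed process groups are created according to the
# `parallel` setting in the config, and training can proceed as usual.
```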