# Auto-Parallelism

## Table of contents

- [Auto-Parallelism](#auto-parallelism)
  - [Table of contents](#table-of-contents)
  - [📚 Overview](#-overview)
  - [🚀 Quick Start](#-quick-start)
    - [Setup](#setup)
    - [Auto-Parallel Tutorial](#auto-parallel-tutorial)
    - [Auto-Checkpoint Tutorial](#auto-checkpoint-tutorial)


## 📚 Overview

This tutorial folder contains a simple demo that runs auto-parallelism on ResNet. The directory also contains demo scripts for automatic activation checkpointing. Both features are still experimental, and there is no guarantee that they will work with your version of Colossal-AI.

## 🚀 Quick Start

### Setup

1. Create a conda environment

```bash
conda create -n auto python=3.8
conda activate auto
```

2. Install the packages listed in `requirements.txt`, plus `coin-or-cbc` for the solver.

```bash
pip install -r requirements.txt
conda install -c conda-forge coin-or-cbc
```
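
If the solver later fails to start, a quick check that the CBC binary is reachable can save time. The sketch below is not part of the tutorial scripts; it assumes that the conda-forge `coin-or-cbc` package puts an executable named `cbc` on your `PATH`, and `missing_executables` is a hypothetical helper name.

```python
import shutil
from typing import List, Sequence


def missing_executables(names: Sequence[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumption: conda-forge's coin-or-cbc package provides a `cbc` binary.
    missing = missing_executables(["cbc"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("cbc found at", shutil.which("cbc"))
```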


### Auto-Parallel Tutorial

Run the auto-parallel ResNet example on 4 GPUs with a synthetic dataset.

```bash
colossalai run --nproc_per_node 4 auto_parallel_with_resnet.py
```

You should see a log like the one below. It shows the edge costs on the computation graph as well as the sharding strategy chosen for each operation. For example, `layer1_0_conv1 S01R = S01R X RR` means that the first dimension (batch) of the input and output is sharded while the weight is not sharded. Here S means sharded and R means replicated, and the digits after S name the device-mesh axes that the dimension is split across, so `S01` splits the batch over both mesh axes. This is equivalent to data-parallel training.

![auto-parallel demo log](https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/tutorial/auto-parallel%20demo.png)
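
To make the strategy names easier to read, here is a minimal sketch that decodes a name such as `S01R = S01R X RR` and computes the shard each device holds. It is not a Colossal-AI API: `parse_dim_specs`, `parse_strategy` and `local_shape` are hypothetical helpers, the `<output> = <input> X <weight>` layout is inferred from the example above, and the 2x2 device mesh and tensor sizes in the usage are assumptions for illustration.

```python
import re
from math import prod
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers, not part of Colossal-AI. Each dimension is "R" (replicated)
# or "S" followed by the single-digit device-mesh axes it is split over.
DIM_SPEC = re.compile(r"S\d+|R")


def parse_dim_specs(spec: str) -> List[Tuple[int, ...]]:
    """Return, per tensor dimension, the mesh axes it is sharded along (empty = replicated)."""
    tokens = DIM_SPEC.findall(spec)
    if "".join(tokens) != spec:
        raise ValueError(f"unrecognized sharding spec: {spec!r}")
    return [tuple(int(axis) for axis in token[1:]) for token in tokens]


def parse_strategy(name: str) -> Dict[str, List[Tuple[int, ...]]]:
    """Split an assumed '<output> = <input> X <weight>' name into per-tensor specs."""
    output, rhs = (part.strip() for part in name.split("="))
    inp, weight = (part.strip() for part in re.split(r"\s+[xX]\s+", rhs))
    return {
        "output": parse_dim_specs(output),
        "input": parse_dim_specs(inp),
        "weight": parse_dim_specs(weight),
    }


def local_shape(global_shape: Sequence[int], dim_axes: Sequence[Tuple[int, ...]],
                mesh_shape: Sequence[int]) -> Tuple[int, ...]:
    """Shape of the shard held by one device, assuming every split divides evenly."""
    local = []
    for size, axes in zip(global_shape, dim_axes):
        parts = prod(mesh_shape[axis] for axis in axes)  # 1 when the dimension is replicated
        if size % parts:
            raise ValueError(f"dimension of size {size} does not split into {parts} parts")
        local.append(size // parts)
    return tuple(local)


if __name__ == "__main__":
    strategy = parse_strategy("S01R = S01R X RR")
    print(strategy["input"])   # [(0, 1), ()]
    print(strategy["weight"])  # [(), ()]
    # Assumed: 4 GPUs as a 2x2 mesh, batch 128, 64 channels.
    print(local_shape((128, 64), strategy["input"], mesh_shape=(2, 2)))  # (32, 64)
```

With the batch split over both mesh axes, each of the 4 devices sees a quarter of the batch and a full copy of the weight, which is why this strategy matches data parallelism.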

**Note: This experimental feature has been tested on torch 1.12.1 and transformers 4.22.2. If you are using other versions, you may need to modify the code to make it work.**
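
A small sketch for comparing your environment against the tested versions follows. It only uses the standard-library `importlib.metadata` module; `version_mismatches` is a hypothetical helper, and stripping a local build tag such as `+cu113` before comparing is our own assumption about how you want CUDA-specific torch builds treated.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Mapping, Optional, Tuple

# Versions named in the note above; other versions may work but are untested.
TESTED_VERSIONS = {"torch": "1.12.1", "transformers": "4.22.2"}


def version_mismatches(
    tested: Mapping[str, str] = TESTED_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from `tested` to (expected, installed)."""
    mismatches = {}
    for package, expected in tested.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        # Drop a local build tag such as "+cu113" before comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in version_mismatches().items():
        print(f"{package}: tested with {expected}, found {installed or 'not installed'}")
```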

### Auto-Checkpoint Tutorial

We provide two benchmarks for testing the performance of automatic activation checkpointing.

The first test, `auto_ckpt_solver_test.py`, demonstrates the solver's ability to find a checkpoint strategy that fits within a given memory budget (tested on GPT-2 Medium and ResNet-50). It outputs a benchmark summary along with plots of peak memory vs. budget memory and relative step time vs. peak memory.
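
To give an intuition for what "searching a checkpoint strategy under a budget" means, here is a deliberately simplified toy model written as a 0/1 knapsack. This is not the algorithm used by the Colossal-AI solver: it assumes each layer's stored activation has a fixed integer memory cost, that a dropped layer is recomputed once at the cost of its forward time, and it ignores the temporary memory needed during recomputation. `choose_checkpoints` and the per-layer numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_checkpoints(act_mem: Sequence[int], fwd_time: Sequence[float],
                       budget: int) -> Tuple[List[int], float]:
    """Toy 0/1 knapsack: decide which layers keep their activations in memory.

    A kept layer costs `act_mem[i]` units of the budget; a dropped layer is
    recomputed during the backward pass, costing `fwd_time[i]` extra time.
    Returns the kept layer indices and the total recomputation time.
    """
    if len(act_mem) != len(fwd_time):
        raise ValueError("act_mem and fwd_time must have the same length")
    if budget < 0 or any(m <= 0 for m in act_mem):
        raise ValueError("budget must be >= 0 and activation sizes > 0")

    n = len(act_mem)
    # best[b]: largest forward time we avoid recomputing with at most b memory units.
    best = [0.0] * (budget + 1)
    # took[i][b]: whether layer i was kept to reach best[b] after considering layers 0..i.
    took = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate budgets downward so each layer is kept at most once.
        for b in range(budget, act_mem[i] - 1, -1):
            candidate = best[b - act_mem[i]] + fwd_time[i]
            if candidate > best[b]:
                best[b] = candidate
                took[i][b] = True

    # Walk the decisions backwards to recover the kept layers.
    kept, b = [], budget
    for i in range(n - 1, -1, -1):
        if took[i][b]:
            kept.append(i)
            b -= act_mem[i]
    kept.reverse()
    return kept, sum(fwd_time) - best[budget]


if __name__ == "__main__":
    # Made-up per-layer numbers, for illustration only.
    kept, recompute = choose_checkpoints([4, 3, 2, 1], [4.0, 3.0, 2.5, 1.0], budget=5)
    print(kept, recompute)  # [1, 2] 5.0
```

Shrinking the budget forces more layers to be dropped and raises the recomputation time, which is the same trade-off the benchmark plots as relative step time vs. peak memory.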

The second test, `auto_ckpt_batchsize_test.py`, demonstrates how our activation checkpoint solver makes it possible to train with larger batch sizes in limited GPU memory (tested on ResNet-152). It outputs a benchmark summary.
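
The back-of-the-envelope arithmetic behind this benchmark can be sketched as follows. It assumes activation memory grows linearly with batch size and ignores fragmentation and temporary buffers; `max_batch_size` is a hypothetical helper and the memory figures are made up, not measurements from ResNet-152.

```python
def max_batch_size(gpu_mem_mb: float, static_mem_mb: float,
                   act_mem_per_sample_mb: float) -> int:
    """Largest batch whose activations fit beside the static memory.

    Static memory covers parameters, gradients and optimizer states. Assumes
    activation memory grows linearly with batch size.
    """
    if act_mem_per_sample_mb <= 0:
        raise ValueError("activation memory per sample must be positive")
    free = gpu_mem_mb - static_mem_mb
    return max(0, int(free // act_mem_per_sample_mb))


if __name__ == "__main__":
    # Made-up figures: checkpointing cuts stored activations per sample from 100 MB to 40 MB.
    print(max_batch_size(16000, 2000, 100))  # 140
    print(max_batch_size(16000, 2000, 40))   # 350
```

The larger batch is paid for with extra recomputation time, so the benchmark summary is worth reading alongside throughput, not just memory.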

Usage of the two tests:
```bash
# run auto_ckpt_solver_test.py on gpt2 medium
python auto_ckpt_solver_test.py --model gpt2

# run auto_ckpt_solver_test.py on resnet50
python auto_ckpt_solver_test.py --model resnet50

# run auto_ckpt_batchsize_test.py
python auto_ckpt_batchsize_test.py
```