mirror of https://github.com/hpcaitech/ColossalAI
[examples] Solving the diffusion incompatibility issue #3169 (#3170)
* Update requirements.txt
* Update environment.yaml
* Update README.md
* Update environment.yaml
parent a9b8402d93
commit 4e921cfbd6
@@ -40,8 +40,7 @@ This project is in rapid development.

### Option #1: install from source

#### Step 1: Requirements

To begin with, make sure your operating system has a CUDA version suitable for this exciting training session, namely CUDA 11.6/11.8. For your convenience, we have set up the rest of the packages for you. You can create and activate a suitable [conda](https://conda.io/) environment named `ldm`:

```
conda env create -f environment.yaml
```
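
If the creation step succeeds, you can activate the environment as usual, and optionally confirm the host CUDA setup mentioned above (standard conda and NVIDIA tooling; nothing below is specific to this repo):

```
conda activate ldm
nvidia-smi        # driver version and the highest CUDA version it supports
nvcc --version    # CUDA toolkit version, if the toolkit is on your PATH
```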

@@ -55,11 +54,34 @@ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit

```
pip install transformers diffusers invisible-watermark
```

#### Step 2: Install Lightning

Install a Lightning version newer than 2022.01.04; we suggest installing Lightning from source. Note that pip's default install path should be inside the conda environment, or you may need to check which pip is being used with `which pip` and redirect the path into the conda environment.
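
For example, a quick way to confirm that pip resolves inside the conda environment (the expected path shown in the comment is illustrative):

```
which pip
# expected: something like .../miniconda3/envs/ldm/bin/pip
# if it points elsewhere, call pip through the environment's interpreter instead:
python -m pip --version
```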

##### From Source

```
git clone https://github.com/Lightning-AI/lightning.git
cd lightning          # run the remaining steps from inside the clone
pip install -r requirements.txt
python setup.py install
```

##### From pip

```
pip install pytorch-lightning
```

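Afterwards you can check which Lightning build actually landed in the environment (optional sanity check):

```
python -c "import pytorch_lightning; print(pytorch_lightning.__version__)"
```
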
#### Step 3: Install [Colossal-AI](https://colossalai.org/download/) From Our Official Website

You can install the latest version (0.2.7) from our official website or from source. Note that the suitable version for this training is colossalai 0.2.5, which corresponds to torch 1.12.1.

##### Download the suggested version for this training

```
pip install colossalai==0.2.5
```
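
To make sure the matching pair was picked up (torch 1.12.1 with colossalai 0.2.5, per the note above), a quick optional check:

```
python -c "import torch, colossalai; print(torch.__version__, colossalai.__version__)"
```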

##### Download the latest version from pip for the latest torch version

```
pip install colossalai
```

@@ -75,10 +97,12 @@ cd ColossalAI

```
CUDA_EXT=1 pip install .
```

#### Step 4: Accelerate with flash attention by xformers (Optional)

Note that xformers will accelerate the training process at the cost of extra disk space. The suitable version of xformers for this training process is 0.0.12, which you can install directly via pip. For other release versions, feel free to check its official page: [XFormers](https://pypi.org/project/xformers/)

```
pip install xformers==0.0.12
```
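
`pip show` will confirm the pinned build and where it was installed (optional):

```
pip show xformers
```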

### Option #2: Use Docker

@@ -94,7 +118,7 @@ docker build -t hpcaitech/diffusion:0.2.0 .

```
docker pull hpcaitech/diffusion:0.2.0
```

Once you have the image ready, you can launch the image with the following command:

```bash
########################
```

@@ -157,10 +181,9 @@ you should change the `data.file_path` in the `config/train_colossalai.yaml`

## Training

We provide the script `train_colossalai.sh` to run the training task with Colossal-AI. We have also enabled other training setups, such as PyTorch DDP: you can use `train_ddp.sh` to run the same training task with DDP and compare the corresponding performance.
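
For instance, the two scripts can be launched back to back (script names come from the text above; both are assumed to sit in this example's directory):

```
bash train_colossalai.sh   # Colossal-AI strategy
bash train_ddp.sh          # plain PyTorch DDP, for comparison
```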

In `train_colossalai.sh` the main command is:

```
python main.py --logdir /tmp/ --train --base configs/train_colossalai.yaml --ckpt 512-base-ema.ckpt
```

@@ -176,9 +199,10 @@ python main.py --logdir /tmp/ --train --base configs/train_colossalai.yaml --ckp

You can change the training config in the yaml file:

- devices: the number of devices used for training, default = 8
- max_epochs: max training epochs, default = 2
- precision: the precision type used in training, default = 16 (fp16); you must use fp16 if you want to apply colossalai
- placement_policy: the training strategy supported by Colossal AI, default = 'cuda', which refers to loading all the parameters into cuda memory; 'cpu' refers to the 'cpu offload' strategy, while 'auto' enables 'Gemini', both featured by Colossal AI
- more information about the configuration of ColossalAIStrategy can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html#colossal-ai), and a quick way to locate these keys in the yaml file is shown below
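
Since the exact layout of `configs/train_colossalai.yaml` is not reproduced here, a simple way to find and edit the fields above (key names taken from the list; adjust if the file spells them differently):

```
grep -nE "devices|max_epochs|precision|placement_policy" configs/train_colossalai.yaml
```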
requirements.txt
@@ -1,10 +1,10 @@
albumentations==1.3.0
opencv-python==4.6.0.66
pudb==2019.2
prefetch_generator
imageio==2.9.0
imageio-ffmpeg==0.4.2
torchmetrics==0.7
omegaconf==2.1.1
test-tube>=0.7.5
streamlit>=0.73.1