# ColoDiffusion: Stable Diffusion with Colossal-AI
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).

<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>

- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).
<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>

- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.
<p id="inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>

- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.

More details can be found in our [blog post on Stable Diffusion v1](https://www.hpc-ai.tech/blog/diffusion-pretraining-and-hardware-fine-tuning-can-be-almost-7x-cheaper) and [blog post on Stable Diffusion v2](https://www.hpc-ai.tech/blog/colossal-ai-0-2-0).
## Roadmap
This project is in rapid development.
- [X] Train a Stable Diffusion v1/v2 model from scratch
- [X] Finetune a pretrained Stable Diffusion v1 model
- [X] Run inference with a pretrained model using PyTorch
- [ ] Finetune a pretrained Stable Diffusion v2 model
- [ ] Run inference with a pretrained model using TensorRT
## Installation
### Option #1: Install from source
#### Step 1: Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:
```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running:
```
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install transformers diffusers invisible-watermark
```
#### Step 2: Install [Colossal-AI](https://colossalai.org/download/) from our official website
##### From pip
For example, you can install the released package with pip:
```
pip install colossalai
```
##### From source
```
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install colossalai
CUDA_EXT=1 pip install .
```
#### Step 3: Accelerate with flash attention by xformers (Optional)
```
pip install xformers
```
### Option #2: Use Docker
To use the Stable Diffusion Docker image, you can either build it from the provided [Dockerfile](./docker/Dockerfile) or pull a prebuilt image from our Docker Hub.
```
# 1. build from the Dockerfile
cd docker
docker build -t hpcaitech/diffusion:0.2.0 .
# 2. or pull from our Docker Hub
docker pull hpcaitech/diffusion:0.2.0
```
Once you have the image ready, you can launch the image with the following command:
```bash
########################
# On Your Host Machine #
########################
# make sure you start your image in the repository root directory
cd ColossalAI
# run the docker container
docker run --rm \
    -it --gpus all \
    -v "$PWD":/workspace \
    -v <your-data-dir>:/data/scratch \
    -v <hf-cache-dir>:/root/.cache/huggingface \
    hpcaitech/diffusion:0.2.0 \
    /bin/bash
########################
#   Inside Container   #
########################
# once you are inside the container, go to the Stable Diffusion directory for training
cd examples/images/diffusion/
# start training with colossalai
bash train_colossalai.sh
```
Configuring the volume mappings correctly is important for a smooth training experience.
1. **Mandatory**: mount your prepared data to `/data/scratch` via `-v <your-data-dir>:/data/scratch`, replacing `<your-data-dir>` with the actual data path on your machine.
2. **Recommended**: store the downloaded model weights on your host machine instead of in the container via `-v <hf-cache-dir>:/root/.cache/huggingface`, replacing `<hf-cache-dir>` with the actual path. This way, you don't have to download the pretrained weights again for every `docker run`.
3. **Optional**: if you encounter an error stating that shared memory is insufficient inside the container, add `-v /dev/shm:/dev/shm` to your `docker run` command.
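The three mappings above can also be assembled programmatically, which may help when the paths differ between machines. The following is only a sketch: `build_docker_run` and its parameter names are hypothetical helpers introduced for illustration, while the flags, container paths and image tag are the ones shown in the command above.

```python
import os
import shlex
from typing import List, Optional


def build_docker_run(
    data_dir: str,
    hf_cache_dir: Optional[str] = None,
    share_host_shm: bool = False,
    workspace: Optional[str] = None,
    image: str = "hpcaitech/diffusion:0.2.0",
) -> List[str]:
    """Assemble a `docker run` argument list with the volume mappings described above."""
    # Default to the current directory, which should be the repository root.
    workspace = workspace or os.getcwd()
    args = ["docker", "run", "--rm", "-it", "--gpus", "all"]
    args += ["-v", f"{workspace}:/workspace"]
    # Mandatory: the prepared training data.
    args += ["-v", f"{data_dir}:/data/scratch"]
    # Recommended: keep downloaded weights on the host between runs.
    if hf_cache_dir is not None:
        args += ["-v", f"{hf_cache_dir}:/root/.cache/huggingface"]
    # Optional: only needed if shared memory is insufficient in the container.
    if share_host_shm:
        args += ["-v", "/dev/shm:/dev/shm"]
    args += [image, "/bin/bash"]
    return args


if __name__ == "__main__":
    # Placeholder paths; replace them with real directories on your machine.
    cmd = build_docker_run("/path/to/data", hf_cache_dir="/path/to/hf-cache")
    print(shlex.join(cmd))
```

Printing the command instead of executing it makes it easy to review the mounts before starting the container.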
## Download the pretrained model checkpoint
### stable-diffusion-v2-base (Recommended)
```
wget https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt
```
### stable-diffusion-v1-4
```
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
```
### stable-diffusion-v1-5 from runway
```
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
## Dataset
The dataset is from [LAION-5B](https://laion.ai/blog/laion-5b/), a dataset released by [LAION](https://laion.ai/).
You should change `data.file_path` in `configs/train_colossalai.yaml` to match the location of your data.
## Training
We provide the script `train_colossalai.sh` to run the training task with Colossal-AI.
For comparison, you can also use `train_ddp.sh` to run the same task with DDP.

In `train_colossalai.sh`, the main command is:
```
python main.py --logdir /tmp/ --train --base configs/train_colossalai.yaml --ckpt 512-base-ema.ckpt
```
- Use `--logdir` to choose where the log information and the last checkpoint are saved.
  - You will find your checkpoints in `logdir/checkpoints` or `logdir/diff_tb/version_0/checkpoints`.
  - You will find your training config YAML in `logdir/configs`.
- Add `--ckpt` if you want to load a pretrained model, for example `512-base-ema.ckpt`.
- Use `--base` to specify the path of the config YAML.
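Since the checkpoint can end up in either of the two folders listed above, a small helper can locate it after a run. This is a minimal sketch: `find_last_checkpoint` is a hypothetical name, and the preference for a file called `last.ckpt` is an assumption based on the checkpoint name used in the Inference section.

```python
from pathlib import Path
from typing import List, Optional


def find_last_checkpoint(logdir: str) -> Optional[Path]:
    """Return last.ckpt if present, otherwise the most recently modified .ckpt file."""
    root = Path(logdir)
    candidates: List[Path] = []
    # The two locations mentioned in the notes above.
    for sub in ("checkpoints", "diff_tb/version_0/checkpoints"):
        folder = root / sub
        if folder.is_dir():
            candidates.extend(folder.glob("*.ckpt"))
    if not candidates:
        return None
    # Assumption: the final checkpoint is saved under the name last.ckpt.
    for path in candidates:
        if path.name == "last.ckpt":
            return path
    # Fall back to the newest checkpoint by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_last_checkpoint("/tmp/"))
```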
### Training config
You can change the training config in the YAML file:

- devices: number of devices used for training, default 8
- max_epochs: maximum number of training epochs, default 2
- precision: the precision type used in training, default 16 (fp16); you must use fp16 if you want to apply Colossal-AI
- more information about the configuration of ColossalAIStrategy can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html#colossal-ai)
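The constraints above can be checked before launching a long run. The sketch below works on a plain dictionary holding the three documented keys. The function name `check_trainer_config` and the flat dictionary layout are assumptions for illustration, because the actual nesting of these keys inside the YAML file is not shown here.

```python
from typing import Any, Dict, List


def check_trainer_config(trainer: Dict[str, Any], use_colossalai: bool = True) -> List[str]:
    """Return a list of problems found in the trainer settings (empty if none)."""
    problems: List[str] = []
    # Missing keys fall back to the documented defaults.
    devices = trainer.get("devices", 8)
    max_epochs = trainer.get("max_epochs", 2)
    precision = trainer.get("precision", 16)
    if not isinstance(devices, int) or devices < 1:
        problems.append(f"devices must be a positive integer, got {devices!r}")
    if not isinstance(max_epochs, int) or max_epochs < 1:
        problems.append(f"max_epochs must be a positive integer, got {max_epochs!r}")
    # fp16 is required when training with Colossal-AI.
    if use_colossalai and precision != 16:
        problems.append("precision must be 16 (fp16) when training with Colossal-AI")
    return problems


if __name__ == "__main__":
    for problem in check_trainer_config({"devices": 4, "precision": 32}):
        print("config problem:", problem)
```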
## Finetune Example
### Training on the Teyvat Dataset
We provide a finetuning example on the [Teyvat](https://huggingface.co/datasets/Fazzie/Teyvat) dataset, which was created with BLIP-generated captions.

You can run it with the config `configs/Teyvat/train_colossalai_teyvat.yaml`:
```
python main.py --logdir /tmp/ -t -b configs/Teyvat/train_colossalai_teyvat.yaml
```
## Inference
You can find the last training checkpoint (`last.ckpt`) and the training config YAML in your `--logdir`, and run inference with:
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms \
    --outdir ./output \
    --ckpt /path/to/logdir/checkpoints/last.ckpt \
    --config /path/to/logdir/configs/project.yaml
```
```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
[--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
[--seed SEED] [--precision {full,autocast}]
optional arguments:
-h, --help show this help message and exit
--prompt [PROMPT] the prompt to render
--outdir [OUTDIR] dir to write results to
--skip_grid do not save a grid, only individual samples. Helpful when evaluating lots of samples
--skip_save do not save individual samples. For speed measurements.
--ddim_steps DDIM_STEPS
number of ddim sampling steps
--plms use plms sampling
--laion400m uses the LAION400M model
--fixed_code if enabled, uses the same starting code across samples
--ddim_eta DDIM_ETA ddim eta (eta=0.0 corresponds to deterministic sampling
--n_iter N_ITER sample this often
--H H image height, in pixel space
--W W image width, in pixel space
--C C latent channels
--f F downsampling factor
--n_samples N_SAMPLES
how many samples to produce for each given prompt. A.k.a. batch size
--n_rows N_ROWS rows in the grid (default: n_samples)
--scale SCALE unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
--from-file FROM_FILE
if specified, load prompts from this file
--config CONFIG path to config which constructs model
--ckpt CKPT path to checkpoint of model
--seed SEED the seed (for reproducible sampling)
--use_int8 whether to use quantization method
--precision {full,autocast}
evaluate at this precision
```
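To make the numeric options more concrete, the sketch below shows how `--H`, `--W`, `--C` and `--f` could combine into a latent shape, and it evaluates the guidance formula from the `--scale` help text. Both function names are hypothetical, the latent-shape relation is an assumption inferred from the option descriptions, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence, Tuple


def latent_shape(H: int, W: int, C: int, f: int) -> Tuple[int, int, int]:
    """Latent shape (channels, height, width) for an H x W image in pixel space."""
    # Assumption: each spatial side is divided by the downsampling factor f.
    return (C, H // f, W // f)


def guided_eps(eps_uncond: Sequence[float], eps_cond: Sequence[float], scale: float) -> List[float]:
    """eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)), element-wise."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Illustrative values only; check the script for its real defaults.
    print(latent_shape(H=512, W=512, C=4, f=8))
    print(guided_eps([0.0, 1.0], [1.0, 3.0], scale=2.0))
```

With `scale` set to 1.0 the formula reduces to the conditional prediction, and with 0.0 it reduces to the unconditional one.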
## Comments
- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion), [lucidrains](https://github.com/lucidrains/denoising-diffusion-pytorch), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Lightning](https://github.com/Lightning-AI/lightning) and [Hugging Face](https://huggingface.co/CompVis/stable-diffusion). Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
- The implementation of [flash attention](https://github.com/HazyResearch/flash-attention) is from [HazyResearch](https://github.com/HazyResearch).
## BibTeX
```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@article{dao2022flashattention,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2205.14135},
  year={2022}
}
```