# Stable Diffusion with Colossal-AI

*[Colossal-AI](https://github.com/hpcaitech/ColossalAI) provides a faster and lower-cost solution for pretraining and fine-tuning AIGC (AI-Generated Content) applications such as the [stable-diffusion](https://github.com/CompVis/stable-diffusion) model from [Stability AI](https://stability.ai/).*

We use [Colossal-AI](https://github.com/hpcaitech/ColossalAI) to combine multiple optimization strategies, e.g. data parallelism, tensor parallelism, mixed precision and ZeRO, to scale training to multiple GPUs.

## 🚀Quick Start

1. Create a new environment for diffusion

```bash
conda env create -f environment.yaml
conda activate ldm
```

2. Install Colossal-AI from our official page

```bash
pip install colossalai==0.1.10+torch1.11cu11.3 -f https://release.colossalai.org
```

3. Install the compatible PyTorch Lightning commit

```bash
git clone https://github.com/Lightning-AI/lightning && cd lightning && git reset --hard b04a7aa
pip install -r requirements.txt && pip install .
cd ..
```

4. Comment out the `from_pretrained` field in `train_colossalai_cifar10.yaml`.
5. Run training with CIFAR10.

```bash
python main.py --logdir /tmp -t true --postfix test -b configs/train_colossalai_cifar10.yaml
```

## Stable Diffusion

[Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) is a latent text-to-image diffusion model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.

[Stable Diffusion with Colossal-AI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) provides **6.5x faster training and pretraining cost savings; the hardware cost of fine-tuning can be almost 7x cheaper** (from RTX3090/4090 24GB to RTX3050/2070 8GB).

## Requirements

A suitable [conda](https://conda.io/) environment named `ldm` can be created and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```

You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running:

```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
```

### Install [Colossal-AI v0.1.10](https://colossalai.org/download/) From Our Official Website

```
pip install colossalai==0.1.10+torch1.11cu11.3 -f https://release.colossalai.org
```

### Install [Lightning](https://github.com/Lightning-AI/lightning)

We use the Sep. 2022 version at commit `b04a7aa`.

```
git clone https://github.com/Lightning-AI/lightning && cd lightning && git reset --hard b04a7aa
pip install -r requirements.txt && pip install .
```

> This version is pinned because the latest update of [Lightning](https://github.com/Lightning-AI/lightning) introduced an interface incompatibility, which will be fixed in the near future.
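
After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch and not part of this repository: it assumes the distributions are registered under the names `colossalai` and `transformers`, and the helper names `installed_version` and `check_pins` are hypothetical.

```python
from importlib import metadata

# Pins copied from the install commands above.
# The distribution names are assumptions; adjust them if your setup differs.
EXPECTED = {
    "colossalai": "0.1.10",
    "transformers": "4.19.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found.split("+")[0] == wanted:
            # Ignore local build tags such as "+torch1.11cu11.3".
            report[package] = "ok"
        else:
            report[package] = f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    for package, status in check_pins(EXPECTED).items():
        print(f"{package}: {status}")
```

Run it inside the activated `ldm` environment. The `lookup` parameter exists only so the check can be exercised without the packages installed.
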

## Dataset

The dataset is from [LAION-5B](https://laion.ai/blog/laion-5b/), a subset of [LAION](https://laion.ai/). You should change `data.file_path` in `configs/train_colossalai.yaml`.

## Training

We provide the script `train.sh` to run the training task, and two strategies in `configs`: `train_colossalai.yaml`

For example, you can run training with Colossal-AI by:

```
python main.py --logdir /tmp -t --postfix test -b configs/train_colossalai.yaml
```

- You can change `--logdir` to set where the log information and the last checkpoint are saved.

### Training config

You can change the training config in the YAML file:

- accelerator: accelerator type, default 'gpu'
- devices: number of devices used for training, default 4
- max_epochs: maximum number of training epochs
- precision: whether to use fp16 for training, default 16; you must use fp16 if you want to apply Colossal-AI

## Example

### Training on cifar10

We provide a fine-tuning example on the CIFAR10 dataset.

You can run it with the config `train_colossalai_cifar10.yaml`:

```
python main.py --logdir /tmp -t --postfix test -b configs/train_colossalai_cifar10.yaml
```

## Comments

- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion), [lucidrains](https://github.com/lucidrains/denoising-diffusion-pytorch), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Lightning](https://github.com/Lightning-AI/lightning) and [Hugging Face](https://huggingface.co/CompVis/stable-diffusion). Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
- The implementation of [flash attention](https://github.com/HazyResearch/flash-attention) is from [HazyResearch](https://github.com/HazyResearch).

## BibTeX

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@article{dao2022flashattention,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2205.14135},
  year={2022}
}
```