ColoDiffusion

ColoDiffusion is a faster-training implementation of the Stable Diffusion model from Stability AI.

We take advantage of Colossal-AI to exploit multiple optimization strategies, e.g. data parallelism, tensor parallelism, mixed precision, and ZeRO, to scale training to multiple GPUs.

Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See this section below and the model card.

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

You can also update an existing latent diffusion environment by running:

conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .

Install ColossalAI

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
git checkout v0.1.10
pip install .

Install colossalai lightning

git clone -b colossalai https://github.com/Fazziekey/lightning.git
cd lightning
pip install .

Dataset

The dataset is a subset of LAION-5B. Before training, change data.file_path in config/train_colossalai.yaml to the location of your data.

Training

We provide the script train.sh to run the training task, and three strategy configs: train_colossalai.yaml, train_ddp.yaml, and train_deepspeed.yaml.

For example, you can run training with the Colossal-AI strategy by:

python main.py --logdir /tmp -t --postfix test -b config/train_colossalai.yaml 
  • Change --logdir to set where the log information and the last checkpoint are saved.
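
To launch the same run with each of the three strategy configs, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository: the helper name build_train_command is hypothetical, it reuses only the flags shown in the command above (--logdir, -t, --postfix, -b), and it assumes the config files live under config/ as in that command.

```python
import shlex

# Config file names taken from the Training section above.
STRATEGY_CONFIGS = {
    "colossalai": "train_colossalai.yaml",
    "ddp": "train_ddp.yaml",
    "deepspeed": "train_deepspeed.yaml",
}


def build_train_command(strategy, logdir="/tmp", postfix="test", config_dir="config"):
    """Return the argument list for main.py (hypothetical helper).

    Only flags that appear in the documented command are used.
    The config_dir default is an assumption based on that command.
    """
    if strategy not in STRATEGY_CONFIGS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    config_path = f"{config_dir}/{STRATEGY_CONFIGS[strategy]}"
    return ["python", "main.py", "--logdir", logdir, "-t",
            "--postfix", postfix, "-b", config_path]


if __name__ == "__main__":
    for name in STRATEGY_CONFIGS:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(build_train_command(name)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains main.py.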

Training config

You can change the training configuration in the YAML file:

  • accelerator: accelerator type, default 'gpu'
  • devices: number of devices used for training, default 4
  • max_epochs: maximum number of training epochs
  • precision: whether to use fp16 for training, default 16; fp16 is required when using Colossal-AI
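
The fp16 requirement can be checked before a run is launched. The sketch below is illustrative only: check_trainer_options is a hypothetical helper, and it assumes the options listed above have already been read from the YAML file into a flat dictionary. The actual nesting of these keys in the config file may differ.

```python
# Defaults as listed above; max_epochs has no documented default, so it is omitted.
DOCUMENTED_DEFAULTS = {"accelerator": "gpu", "devices": 4, "precision": 16}


def check_trainer_options(options, use_colossalai):
    """Fill in documented defaults and validate the result (hypothetical helper).

    `options` is assumed to be a flat dict holding any of the four keys above.
    """
    merged = {**DOCUMENTED_DEFAULTS, **options}
    # The list above states that fp16 is mandatory with Colossal-AI.
    if use_colossalai and merged["precision"] != 16:
        raise ValueError("Colossal-AI requires fp16: set precision to 16")
    # This sketch only accepts an integer device count.
    if not isinstance(merged["devices"], int) or merged["devices"] < 1:
        raise ValueError("devices must be a positive integer")
    if "max_epochs" in merged and merged["max_epochs"] < 1:
        raise ValueError("max_epochs must be at least 1")
    return merged
```

Calling check_trainer_options({"precision": 32}, use_colossalai=True) raises a ValueError, while the same options with use_colossalai=False are returned with the defaults filled in.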

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@article{dao2022flashattention,
  title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2205.14135},
  year={2022}
}