update performance evaluation

pull/3905/head
Maruyama_Aya 2023-06-06 14:08:22 +08:00
parent 25447d4407
commit 176010f289
1 changed files with 18 additions and 0 deletions


@@ -40,6 +40,9 @@ We have modified our previous implementation of Dreambooth with our new Booster
We also offer a shell script, `test_ci.sh`, that runs through all of our booster plugins.
For more information about the booster API, refer to https://colossalai.org/docs/basics/booster_api/.
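As a rough illustration of how a plugin is chosen for the booster, the sketch below maps the plugin names used in this README to Colossal-AI plugin classes. The helper `build_booster` and the tuple `PLUGIN_NAMES` are hypothetical names introduced here, and the plugins are constructed with default arguments as an assumption; the actual training script may pass different options, so check the booster API documentation linked above.

```python
# Plugin names as they appear in this README's performance table.
PLUGIN_NAMES = ("torch_ddp", "gemini", "low_level_zero")


def build_booster(plugin_name: str):
    """Return a Booster wrapping the named plugin (hypothetical helper)."""
    if plugin_name not in PLUGIN_NAMES:
        raise ValueError(
            f"unknown plugin {plugin_name!r}; expected one of {PLUGIN_NAMES}"
        )

    # Imported lazily so the name check above works without Colossal-AI installed.
    from colossalai.booster import Booster
    from colossalai.booster.plugin import (
        GeminiPlugin,
        LowLevelZeroPlugin,
        TorchDDPPlugin,
    )

    plugin_classes = {
        "torch_ddp": TorchDDPPlugin,
        "gemini": GeminiPlugin,
        "low_level_zero": LowLevelZeroPlugin,
    }
    # Assumption: default constructor arguments are acceptable for a first try.
    return Booster(plugin=plugin_classes[plugin_name]())
```

In a real run, the distributed environment has to be initialized first (for example by launching with `torchrun`), and the returned booster is then used to wrap the model and optimizer via `booster.boost(...)`.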
## Training
We provide the script `colossalai.sh` to run the training task with Colossal-AI. For instance, to train the [stable-diffusion-v1-4] model, the script can be modified as follows:
@@ -97,7 +100,22 @@ torchrun --nproc_per_node 2 train_dreambooth_colossalai.py \
--placement="cuda"
```
## Performance
| Strategy       | #GPU | Batch Size | GPU RAM (GB) | Speedup |
|:--------------:|:----:|:----------:|:------------:|:-------:|
| Traditional    | 1    | 16         | OOM          | -       |
| Traditional    | 1    | 8          | 61.81        | 1       |
| torch_ddp      | 4    | 16         | OOM          | -       |
| torch_ddp      | 4    | 8          | 41.97        | 0.97    |
| gemini         | 4    | 16         | 53.29        | -       |
| gemini         | 4    | 8          | 29.36        | 2.00    |
| low_level_zero | 4    | 16         | 52.80        | -       |
| low_level_zero | 4    | 8          | 28.87        | 2.02    |
The evaluation was performed on 4 NVIDIA A100 GPUs with 80 GB of memory each, where GPUs 0 and 1 and GPUs 2 and 3 are connected by NVLink.
We fine-tuned the [stable-diffusion-v1-4](https://huggingface.co/stabilityai/stable-diffusion-v1-4) model at 512x512 resolution on the [Teyvat](https://huggingface.co/datasets/Fazzie/Teyvat) dataset and compared
memory cost and throughput across the plugins. The speedup column is relative to the Traditional run at batch size 8, which is listed as 1.
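To make the memory comparison concrete, the following sketch computes the reduction in reported GPU RAM for each plugin at batch size 8, relative to the Traditional run. The figures are copied from the table above; the function name `memory_reduction_pct` and the dictionary layout are illustrative choices made here, not part of the repository.

```python
# GPU RAM (GB) at batch size 8, copied from the performance table.
BASELINE_GB = 61.81  # Traditional, 1 GPU
PLUGIN_GB = {
    "torch_ddp": 41.97,
    "gemini": 29.36,
    "low_level_zero": 28.87,
}


def memory_reduction_pct(baseline_gb: float, measured_gb: float) -> float:
    """Percentage of GPU RAM saved compared with the baseline."""
    return (1.0 - measured_gb / baseline_gb) * 100.0


if __name__ == "__main__":
    for name, gb in PLUGIN_GB.items():
        saved = memory_reduction_pct(BASELINE_GB, gb)
        print(f"{name}: {saved:.1f}% less GPU RAM than Traditional")
```

With these numbers, torch_ddp uses about 32.1% less GPU RAM than the Traditional run, gemini about 52.5% less, and low_level_zero about 53.3% less.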
## Invitation to open-source contribution
Following the successful efforts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), we welcome all developers and partners with computing power, datasets, or models to join and build the Colossal-AI community and work towards the era of big AI models!