ColossalAI/examples/images/vit/README.md

## Overview

Vision Transformer (ViT) is a class of Transformer models tailored for computer vision tasks. It was first proposed in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) and achieved state-of-the-art results on various tasks at the time.

This example uses pretrained ViT weights loaded from HuggingFace.
We adapt the ViT training code to ColossalAI by using the [Booster API](https://colossalai.org/docs/basics/booster_api) with a chosen plugin, where each plugin corresponds to a specific training strategy. The example supports TorchDDPPlugin (DDP), LowLevelZeroPlugin (ZeRO-1/ZeRO-2), GeminiPlugin (Gemini), and HybridParallelPlugin (any combination of tensor, pipeline, and data parallelism).
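
As a rough illustration of how a plugin name could be turned into a `Booster`, here is a minimal sketch. The names `PLUGIN_SPECS`, `plugin_spec`, and `build_booster` are hypothetical helpers invented for this illustration, and the constructor arguments are assumed values, not the settings used by the example scripts. The ColossalAI imports are done lazily, and `build_booster` is expected to be called only after the distributed environment has been initialized (for example via `colossalai.launch_from_torch`).

```python
# Hypothetical mapping from a plugin name to a ColossalAI plugin class name
# and illustrative constructor arguments (the values are assumptions).
PLUGIN_SPECS = {
    "torch_ddp": ("TorchDDPPlugin", {}),
    "low_level_zero": ("LowLevelZeroPlugin", {"stage": 2}),
    "gemini": ("GeminiPlugin", {}),
    "hybrid_parallel": ("HybridParallelPlugin", {"tp_size": 2, "pp_size": 1}),
}


def plugin_spec(name):
    """Return (class_name, kwargs) for a plugin name, or raise ValueError."""
    try:
        class_name, kwargs = PLUGIN_SPECS[name]
    except KeyError:
        raise ValueError(f"unknown plugin: {name!r}") from None
    # Copy the kwargs so callers cannot mutate the shared table.
    return class_name, dict(kwargs)


def build_booster(name):
    """Create a Booster for the named plugin.

    Must run inside an initialized distributed environment.
    """
    # Imported lazily so the mapping above can be inspected without ColossalAI.
    import colossalai.booster.plugin as plugins
    from colossalai.booster import Booster

    class_name, kwargs = plugin_spec(name)
    plugin = getattr(plugins, class_name)(**kwargs)
    return Booster(plugin=plugin)
```

In a training script, the returned booster would then wrap the training objects, along the lines of `model, optimizer, criterion, dataloader, lr_scheduler = booster.boost(model, optimizer, criterion=criterion, dataloader=dataloader, lr_scheduler=lr_scheduler)`.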

## Run Demo

Run the following script:
```bash
bash run_demo.sh
```
You will fine-tune a [ViT-base](https://huggingface.co/google/vit-base-patch16-224) model on the [beans dataset](https://huggingface.co/datasets/beans), which contains roughly 1,300 images of bean leaves. It is an image classification dataset with 3 labels: ['angular_leaf_spot', 'bean_rust', 'healthy'].

You can modify the script to try another set of hyperparameters or to switch to a ViT model of a different size.

The demo code is based on this [blog post](https://huggingface.co/blog/fine-tune-vit).


## Run Benchmark

To benchmark the ViT model, run the following script:
```bash
bash run_benchmark.sh
```
The script measures performance (throughput and peak memory usage) for each combination of hyperparameters. You can also edit the script to test your own set of hyperparameters.
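
To make the idea of sweeping hyperparameter combinations and measuring throughput concrete, here is a small sketch. It is not the benchmark script itself: `hyperparameter_grid`, `measure_throughput`, and `peak_memory_mb` are hypothetical helpers, and the dummy `time.sleep` step stands in for a real training step. Peak memory is read with PyTorch's `torch.cuda.max_memory_allocated()`, imported lazily so the rest runs without a GPU.

```python
import itertools
import time


def hyperparameter_grid(**choices):
    """Yield one dict per combination of the given hyperparameter choices."""
    names = list(choices)
    for values in itertools.product(*(choices[n] for n in names)):
        yield dict(zip(names, values))


def measure_throughput(step_fn, batch_size, num_steps, warmup_steps=1, world_size=1):
    """Return samples per second for step_fn, excluding the warm-up steps.

    On a GPU, step_fn should synchronize the device before returning,
    otherwise the timing only covers kernel launches.
    """
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * num_steps * world_size / elapsed


def peak_memory_mb():
    """Peak GPU memory allocated by this process, in MiB (needs PyTorch and CUDA)."""
    import torch  # lazy import: the helpers above do not require PyTorch

    return torch.cuda.max_memory_allocated() / 1024**2


if __name__ == "__main__":
    # Dummy sweep: a 1 ms sleep stands in for one training step.
    for config in hyperparameter_grid(plugin=["torch_ddp", "gemini"], batch_size=[8, 32]):
        rate = measure_throughput(lambda: time.sleep(0.001), config["batch_size"], num_steps=5)
        print(config, f"{rate:.1f} samples/s")
```

Skipping warm-up steps keeps one-time costs such as memory allocation and kernel compilation out of the reported throughput.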