ColossalAI/examples/images/vit
README.md

Overview

Vision Transformer (ViT) is a class of Transformer models tailored for computer vision tasks. It was first proposed in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and achieved state-of-the-art results on various vision tasks at the time.

In this example, we load pretrained ViT weights from HuggingFace. We adapt the ViT training code to ColossalAI by leveraging the Booster API with a chosen plugin, where each plugin corresponds to a specific training strategy. This example supports the following plugins: TorchDDPPlugin (DDP), LowLevelZeroPlugin (ZeRO-1/ZeRO-2), GeminiPlugin (Gemini), and HybridParallelPlugin (any combination of tensor/pipeline/data parallelism).
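The plugin names above correspond to classes in `colossalai.booster.plugin`; in the actual scripts, a plugin instance is passed to a `Booster`, which then wraps the model, optimizer, etc. via `booster.boost(...)`. The `select_plugin` helper below is a hypothetical sketch of how a strategy name could be dispatched to a plugin class name, not code from this example:

```python
# Hypothetical dispatch from a strategy name to the corresponding
# ColossalAI plugin class name (the class names match those in
# colossalai.booster.plugin; the mapping itself is illustrative).
def select_plugin(strategy: str) -> str:
    plugins = {
        "torch_ddp": "TorchDDPPlugin",            # vanilla data parallelism (DDP)
        "low_level_zero": "LowLevelZeroPlugin",   # ZeRO stage 1/2
        "gemini": "GeminiPlugin",                 # chunk-based memory management
        "hybrid_parallel": "HybridParallelPlugin" # tensor/pipeline/data parallel combos
    }
    if strategy not in plugins:
        raise ValueError(f"unsupported strategy: {strategy}")
    return plugins[strategy]

print(select_plugin("gemini"))  # GeminiPlugin
```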

Run Demo

By running the following script:

bash run_demo.sh

You will finetune a ViT-base model on this dataset, which contains more than 8000 images of bean leaves. It is an image-classification dataset with 3 labels: ['angular_leaf_spot', 'bean_rust', 'healthy'].

The script can be modified if you want to try another set of hyperparameters or switch to a ViT model of a different size.
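For instance, the kinds of knobs such a script exposes might look like the following; the variable names and the checkpoint here are illustrative assumptions, so check run_demo.sh itself for the actual flags:

```shell
# Illustrative only: variable names and checkpoint are assumptions,
# not necessarily what run_demo.sh actually defines.
MODEL="google/vit-base-patch16-224"   # swap for a larger/smaller HuggingFace ViT checkpoint
BATCH_SIZE=8                          # per-device batch size
LEARNING_RATE=3e-4
EPOCHS=3

echo "finetuning ${MODEL} with bs=${BATCH_SIZE}, lr=${LEARNING_RATE}, epochs=${EPOCHS}"
```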

The demo code refers to this blog.

Run Benchmark

You can benchmark the ViT model by running the following script:

bash run_benchmark.sh

The script will test performance (throughput & peak memory usage) for each combination of hyperparameters. You can also play with this script to configure your own set of hyperparameters for testing.