ColossalAI/examples/images/vit/data.py

import torch
from datasets import load_dataset
from torch.utils.data import Dataset


class BeansDataset(Dataset):
    """Eagerly preprocessed view of one split of the Hugging Face "beans" dataset."""

    def __init__(self, image_processor, tp_size=1, split="train"):
        super().__init__()
        self.image_processor = image_processor
        self.ds = load_dataset("beans")[split]
        # Copy the names so that padding does not mutate the dataset's own feature metadata.
        self.label_names = list(self.ds.features["labels"].names)
        while len(self.label_names) % tp_size != 0:
            # Ensure that the number of labels is a multiple of tp_size.
            self.label_names.append(f"pad_label_{len(self.label_names)}")
        self.num_labels = len(self.label_names)
        # Preprocess every example up front; the processed tensors are kept in memory.
        self.inputs = [self.process_example(example) for example in self.ds]

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx]

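    def process_example(self, example):
        # The processor returns pixel_values with a leading batch dimension of 1.
        # Named `inputs` to avoid shadowing the built-in `input`.
        inputs = self.image_processor(example["image"], return_tensors="pt")
        inputs["labels"] = example["labels"]
        return inputs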


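def beans_collator(batch):
    """Collate processed examples into batched pixel_values and int64 labels."""
    return {
        # Each pixel_values tensor has a leading dimension of 1, so cat on dim 0 builds the batch.
        "pixel_values": torch.cat([data["pixel_values"] for data in batch], dim=0),
        "labels": torch.tensor([data["labels"] for data in batch], dtype=torch.int64),
    }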
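
The label padding and the collation step can be tried without downloading the dataset. The following is a minimal sketch, not part of the example itself: `pad_label_names` is a hypothetical helper that mirrors the `while` loop in `BeansDataset.__init__`, the 224x224 zero tensors are assumed stand-ins for the output of an image processor, and the three label names are assumed to match the beans classes.

```python
import torch


def pad_label_names(label_names, tp_size):
    """Hypothetical helper: return a copy of label_names padded to a multiple of tp_size."""
    padded = list(label_names)  # copy so the caller's list is not mutated
    while len(padded) % tp_size != 0:
        padded.append(f"pad_label_{len(padded)}")
    return padded


def beans_collator(batch):
    # Same logic as the collator in this file, repeated so the sketch runs on its own.
    return {
        "pixel_values": torch.cat([data["pixel_values"] for data in batch], dim=0),
        "labels": torch.tensor([data["labels"] for data in batch], dtype=torch.int64),
    }


# Assumed stand-ins for processed examples: one (1, 3, 224, 224) tensor and an int label each.
fake_batch = [
    {"pixel_values": torch.zeros(1, 3, 224, 224), "labels": i % 3}
    for i in range(4)
]
collated = beans_collator(fake_batch)

# Assumed label names; with tp_size=2 one padding label is appended to reach 4 entries.
padded_names = pad_label_names(["angular_leaf_spot", "bean_rust", "healthy"], tp_size=2)

print(collated["pixel_values"].shape)  # torch.Size([4, 3, 224, 224])
print(collated["labels"])  # tensor([0, 1, 2, 0])
print(padded_names)  # ['angular_leaf_spot', 'bean_rust', 'healthy', 'pad_label_3']
```

In a training script the collator would typically be passed as `collate_fn` to `torch.utils.data.DataLoader`, so that each batch arrives as a single `(batch, channels, height, width)` tensor plus a label vector.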

[example] update ViT example using booster api (#3940), 2023-06-12 07:02:27 +00:00
[example] update vit example for hybrid parallel plugin (#4641), 2023-09-07 09:38:45 +00:00
[misc] update pre-commit and run all files (#4752), 2023-09-19 06:20:26 +00:00