ColossalAI/tests/test_zero_tensor_parallel/components.py


import sys
from pathlib import Path
repo_path = Path(__file__).absolute().parents[2]
sys.path.append(str(repo_path))

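try:
    # Imported for its side effect of making the ViT model definitions available.
    import model_zoo.vit.vision_transformer_from_config
except ImportError as err:
    # Chain the original exception so the underlying import failure stays visible.
    raise ImportError("model_zoo is not found, please check your path") from err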

BATCH_SIZE = 8
IMG_SIZE = 32
PATCH_SIZE = 4
DIM = 512
NUM_ATTENTION_HEADS = 8
SUMMA_DIM = 2
NUM_CLASSES = 10
DEPTH = 6
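
# A minimal sanity-check sketch for the constants above. It is not part of the
# original test: `derive_vit_shapes` is a hypothetical helper, and the
# divisibility requirements on summa_dim are assumptions about how 2D tensor
# parallelism shards the hidden dimension and attention heads across ranks.
def derive_vit_shapes(img_size, patch_size, dim, num_heads, summa_dim):
    """Return (num_patches, head_dim) after checking basic divisibility."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    if dim % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    # Assumption: every rank in the summa_dim x summa_dim grid holds an equal shard.
    if dim % summa_dim != 0 or num_heads % summa_dim != 0:
        raise ValueError("dim and num_heads must be divisible by summa_dim")
    num_patches = (img_size // patch_size) ** 2
    head_dim = dim // num_heads
    return num_patches, head_dim


# Possible usage with the values defined above:
#   derive_vit_shapes(IMG_SIZE, PATCH_SIZE, DIM, NUM_ATTENTION_HEADS, SUMMA_DIM)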

model_cfg = dict(
    type='VisionTransformerFromConfig',
    tensor_splitting_cfg=dict(
        type='ViTInputSplitter2D',
    ),
    embedding_cfg=dict(
        type='ViTPatchEmbedding2D',
        img_size=IMG_SIZE,
        patch_size=PATCH_SIZE,
        embed_dim=DIM,
    ),
    token_fusion_cfg=dict(
        type='ViTTokenFuser2D',
        img_size=IMG_SIZE,
        patch_size=PATCH_SIZE,
        embed_dim=DIM,
        drop_rate=0.1
    ),
    norm_cfg=dict(
        type='LayerNorm2D',
        normalized_shape=DIM,
        eps=1e-6,
    ),
    block_cfg=dict(
        type='ViTBlock',
        attention_cfg=dict(
            type='ViTSelfAttention2D',
            hidden_size=DIM,
            num_attention_heads=NUM_ATTENTION_HEADS,
            attention_dropout_prob=0.,
            hidden_dropout_prob=0.1,
        ),
        droppath_cfg=dict(
            type='VanillaViTDropPath',
        ),
        mlp_cfg=dict(
            type='ViTMLP2D',
            in_features=DIM,
            dropout_prob=0.1,
            mlp_ratio=1
        ),
        norm_cfg=dict(
            type='LayerNorm2D',
            normalized_shape=DIM,
            eps=1e-6,
        ),
    ),
    head_cfg=dict(
        type='ViTHead2D',
        hidden_size=DIM,
        num_classes=NUM_CLASSES,
    ),
    embed_dim=DIM,
    depth=DEPTH,
    drop_path_rate=0.,
)