Commit Graph

161 Commits (7eb87f516dfd242684b24c743f550ad9c0caf02c)

Author SHA1 Message Date
ver217 1949d3a889
update doc requirements and rtd conf (#165) 2022-01-19 19:46:43 +08:00
Frank Lee be85a0f366
removed tutorial markdown and refreshed rst files for consistency 2022-01-19 17:01:37 +08:00
Frank Lee ca4ae52d6b
Set examples as submodule (#162)
* remove examples folder

* added examples as submodule

* update .gitmodules
2022-01-19 16:35:36 +08:00
binmakeswell 17ce8569a8
add logo at homepage, add forum in issue template (#161) 2022-01-19 14:29:31 +08:00
puck_WCR 9473a1b9c8
AMP docstring/markdown update (#160) 2022-01-18 18:33:36 +08:00
Frank Lee 2499faa2db
update benchmark commit id (#159) 2022-01-18 17:14:00 +08:00
LuGY_mac d143396cac
Added rand augment and updated the dataloader 2022-01-18 16:14:46 +08:00
Frank Lee c7b8ece736
set benchmarks as a git submodule (#156)
* remove benchmark folder

* added benchmark submodule

* update .gitmodules
2022-01-18 15:48:07 +08:00
Frank Lee f3802d6b06
fixed jit default setting (#154) 2022-01-18 13:37:20 +08:00
Frank Lee a1da3900c8
added docker documentation (#152) 2022-01-18 13:35:18 +08:00
ver217 7bf1e98b97
pipeline last stage supports multi output (#151) 2022-01-17 15:57:47 +08:00
HELSON 1ff5be36c2
Added moe parallel example (#140) 2022-01-17 15:34:04 +08:00
ver217 f68eddfb3d
refactor kernel (#142) 2022-01-13 16:47:17 +08:00
BoxiangW 4a3d3446b0
Update layer integration documentations (#108)
Update the documentation of layer integration

Update _log_hook.py

Update _operation.py
2022-01-10 18:05:58 +08:00
binmakeswell 3a61d785b5
add doc issue template (#133) 2022-01-10 10:21:12 +08:00
ver217 9ef05ed1fc
try import deepspeed when using zero (#130) 2022-01-07 17:24:57 +08:00
ver217 b7975d2bcd
add workflow for deploying doc (#129) 2022-01-07 17:24:01 +08:00
HELSON dceae85195
Added MoE parallel (#127) 2022-01-07 15:08:36 +08:00
Frank Lee 42741dd4a3
added docker image (#126) 2022-01-07 14:54:04 +08:00
ver217 293fb40c42
add scatter/gather optim for pipeline (#123) 2022-01-07 13:22:22 +08:00
Frank Lee 404e6f88ed
Hotfix/gitact (#125)
* enable CI after PR sync

* Fixed github action
2022-01-07 00:08:47 +08:00
binmakeswell 43e7d54643
fix issue template (#118) 2022-01-06 16:13:34 +08:00
Jiarui Fang 2c0c85d3d3
fix a bug in timer (#114) 2022-01-05 16:07:06 +08:00
ver217 7904baf6e1
fix layers/schedule for hybrid parallelization (#111) (#112) 2022-01-04 20:52:31 +08:00
ver217 f03bcb359b
update vit example for new API (#98) (#99) 2022-01-04 20:35:33 +08:00
Frank Lee d09a79bad5
enable CI after PR sync (#97) 2022-01-04 20:31:14 +08:00
ver217 a951bc6089
update default logger (#100) (#101) 2022-01-04 20:03:26 +08:00
ver217 96780e6ee4
Optimize pipeline schedule (#94)
* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad (#86)

* added model parallel process group for amp and clip grad

* update amp and clip with model parallel process group

* remove pipeline_prev/next group (#88)

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape (#93)

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
2021-12-30 15:56:46 +08:00
アマデウス e5b9f9a08d
added gpt model & benchmark (#95) 2021-12-30 14:43:30 +08:00
アマデウス 01a80cd86d
Hotfix/Colossalai layers (#92)
* optimized 1d layer apis; reorganized nn.layer modules; fixed tests

* fixed 2.5d runtime issue

* reworked split batch, now called in trainer.schedule.load_batch

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-29 23:32:10 +08:00
アマデウス 0fedef4f3c
Layer integration (#83)
* integrated parallel layers for ease of building models

* integrated 2.5d layers

* cleaned codes and unit tests

* added log metric by step hook; updated imagenet benchmark; fixed some bugs

* reworked initialization; cleaned codes

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-27 15:04:32 +08:00
shenggan 5c3843dc98
add colossalai kernel module (#55) 2021-12-21 12:19:52 +08:00
Xin Zhang 648f806315
add example of self-supervised SimCLR training - V2 (#50)
* add example of self-supervised SimCLR training

* simclr v2, replace nvidia dali dataloader

* updated

* sync to latest code writing style

* sync to latest code writing style and modify README

* detail README & standardize dataset path
2021-12-21 08:07:18 +08:00
ver217 8f02a88db2
add interleaved pipeline, fix naive amp and update pipeline model initializer (#80) 2021-12-20 23:26:19 +08:00
Frank Lee 91c327cb44
fixed zero level 3 dtype bug (#76) 2021-12-20 17:00:53 +08:00
HELSON 632e622de8
overlap computation and communication in 2d operations (#75) 2021-12-16 16:05:15 +08:00
Frank Lee cd9c28e055
added CI for unit testing (#69) 2021-12-16 10:32:08 +08:00
Frank Lee 45355a62f7
Update issue templates (#66) 2021-12-14 12:01:46 +08:00
Frank Lee 35813ed3c4
update examples and sphinx docs for the new api (#63) 2021-12-13 22:07:01 +08:00
ver217 7d3711058f
fix zero3 fp16 and add zero3 model context (#62) 2021-12-10 17:48:50 +08:00
Frank Lee 9a0466534c
update markdown docs (english) (#60) 2021-12-10 14:37:33 +08:00
Frank Lee da01c234e1
Develop/experiments (#59)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000

* Integrate 1d tensor parallel in Colossal-AI (#39)

* fixed 1D and 2D convergence (#38)

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp (#49)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* Feature/pipeline (#40)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability (#58)

update api for better usability

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00
ver217 eb2f8b1f6b
add how to build tfrecord dataset (#48) 2021-12-02 16:31:23 +08:00
ver217 4da256a584
add some details in vit-b16 example (#46) 2021-12-02 09:29:27 +08:00
ver217 e67dab92a9
add some details in vit-b16 example (#43) (#44) 2021-12-02 08:55:11 +08:00
binmakeswell 2528adc62f
add explanation for ViT example (#35) (#36) 2021-11-29 10:25:38 +08:00
ver217 dbe62c67b8
add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) 2021-11-18 23:45:09 +08:00
Frank Lee 3defa32aee
Support TP-compatible Torch AMP and Update trainer API (#27)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
ver217 2b05de4c64
use env to control the language of doc (#24) (#25) 2021-11-15 16:53:56 +08:00
ver217 9942fd5bfa
remove redundancy func in setup (#19) (#20) 2021-11-15 16:43:28 +08:00