288 Commits (4021b9a8a2dd3a9155bba04c0ed2cd7362fa437f)

Author SHA1 Message Date
YuliangLiu0306 c7925c5d08
[sc demo] add requirements to spmd README (#1941) 2022-11-14 17:22:45 +08:00
Boyuan Yao d5f5e06d82
[SC] remove redundant hands on (#1939)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs

* [sc] remove redundant handson
2022-11-14 03:05:21 -06:00
binmakeswell 41868f7605
[tutorial] polish README and OPT files (#1930)
* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files
2022-11-13 13:09:58 +08:00
ver217 b0b7a786b7
[tutorial] add synthetic dataset for opt (#1924) 2022-11-13 03:26:11 +08:00
Frank Lee 0486048453
[tutorial] updated hybrid parallel readme (#1928)
* [tutorial] updated hybrid parallel readme

* polish code
2022-11-13 03:25:01 +08:00
Frank Lee 807cbdb87d
[tutorial] added synthetic data for sequence parallel (#1927)
* [tutorial] added synthetic data for sequence parallel

* polish code
2022-11-13 03:24:02 +08:00
Frank Lee abf4c27f6a
[tutorial] removed huggingface model warning (#1925) 2022-11-12 23:12:18 +08:00
Frank Lee d43a671ad6
Hotfix/tutorial readme index (#1922)
* [tutorial] removed tutorial index in readme

* [tutorial] removed tutorial index in readme
2022-11-12 18:24:52 +08:00
Boyuan Yao 24cbee0ebe
[tutorial] modify hands-on of auto activation checkpoint (#1920)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs
2022-11-12 18:21:03 +08:00
Frank Lee ff16773ded
[tutorial] added synthetic data for hybrid parallel (#1921)
* [tutorial] added synthetic data for hybrid parallel

* polish code
2022-11-12 18:18:55 +08:00
Frank Lee 3c42fdbedc
[tutorial] added synthetic data for hybrid parallel (#1919) 2022-11-12 17:49:48 +08:00
Frank Lee 1b0dd05940
[tutorial] added synthetic dataset for auto parallel demo (#1918) 2022-11-12 17:14:32 +08:00
Frank Lee acd9abc5ca
[tutorial] updated auto parallel demo with latest data path (#1917) 2022-11-12 16:55:19 +08:00
Frank Lee d53415bc10
[tutorial] added data script and updated readme (#1916) 2022-11-12 16:38:41 +08:00
binmakeswell 155e202318
[example] update auto_parallel img path (#1910) 2022-11-11 23:43:22 +08:00
Boyuan Yao d5c5bc219e
[SC] add GPT example for auto checkpoint (#1889)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
binmakeswell 11ee8ae478
[tutorial] add cifar10 for diffusion (#1907) 2022-11-11 19:03:50 +08:00
Frank Lee cb7ec714c8
[tutorial] removed duplicated tutorials (#1904) 2022-11-11 17:23:40 +08:00
Fazzie-Maqianli 351f0f64e6
[example] add cifar10 dataset for diffusion (#1902)
* add cifar10 datasets

* Update README.md

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-11 17:22:54 +08:00
BoxiangW ca6e75bc28
[tutorial] edited hands-on practices (#1899)
* Add handson to ColossalAI.

* Change names of handsons and edit sequence parallel example.

* Edit wrong folder name

* resolve conflict

* delete readme
2022-11-11 17:08:17 +08:00
BoxiangW d9bf83e084
Add handson to ColossalAI. (#1896)
Co-authored-by: Boxiang Wang <boxiang.wang1@gmail.com>
2022-11-11 16:13:22 +08:00
Super Daniel 6d559ea614
[sc] add examples for auto checkpoint. (#1880) 2022-11-10 20:50:15 +08:00
HELSON f9e7d179f2
[diffusion] fix package conflicts (#1875) 2022-11-10 16:33:34 +08:00
binmakeswell 610dda676c
[example] migrate diffusion and auto_parallel hands-on (#1871) 2022-11-10 15:31:46 +08:00
binmakeswell 50c4cb0167
[NFC] remove redundant dependency (#1869)
* remove redundant config

* remove redundant dependency
2022-11-10 14:51:47 +08:00
binmakeswell fd8f0ca5a8
[example] initialize tutorial (#1865) 2022-11-10 14:05:27 +08:00
binmakeswell e9635eb493 add explanation specified version 2022-11-09 12:13:01 +08:00
jiaruifang 27211d6267 [example] polish diffusion readme 2022-11-09 09:38:05 +08:00
binmakeswell 4ac7d3ec3b
[doc] polish diffusion README (#1840) 2022-11-08 22:36:55 +08:00
Jiarui Fang f86a703bcf
[NFC] update gitignore remove DS_Store (#1830) 2022-11-08 17:18:15 +08:00
Jiarui Fang a25f755331
[example] add TP to GPT example (#1828) 2022-11-08 17:17:19 +08:00
Fazzie-Maqianli 6e9730d7ab
[example] add stable diffuser (#1825) 2022-11-08 16:14:45 +08:00
Jiarui Fang b1263d32ba
[example] simplify the GPT2 huggingface example (#1826) 2022-11-08 16:14:07 +08:00
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821) 2022-11-08 15:53:13 +08:00
Maruyama_Aya a7e8159da6 add ColoDiffusion codes: /ldm/module/, /ldm/data/, /scripts/test/ 2022-11-08 14:39:35 +08:00
Jiarui Fang 350ccc0481
[example] opt does not depend on Titans (#1811) 2022-11-08 12:02:20 +08:00
Jiarui Fang 203ca57aed
[example] add GPT 2022-11-08 10:58:17 +08:00
Jiarui Fang fd2c8d8156
[example] add opt model in language (#1809) 2022-11-08 10:39:13 +08:00
Jiarui Fang f5a92c288c
[example] add diffusion to example (#1805) 2022-11-07 17:43:36 +08:00
Jiarui Fang a19eb80998
[embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
github-actions[bot] 177d3f5718
Automated submodule synchronization (#1465)
Co-authored-by: github-actions <github-actions@github.com>
2022-08-19 13:39:21 +08:00
github-actions[bot] 9b442ecdc3
Automated submodule synchronization (#1404)
Co-authored-by: github-actions <github-actions@github.com>
2022-08-08 11:24:58 +08:00
github-actions[bot] 1e5eb0874c
Automated submodule synchronization (#1396)
Co-authored-by: github-actions <github-actions@github.com>
2022-08-03 09:18:45 +08:00
github-actions[bot] 50dec605e1
Automated submodule synchronization (#1380)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-28 11:12:52 +08:00
github-actions[bot] fb6f085907
Automated submodule synchronization (#1372)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-27 09:25:03 +08:00
github-actions[bot] 6160a1d6a7
Automated submodule synchronization (#1348)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-21 10:50:27 +08:00
github-actions[bot] 6f2f9eb214
Automated submodule synchronization (#1305)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-14 13:40:54 +08:00
github-actions[bot] 762905da68
Automated submodule synchronization (#1241)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-12 10:32:20 +08:00
github-actions[bot] 4951f7d80c
Automated submodule synchronization (#1204)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-07 15:22:45 +08:00
github-actions[bot] 23442a5bc1
Automated submodule synchronization (#1194)
Co-authored-by: github-actions <github-actions@github.com>
2022-07-04 10:12:17 +08:00
github-actions[bot] 6f0733a1ef
Automated submodule synchronization (#1159)
Co-authored-by: github-actions <github-actions@github.com>
2022-06-29 15:11:36 +08:00
github-actions[bot] e8c34eedfd
Automated submodule synchronization (#1129)
Co-authored-by: github-actions <github-actions@github.com>
2022-06-22 14:39:08 +08:00
github-actions[bot] 85b58093d2
Automated submodule synchronization (#1105)
Co-authored-by: github-actions <github-actions@github.com>
2022-06-14 09:53:30 +08:00
github-actions[bot] e32470b6de
Automated submodule synchronization (#1049)
Co-authored-by: github-actions <github-actions@github.com>
2022-06-01 11:04:32 +08:00
github-actions[bot] 4d8a574cd3
Automated submodule synchronization (#1034)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-27 17:12:48 +08:00
github-actions[bot] 9e3d602dba
Automated submodule synchronization (#1003)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-20 17:08:44 +08:00
github-actions[bot] 46bc95708f
Automated submodule synchronization (#960)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-14 21:55:34 +08:00
github-actions[bot] 7edb38193a
Automated submodule synchronization (#932)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-13 10:22:51 +08:00
github-actions[bot] b61d64685f
Automated submodule synchronization (#929)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-11 09:13:06 +08:00
github-actions[bot] 1cf7fb3cd9
Automated submodule synchronization (#912)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-06 10:10:56 +08:00
github-actions[bot] 3b1f5f07ce
Automated submodule synchronization (#907)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-03 13:14:48 +08:00
github-actions[bot] f271f34716
Automated submodule synchronization (#827)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-22 15:24:58 +08:00
github-actions[bot] 413ce30c45
Automated submodule synchronization (#819)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-21 11:26:58 +08:00
github-actions[bot] 9aae4197bb
Automated submodule synchronization (#810)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-20 13:57:12 +08:00
github-actions[bot] 6978980f6d
Automated submodule synchronization (#751)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-14 15:34:01 +08:00
github-actions[bot] d878d843ad
Automated submodule synchronization (#695)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-08 10:03:53 +08:00
github-actions[bot] d50cdabbc9
Automated submodule synchronization (#556)
Co-authored-by: github-actions <github-actions@github.com>
2022-04-07 22:11:00 +08:00
github-actions[bot] 92f4224867
Automated submodule synchronization (#501) 2022-03-30 14:06:23 +08:00
github-actions[bot] 353566c198
Automated submodule synchronization (#483)
Co-authored-by: github-actions <github-actions@github.com>
2022-03-22 09:34:26 +08:00
github-actions[bot] cfcc8271f3
[Bot] Automated submodule synchronization (#451)
Co-authored-by: github-actions <github-actions@github.com>
2022-03-18 09:51:43 +08:00
github-actions 6098bc4cce Automated submodule synchronization 2022-03-14 00:01:12 +00:00
github-actions b9f8521f8c Automated submodule synchronization 2022-02-15 11:35:37 +08:00
github-actions[bot] 5420809f43
Automated submodule synchronization (#203)
Co-authored-by: github-actions <github-actions@github.com>
2022-02-04 10:19:38 +08:00
Frank Lee ca4ae52d6b
Set examples as submodule (#162)
* remove examples folder

* added examples as submodule

* update .gitmodules
2022-01-19 16:35:36 +08:00
LuGY_mac d143396cac Added rand augment and update the dataloader 2022-01-18 16:14:46 +08:00
HELSON 1ff5be36c2
Added moe parallel example (#140) 2022-01-17 15:34:04 +08:00
ver217 f03bcb359b
update vit example for new API (#98) (#99) 2022-01-04 20:35:33 +08:00
アマデウス 0fedef4f3c
Layer integration (#83)
* integrated parallel layers for ease of building models

* integrated 2.5d layers

* cleaned codes and unit tests

* added log metric by step hook; updated imagenet benchmark; fixed some bugs

* reworked initialization; cleaned codes

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-27 15:04:32 +08:00
Xin Zhang 648f806315
add example of self-supervised SimCLR training - V2 (#50)
* add example of self-supervised SimCLR training

* simclr v2, replace nvidia dali dataloader

* updated

* sync to latest code writing style

* sync to latest code writing style and modify README

* detail README & standardize dataset path
2021-12-21 08:07:18 +08:00
Frank Lee 35813ed3c4
update examples and Sphinx docs for the new api (#63) 2021-12-13 22:07:01 +08:00
Frank Lee da01c234e1
Develop/experiments (#59)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000

* Integrate 1d tensor parallel in Colossal-AI (#39)

* fixed 1D and 2D convergence (#38)

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp (#49)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* Feature/pipeline (#40)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability (#58)

update api for better usability

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00
ver217 eb2f8b1f6b
add how to build tfrecord dataset (#48) 2021-12-02 16:31:23 +08:00
ver217 4da256a584
add some details in vit-b16 example (#46) 2021-12-02 09:29:27 +08:00
ver217 e67dab92a9
add some details in vit-b16 example (#43) (#44) 2021-12-02 08:55:11 +08:00
binmakeswell 2528adc62f
add explanation for ViT example (#35) (#36) 2021-11-29 10:25:38 +08:00
ver217 dbe62c67b8
add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) 2021-11-18 23:45:09 +08:00
Frank Lee 3defa32aee
Support TP-compatible Torch AMP and Update trainer API (#27)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
zbian 404ecbdcc6 Migrated project 2021-10-28 18:21:23 +02:00