Haofan Wang
9edd0aa75e
Update train_dreambooth_colossalai.py
accelerator.num_processes -> gpc.get_world_size(ParallelMode.DATA)
2023-01-05 15:49:57 +08:00
Fazzie-Maqianli
89f26331e9
[example] update diffusion and DreamBooth ( #2329 )
2023-01-05 11:23:26 +08:00
binmakeswell
e512ca9c24
[doc] update stable diffusion link ( #2322 )
* [doc] update link
2023-01-04 19:38:06 +08:00
Fazzie-Maqianli
a9b27b9265
[example] fix DreamBooth format ( #2315 )
2023-01-04 16:20:00 +08:00
Jiarui Fang
32253315b4
[example] update diffusion readme with official lightning ( #2304 )
2023-01-04 13:13:38 +08:00
HELSON
e00cedd181
[example] update gemini benchmark bash ( #2306 )
2023-01-04 11:59:26 +08:00
binmakeswell
c8144223b8
[doc] update diffusion doc ( #2296 )
2023-01-03 21:27:44 +08:00
ZijianYY
df1d6dc553
[examples] using args and combining two versions for PaLM ( #2284 )
2023-01-03 17:49:00 +08:00
Ziyue Jiang
ac863a01d6
[example] add benchmark ( #2276 )
* add benchmark
* merge common func
* add total and avg tflops
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 17:20:59 +08:00
BlueRum
1405b4381e
[example] fix save_load bug for dreambooth ( #2280 )
2023-01-03 17:13:29 +08:00
Jiarui Fang
879df8b943
[example] GPT polish readme ( #2274 )
2023-01-03 15:46:52 +08:00
Ziyue Jiang
9654df0e9a
Add GPT PP Example ( #2272 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 15:17:26 +08:00
YuliangLiu0306
4b29112ab2
[autoparallel] gpt2 autoparallel examples ( #2267 )
* [autoparallel] gpt2 autoparallel examples
* polish code
* polish code
2023-01-03 14:23:33 +08:00
HELSON
09c0102fe6
[example] fix gpt example with 0.1.10 ( #2265 )
2023-01-03 13:38:14 +08:00
Fazzie-Maqianli
89f048a88a
[example] clear diffuser image ( #2262 )
2023-01-03 10:57:02 +08:00
Frank Lee
89542ceb44
[doc] updated the stable diffusion docker usage ( #2244 )
* [doc] updated the stable diffusion docker usage
* polish doc
2022-12-30 18:00:20 +08:00
Jiarui Fang
50cdf5430e
[example] diffusion install from docker ( #2239 )
* [builder] builder for scaled_upper_triang_masked_softmax
* add missing files
* fix a bug
* polish code
* [example] diffusion install from docker
2022-12-30 16:25:24 +08:00
Jiarui Fang
db4cbdc7fb
[builder] builder for scaled_upper_triang_masked_softmax ( #2234 )
2022-12-30 09:58:00 +08:00
HELSON
31fe84237b
[example] fix benchmark.sh for gpt example ( #2229 )
2022-12-29 23:00:14 +08:00
Jiarui Fang
2cdecc9f38
[example] make palm + GeminiDPP work ( #2227 )
2022-12-29 14:28:31 +08:00
ZijianYY
63cc77173b
[example] Palm adding gemini, still has bugs ( #2221 )
2022-12-29 14:01:09 +08:00
HELSON
7010e18134
[example] update gpt example ( #2225 )
2022-12-29 12:01:45 +08:00
Jiarui Fang
49c601da21
[example] add benchmark.sh for gpt ( #2226 )
2022-12-29 12:00:00 +08:00
HELSON
3629e611cd
[example] update gpt benchmark ( #2219 )
2022-12-29 10:51:42 +08:00
ZijianYY
92de90dfb3
[examples] replace einsum with matmul ( #2210 )
2022-12-28 19:03:06 +08:00
Jiarui Fang
7675792100
[builder] raise Error when CUDA_HOME is not set ( #2213 )
2022-12-28 16:07:08 +08:00
HELSON
78a89d9b41
[diffusion] update readme ( #2214 )
2022-12-28 16:06:48 +08:00
Jiarui Fang
d96cc37e32
[example] update GPT example benchmark results ( #2212 )
2022-12-28 14:28:12 +08:00
Jiarui Fang
d5e3e3ec01
[example] update gpt example for larger model scale ( #2211 )
2022-12-28 13:54:08 +08:00
Jiarui Fang
29868a9ec1
[example] update gpt readme with performance ( #2206 )
2022-12-27 17:39:53 +08:00
BlueRum
6642cebdbe
[example] Change some training settings for diffusion ( #2195 )
2022-12-26 15:22:20 +08:00
ziyuhuang123
4363ff3e41
'[NFC] fix some typos' ( #2175 )
2022-12-25 18:41:39 +08:00
Fazzie-Maqianli
ce3c4eca7b
[example] support DreamBooth ( #2188 )
2022-12-23 16:47:30 +08:00
BlueRum
1cf6d92d7c
[example] diffuser: support quantized inference for stable diffusion ( #2186 )
2022-12-23 16:06:29 +08:00
Jiarui Fang
65f56f49e8
[example] gpt demo: more accurate TFLOPS ( #2178 )
2022-12-22 20:51:35 +08:00
ziyuhuang123
cf5028363c
'diffusion-typo-change'
2022-12-22 10:28:59 +08:00
Jiarui Fang
27327a4c90
[example] add palm pytorch version ( #2172 )
2022-12-22 10:15:34 +08:00
Jiarui Fang
a4b4bb01d6
[example] update vit readme ( #2155 )
2022-12-20 15:56:54 +08:00
Jiarui Fang
2cfe685b9f
[example] add missing ViT functions ( #2154 )
2022-12-20 15:03:26 +08:00
HELSON
a7d95b7024
[example] add zero1, zero2 example in GPT examples ( #2146 )
* [example] add zero1 and zero2 for GPT
* update readme in gpt example
* polish code
* change init value
* update readme
2022-12-20 14:30:27 +08:00
Fazzie
cea4292ae5
support stable diffusion v2
2022-12-13 14:26:49 +08:00
ZijianYY
fa9d1aea71
[example] update GPT README ( #2095 )
2022-12-07 15:47:37 +08:00
YuliangLiu0306
edf4cd46c5
[examples] update autoparallel demo ( #2061 )
2022-12-01 18:50:58 +08:00
Super Daniel
2edbef13cc
[fx] add more meta_registry for MetaTensor execution. ( #2000 )
* [sc] add examples for auto checkpoint.
* merge upstream
* [fx] add more meta_registry for MetaTensor execution.
2022-11-23 10:55:46 +08:00
Fazzie-Maqianli
b5dbb46172
[example] add diffusion inference ( #1986 )
2022-11-20 18:35:29 +08:00
mandoxzhang
52bd106627
add RoBERTa ( #1980 )
* update roberta
* update roberta & readme
* update roberta & readme
* update roberta & readme
2022-11-18 14:04:49 +08:00
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2022-11-16 14:44:28 +08:00
Jiarui Fang
60abd86d6a
[example] enhance GPT demo ( #1959 )
* [example] enhance GPT demo
* Update README.md
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-16 11:36:27 +08:00
Fazzie
a09f88ab07
update model download in README
2022-11-16 11:17:30 +08:00
Fazzie-Maqianli
6bdd0a90ca
update lightning version ( #1954 )
2022-11-15 16:57:48 +08:00
binmakeswell
9183e0dec5
[tutorial] polish all README ( #1946 )
2022-11-14 19:49:32 +08:00
Frank Lee
de56b563b9
[tutorial] added missing dummy dataloader ( #1944 )
2022-11-14 04:09:03 -06:00
Frank Lee
c6ea65011f
[tutorial] fixed pipeline bug for sequence parallel ( #1943 )
2022-11-14 04:06:57 -06:00
Jiarui Fang
cf68cc92ac
[example] add vit ( #1942 )
* [ColoTensor] ColoInitContext initialize parameters in shard mode.
* polish
* [example] add vit
2022-11-14 17:28:03 +08:00
YuliangLiu0306
c7925c5d08
[sc demo] add requirements to spmd README ( #1941 )
2022-11-14 17:22:45 +08:00
Boyuan Yao
d5f5e06d82
[SC] remove redundant hands-on ( #1939 )
* [sc] SC tutorial for auto checkpoint
* [sc] polish examples
* [sc] polish readme
* [sc] polish readme and help information
* [sc] polish readme and help information
* [sc] modify auto checkpoint benchmark
* [sc] remove imgs
* [sc] remove redundant hands-on
2022-11-14 03:05:21 -06:00
binmakeswell
41868f7605
[tutorial] polish README and OPT files ( #1930 )
* [tutorial] polish README and OPT files
* [tutorial] polish README and OPT files
* [tutorial] polish README and OPT files
2022-11-13 13:09:58 +08:00
ver217
b0b7a786b7
[tutorial] add synthetic dataset for opt ( #1924 )
2022-11-13 03:26:11 +08:00
Frank Lee
0486048453
[tutorial] updated hybrid parallel readme ( #1928 )
* [tutorial] updated hybrid parallel readme
* polish code
2022-11-13 03:25:01 +08:00
Frank Lee
807cbdb87d
[tutorial] added synthetic data for sequence parallel ( #1927 )
* [tutorial] added synthetic data for sequence parallel
* polish code
2022-11-13 03:24:02 +08:00
Frank Lee
abf4c27f6a
[tutorial] removed huggingface model warning ( #1925 )
2022-11-12 23:12:18 +08:00
Frank Lee
d43a671ad6
Hotfix/tutorial readme index ( #1922 )
* [tutorial] removed tutorial index in readme
* [tutorial] removed tutorial index in readme
2022-11-12 18:24:52 +08:00
Boyuan Yao
24cbee0ebe
[tutorial] modify hands-on of auto activation checkpoint ( #1920 )
* [sc] SC tutorial for auto checkpoint
* [sc] polish examples
* [sc] polish readme
* [sc] polish readme and help information
* [sc] polish readme and help information
* [sc] modify auto checkpoint benchmark
* [sc] remove imgs
2022-11-12 18:21:03 +08:00
Frank Lee
ff16773ded
[tutorial] added synthetic data for hybrid parallel ( #1921 )
* [tutorial] added synthetic data for hybrid parallel
* polish code
2022-11-12 18:18:55 +08:00
Frank Lee
3c42fdbedc
[tutorial] added synthetic data for hybrid parallel ( #1919 )
2022-11-12 17:49:48 +08:00
Frank Lee
1b0dd05940
[tutorial] added synthetic dataset for auto parallel demo ( #1918 )
2022-11-12 17:14:32 +08:00
Frank Lee
acd9abc5ca
[tutorial] updated auto parallel demo with latest data path ( #1917 )
2022-11-12 16:55:19 +08:00
Frank Lee
d53415bc10
[tutorial] added data script and updated readme ( #1916 )
2022-11-12 16:38:41 +08:00
binmakeswell
155e202318
[example] update auto_parallel img path ( #1910 )
2022-11-11 23:43:22 +08:00
Boyuan Yao
d5c5bc219e
[SC] add GPT example for auto checkpoint ( #1889 )
* [sc] SC tutorial for auto checkpoint
* [sc] polish examples
* [sc] polish readme
* [sc] polish readme and help information
* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
binmakeswell
11ee8ae478
[tutorial] add cifar10 for diffusion ( #1907 )
2022-11-11 19:03:50 +08:00
Frank Lee
cb7ec714c8
[tutorial] removed duplicated tutorials ( #1904 )
2022-11-11 17:23:40 +08:00
Fazzie-Maqianli
351f0f64e6
[example] add cifar10 dataset for diffusion ( #1902 )
* add cifar10 datasets
* Update README.md
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-11 17:22:54 +08:00
BoxiangW
ca6e75bc28
[tutorial] edited hands-on practices ( #1899 )
* Add hands-on exercises to ColossalAI.
* Change names of hands-on exercises and edit the sequence parallel example.
* Edit wrong folder name
* resolve conflict
* delete readme
2022-11-11 17:08:17 +08:00
BoxiangW
d9bf83e084
Add hands-on exercises to ColossalAI. ( #1896 )
Co-authored-by: Boxiang Wang <boxiang.wang1@gmail.com>
2022-11-11 16:13:22 +08:00
Super Daniel
6d559ea614
[sc] add examples for auto checkpoint. ( #1880 )
2022-11-10 20:50:15 +08:00
HELSON
f9e7d179f2
[diffusion] fix package conflicts ( #1875 )
2022-11-10 16:33:34 +08:00
binmakeswell
610dda676c
[example] migrate diffusion and auto_parallel hands-on ( #1871 )
2022-11-10 15:31:46 +08:00
binmakeswell
50c4cb0167
[NFC] remove redundant dependency ( #1869 )
* remove redundant config
* remove redundant dependency
2022-11-10 14:51:47 +08:00
binmakeswell
fd8f0ca5a8
[example] initialize tutorial ( #1865 )
2022-11-10 14:05:27 +08:00
binmakeswell
e9635eb493
add explanation for the specified version
2022-11-09 12:13:01 +08:00
jiaruifang
27211d6267
[example] polish diffusion readme
2022-11-09 09:38:05 +08:00
binmakeswell
4ac7d3ec3b
[doc] polish diffusion README ( #1840 )
2022-11-08 22:36:55 +08:00
Jiarui Fang
f86a703bcf
[NFC] update gitignore remove DS_Store ( #1830 )
2022-11-08 17:18:15 +08:00
Jiarui Fang
a25f755331
[example] add TP to GPT example ( #1828 )
2022-11-08 17:17:19 +08:00
Fazzie-Maqianli
6e9730d7ab
[example] add stable diffuser ( #1825 )
2022-11-08 16:14:45 +08:00
Jiarui Fang
b1263d32ba
[example] simplify the GPT2 huggingface example ( #1826 )
2022-11-08 16:14:07 +08:00
Jiarui Fang
cd5a0d56fa
[Gemini] make gemini usage simple ( #1821 )
2022-11-08 15:53:13 +08:00
Maruyama_Aya
a7e8159da6
add ColoDiffusion codes: /ldm/module/, /ldm/data/, /scripts/test/
2022-11-08 14:39:35 +08:00
Jiarui Fang
350ccc0481
[example] opt does not depend on Titans ( #1811 )
2022-11-08 12:02:20 +08:00
Jiarui Fang
203ca57aed
[example] add GPT
2022-11-08 10:58:17 +08:00
Jiarui Fang
fd2c8d8156
[example] add OPT model in language ( #1809 )
2022-11-08 10:39:13 +08:00
Jiarui Fang
f5a92c288c
[example] add diffusion to example ( #1805 )
2022-11-07 17:43:36 +08:00
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2022-09-15 15:45:17 +08:00
github-actions[bot]
177d3f5718
Automated submodule synchronization ( #1465 )
Co-authored-by: github-actions <github-actions@github.com>
2022-08-19 13:39:21 +08:00
github-actions[bot]
9b442ecdc3
Automated submodule synchronization ( #1404 )
Co-authored-by: github-actions <github-actions@github.com>
2022-08-08 11:24:58 +08:00
github-actions[bot]
1e5eb0874c
Automated submodule synchronization ( #1396 )
Co-authored-by: github-actions <github-actions@github.com>
2022-08-03 09:18:45 +08:00
github-actions[bot]
50dec605e1
Automated submodule synchronization ( #1380 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-28 11:12:52 +08:00
github-actions[bot]
fb6f085907
Automated submodule synchronization ( #1372 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-27 09:25:03 +08:00
github-actions[bot]
6160a1d6a7
Automated submodule synchronization ( #1348 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-21 10:50:27 +08:00
github-actions[bot]
6f2f9eb214
Automated submodule synchronization ( #1305 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-14 13:40:54 +08:00
github-actions[bot]
762905da68
Automated submodule synchronization ( #1241 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-12 10:32:20 +08:00
github-actions[bot]
4951f7d80c
Automated submodule synchronization ( #1204 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-07 15:22:45 +08:00
github-actions[bot]
23442a5bc1
Automated submodule synchronization ( #1194 )
Co-authored-by: github-actions <github-actions@github.com>
2022-07-04 10:12:17 +08:00
github-actions[bot]
6f0733a1ef
Automated submodule synchronization ( #1159 )
Co-authored-by: github-actions <github-actions@github.com>
2022-06-29 15:11:36 +08:00
github-actions[bot]
e8c34eedfd
Automated submodule synchronization ( #1129 )
Co-authored-by: github-actions <github-actions@github.com>
2022-06-22 14:39:08 +08:00
github-actions[bot]
85b58093d2
Automated submodule synchronization ( #1105 )
Co-authored-by: github-actions <github-actions@github.com>
2022-06-14 09:53:30 +08:00
github-actions[bot]
e32470b6de
Automated submodule synchronization ( #1049 )
Co-authored-by: github-actions <github-actions@github.com>
2022-06-01 11:04:32 +08:00
github-actions[bot]
4d8a574cd3
Automated submodule synchronization ( #1034 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-27 17:12:48 +08:00
github-actions[bot]
9e3d602dba
Automated submodule synchronization ( #1003 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-20 17:08:44 +08:00
github-actions[bot]
46bc95708f
Automated submodule synchronization ( #960 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-14 21:55:34 +08:00
github-actions[bot]
7edb38193a
Automated submodule synchronization ( #932 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-13 10:22:51 +08:00
github-actions[bot]
b61d64685f
Automated submodule synchronization ( #929 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-11 09:13:06 +08:00
github-actions[bot]
1cf7fb3cd9
Automated submodule synchronization ( #912 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-06 10:10:56 +08:00
github-actions[bot]
3b1f5f07ce
Automated submodule synchronization ( #907 )
Co-authored-by: github-actions <github-actions@github.com>
2022-05-03 13:14:48 +08:00
github-actions[bot]
f271f34716
Automated submodule synchronization ( #827 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-22 15:24:58 +08:00
github-actions[bot]
413ce30c45
Automated submodule synchronization ( #819 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-21 11:26:58 +08:00
github-actions[bot]
9aae4197bb
Automated submodule synchronization ( #810 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-20 13:57:12 +08:00
github-actions[bot]
6978980f6d
Automated submodule synchronization ( #751 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-14 15:34:01 +08:00
github-actions[bot]
d878d843ad
Automated submodule synchronization ( #695 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-08 10:03:53 +08:00
github-actions[bot]
d50cdabbc9
Automated submodule synchronization ( #556 )
Co-authored-by: github-actions <github-actions@github.com>
2022-04-07 22:11:00 +08:00
github-actions[bot]
92f4224867
Automated submodule synchronization ( #501 )
2022-03-30 14:06:23 +08:00
github-actions[bot]
353566c198
Automated submodule synchronization ( #483 )
Co-authored-by: github-actions <github-actions@github.com>
2022-03-22 09:34:26 +08:00
github-actions[bot]
cfcc8271f3
[Bot] Automated submodule synchronization ( #451 )
Co-authored-by: github-actions <github-actions@github.com>
2022-03-18 09:51:43 +08:00
github-actions
6098bc4cce
Automated submodule synchronization
2022-03-14 00:01:12 +00:00
github-actions
b9f8521f8c
Automated submodule synchronization
2022-02-15 11:35:37 +08:00
github-actions[bot]
5420809f43
Automated submodule synchronization ( #203 )
Co-authored-by: github-actions <github-actions@github.com>
2022-02-04 10:19:38 +08:00
Frank Lee
ca4ae52d6b
Set examples as submodule ( #162 )
* remove examples folder
* added examples as submodule
* update .gitmodules
2022-01-19 16:35:36 +08:00
LuGY_mac
d143396cac
Added rand augment and update the dataloader
2022-01-18 16:14:46 +08:00
HELSON
1ff5be36c2
Added moe parallel example ( #140 )
2022-01-17 15:34:04 +08:00
ver217
f03bcb359b
update vit example for new API ( #98 ) ( #99 )
2022-01-04 20:35:33 +08:00
アマデウス
0fedef4f3c
Layer integration ( #83 )
* integrated parallel layers for ease of building models
* integrated 2.5d layers
* cleaned codes and unit tests
* added log metric by step hook; updated imagenet benchmark; fixed some bugs
* reworked initialization; cleaned codes
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-27 15:04:32 +08:00
Xin Zhang
648f806315
add example of self-supervised SimCLR training - V2 ( #50 )
* add example of self-supervised SimCLR training
* simclr v2, replace nvidia dali dataloader
* updated
* sync to latest code writing style
* sync to latest code writing style and modify README
* detail README & standardize dataset path
2021-12-21 08:07:18 +08:00
Frank Lee
35813ed3c4
update examples and Sphinx docs for the new API ( #63 )
2021-12-13 22:07:01 +08:00
Frank Lee
da01c234e1
Develop/experiments ( #59 )
* Add gradient accumulation, fix lr scheduler
* fix FP16 optimizer and adapted torch amp with tensor parallel (#18 )
* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
* fixed trainer
* Revert "fixed trainer"
This reverts commit 2e0b0b7699.
* improved consistency between trainer, engine and schedule (#23 )
Co-authored-by: 1SAA <c2h214748@gmail.com>
* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000
* Integrate 1d tensor parallel in Colossal-AI (#39 )
* fixed 1D and 2D convergence (#38 )
* optimized 2D operations
* fixed 1D ViT convergence problem
* Feature/ddp (#49 )
* remove redundancy func in setup (#19 ) (#20 )
* use env to control the language of doc (#24 ) (#25 )
* Support TP-compatible Torch AMP and Update trainer API (#27 )
* Add gradient accumulation, fix lr scheduler
* fix FP16 optimizer and adapted torch amp with tensor parallel (#18 )
* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
* fixed trainer
* Revert "fixed trainer"
This reverts commit 2e0b0b7699.
* improved consistency between trainer, engine and schedule (#23 )
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29 )
* add explanation for ViT example (#35 ) (#36 )
* support torch ddp
* fix loss accumulation
* add log for ddp
* change seed
* modify timing hook
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* Feature/pipeline (#40 )
* remove redundancy func in setup (#19 ) (#20 )
* use env to control the language of doc (#24 ) (#25 )
* Support TP-compatible Torch AMP and Update trainer API (#27 )
* Add gradient accumulation, fix lr scheduler
* fix FP16 optimizer and adapted torch amp with tensor parallel (#18 )
* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
* fixed trainer
* Revert "fixed trainer"
This reverts commit 2e0b0b7699.
* improved consistency between trainer, engine and schedule (#23 )
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29 )
* add explanation for ViT example (#35 ) (#36 )
* optimize communication of pipeline parallel
* fix grad clip for pipeline
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51 )
* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset
* update api for better usability (#58 )
update api for better usability
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00
ver217
eb2f8b1f6b
add how to build tfrecord dataset ( #48 )
2021-12-02 16:31:23 +08:00
ver217
4da256a584
add some details in vit-b16 example ( #46 )
2021-12-02 09:29:27 +08:00
ver217
e67dab92a9
add some details in vit-b16 example ( #43 ) ( #44 )
2021-12-02 08:55:11 +08:00
binmakeswell
2528adc62f
add explanation for ViT example ( #35 ) ( #36 )
2021-11-29 10:25:38 +08:00
ver217
dbe62c67b8
add an example of ViT-B/16 and remove w_norm clipping in LAMB ( #29 )
2021-11-18 23:45:09 +08:00
Frank Lee
3defa32aee
Support TP-compatible Torch AMP and Update trainer API ( #27 )
* Add gradient accumulation, fix lr scheduler
* fix FP16 optimizer and adapted torch amp with tensor parallel (#18 )
* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
* fixed trainer
* Revert "fixed trainer"
This reverts commit 2e0b0b7699.
* improved consistency between trainer, engine and schedule (#23 )
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
zbian
404ecbdcc6
Migrated project
2021-10-28 18:21:23 +02:00