ColossalAI


Author	SHA1	Message	Date
Hongxin Liu	ae02d4e4f7	[bf16] add bf16 support (#3882 ) * [bf16] add bf16 support for fused adam (#3844) * [bf16] fused adam kernel support bf16 * [test] update fused adam kernel test * [test] update fused adam test * [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860) * [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869) * [bf16] add mixed precision mixin * [bf16] low level zero optim support bf16 * [text] update low level zero test * [text] fix low level zero grad acc test * [bf16] add bf16 support for gemini (#3872) * [bf16] gemini support bf16 * [test] update gemini bf16 test * [doc] update gemini docstring * [bf16] add bf16 support for plugins (#3877) * [bf16] add bf16 support for legacy zero (#3879) * [zero] init context support bf16 * [zero] legacy zero support bf16 * [test] add zero bf16 test * [doc] add bf16 related docstring for legacy zero	1 year ago
digger yu	32f81f14d4	[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756 )	2 years ago
lucasliunju	4b95464994	[NFC] polish colossalai/amp/__init__.py code style (#3272 )	2 years ago
Frank Lee	8518263b80	[test] fixed the triton version for testing (#2608 )	2 years ago
HELSON	077a5cdde4	[zero] fix gradient clipping in hybrid parallelism (#2521 ) * [zero] fix gradient clipping in hybrid parallelism * [testing] change model name to avoid pytest warning * [hotfix] fix unit testing	2 years ago
Frank Lee	40d376c566	[setup] support pre-build and jit-build of cuda kernels (#2374 ) * [setup] support pre-build and jit-build of cuda kernels * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
xyupeng	b965585d05	[NFC] polish colossalai/amp/torch_amp/torch_amp.py code style (#2290 )	2 years ago
Ziheng Qin	3041014089	[NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299 ) Co-authored-by: henryqin1997 <henryqin1997@gamil.com>	2 years ago
HELSON	5d3a2be3af	[amp] add gradient clipping for unit tests (#2283 ) * [amp] add gradient clipping in unit tests * fix bugs	2 years ago
YuliangLiu0306	f027ef7913	[hotfix] fix fp16 optimzier bug (#2273 )	2 years ago
Jiarui Fang	355ffb386e	[builder] unified cpu_optim fused_optim inferface (#2190 )	2 years ago
Jiarui Fang	d42afd30f8	[builder] runtime adam and fused_optim builder (#2184 )	2 years ago
ver217	f8a7148dec	[kernel] move all symlinks of kernel to `colossalai._C` (#1971 )	2 years ago
Junming Wu	14a0b18305	[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905 )	2 years ago
LuGY	94329fc139	[NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853 )	2 years ago
zbian	1559a09fb7	[NFC] polish amp.naive_amp.grad_scaler code style	2 years ago
Genghan Zhang	b25030cc07	[NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836 )	2 years ago
Ziyue Jiang	5da03c936d	[NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Fazzie-Maqianli	399f84d8f6	[NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819 )	2 years ago
CsRic	9623ec1b02	[NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816 ) * [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) * [NFC] polish colossalai/zero/sharded_param/__init__.py code style * [NFC] polish colossalai/amp/naive_amp/_utils.py code style Co-authored-by: shenggan <csg19971016@gmail.com> Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>	2 years ago
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2 years ago
YuliangLiu0306	e27645376d	[hotfix]different overflow status lead to communication stuck. (#1175 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [hotfix]fix some bugs caused by refactored schedule. * [hotfix]different overflow statu llead to communication stuck.	2 years ago
Frank Lee	72bd7c696b	[amp] included dict for type casting of model output (#1102 )	2 years ago
Frank Lee	9fdebadd69	[doc] improved docstring in the amp module (#857 )	3 years ago
HELSON	4c4388c46e	[hotfix] fix memory leak in zero (#781 )	3 years ago
Frank Lee	a4e91bc87f	[bug] fixed grad scaler compatibility with torch 1.8 (#735 )	3 years ago
Jiarui Fang	4d90a7b513	[refactor] zero directory (#724 )	3 years ago
Kai Wang (Victor Kai)	b0f708dfc1	fix format (#570 )	3 years ago
ver217	c5b488edf8	polish amp docstring (#616 )	3 years ago
Liang Bowen	2c45efc398	html refactor (#555 )	3 years ago
Liang Bowen	ec5086c49c	Refactored docstring to google style	3 years ago
Jiarui Fang	496cbb0760	[hotfix] fix initialize bug with zero (#442 )	3 years ago
Frank Lee	14a7094243	fixed fp16 optimizer none grad bug (#432 )	3 years ago
Frank Lee	e79ea44247	[fp16] refactored fp16 optimizer (#392 )	3 years ago
Kai Wang (Victor Kai)	53bb3bcc0a	fix format (#362 )	3 years ago
Frank Lee	3d5d64bd10	refactored grad scaler (#338 )	3 years ago
Frank Lee	6a3188167c	set criterion as optional in colossalai initialize (#336 )	3 years ago
Frank Lee	e17e54e32a	added buffer sync to naive amp model wrapper (#291 )	3 years ago
Frank Lee	f5ca88ec97	fixed apex import (#227 )	3 years ago
アマデウス	9ee197d0e9	moved env variables to global variables; (#215 ) added branch context; added vocab parallel layers; moved split_batch from load_batch to tensor parallel embedding layers; updated gpt model; updated unit test cases; fixed few collective communicator bugs	3 years ago
HELSON	0f8c7f9804	Fixed docstring in colossalai (#171 )	3 years ago
Frank Lee	e2089c5c15	adapted for sequence parallel (#163 )	3 years ago
puck_WCR	9473a1b9c8	AMP docstring/markdown update (#160 )	3 years ago
ver217	96780e6ee4	Optimize pipeline schedule (#94 ) * add pipeline shared module wrapper and update load batch * added model parallel process group for amp and clip grad (#86) * added model parallel process group for amp and clip grad * update amp and clip with model parallel process group * remove pipeline_prev/next group (#88) * micro batch offload * optimize pipeline gpu memory usage * pipeline can receive tensor shape (#93) * optimize pipeline gpu memory usage * fix grad accumulation step counter * rename classes and functions Co-authored-by: Frank Lee <somerlee.9@gmail.com>	3 years ago
ver217	8f02a88db2	add interleaved pipeline, fix naive amp and update pipeline model initializer (#80 )	3 years ago
Frank Lee	91c327cb44	fixed zero level 3 dtype bug (#76 )	3 years ago
Frank Lee	35813ed3c4	update examples and sphnix docs for the new api (#63 )	3 years ago
Frank Lee	da01c234e1	Develop/experiments (#59 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp Fix convergence in cifar10, Imagenet1000 * Integrate 1d tensor parallel in Colossal-AI (#39) * fixed 1D and 2D convergence (#38) * optimized 2D operations * fixed 1D ViT convergence problem * Feature/ddp (#49) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * support torch ddp * fix loss accumulation * add log for ddp * change seed * modify timing hook Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * Feature/pipeline (#40) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * optimize communication of pipeline parallel * fix grad clip for pipeline Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51) * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset * update api for better usability (#58) update api for better usability Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com> Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	3 years ago

48 Commits (feature/stable-diffusion)