ColossalAI

Commit Graph

Author	SHA1	Message	Date
HELSON	b31daed4cf	fix bugs in CPU adam (#633 ) * add cpu adam counter for all cpu adam * fixed updating error in adam kernel	2022-04-02 17:04:05 +08:00
Liang Bowen	828e465622	[hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622 )	2022-04-02 16:12:04 +08:00
アマデウス	77ad24bf94	[model checkpoint] updated saving/loading for 3d layers (#597 )	2022-04-01 16:52:47 +08:00
アマデウス	93089ed708	[model checkpoint] updated saving/loading for 2.5d layers (#596 )	2022-04-01 16:52:33 +08:00
アマデウス	c50bfb807b	[model checkpoint] updated saving/loading for 1d layers (#594 )	2022-04-01 16:51:52 +08:00
アマデウス	7636d518e1	[model checkpoint] updated saving/loading for 2d layers (#595 )	2022-04-01 16:50:34 +08:00
アマデウス	cd13b63832	[model checkpoint] reworked unified layers for ease of save/load states (#593 )	2022-04-01 16:49:56 +08:00
Ziyue Jiang	1c40ee8749	[TP] add assert for tp1d (#621 )	2022-04-01 16:44:23 +08:00
ver217	e619a651fb	polish optimizer docstring (#619 )	2022-04-01 16:27:03 +08:00
ver217	8432dc7080	polish moe docsrting (#618 )	2022-04-01 16:15:36 +08:00
ver217	104cbbb313	[hotfix] add hybrid adam to __init__ (#584 )	2022-03-31 19:08:34 +08:00
HELSON	e6d50ec107	[zero] adapt zero for unsharded parameters (#561 ) * support existing sharded and unsharded parameters in zero * add unitest for moe-zero model init * polish moe gradient handler	2022-03-31 18:34:11 +08:00
Wesley	46c9ba33da	update code format	2022-03-31 17:15:08 +08:00
Wesley	666cfd094a	fix parallel_input flag for Linear1D_Col gather_output	2022-03-31 17:15:08 +08:00
Liang Bowen	2c45efc398	html refactor (#555 )	2022-03-31 11:36:56 +08:00
LuGY	c44d797072	[docs] updatad docs of hybrid adam and cpu adam (#552 )	2022-03-30 18:14:59 +08:00
Ziyue Jiang	763dc325f1	[TP] Add gather_out arg to Linear (#541 )	2022-03-30 09:35:46 +08:00
HELSON	8c90d4df54	[zero] add zero context manager to change config during initialization (#546 )	2022-03-29 17:57:59 +08:00
Liang Bowen	ec5086c49c	Refactored docstring to google style	2022-03-29 17:17:47 +08:00
LuGY	105c5301c3	[zero]added hybrid adam, removed loss scale in adam (#527 ) * [zero]added hybrid adam, removed loss scale of adam * remove useless code	2022-03-25 18:03:54 +08:00
LuGY	6a3f9fda83	[cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497 )	2022-03-25 14:15:53 +08:00
Jiarui Fang	a445e118cf	[polish] polish singleton and global context (#500 )	2022-03-23 18:03:39 +08:00
ver217	9ec1ce6ab1	[zero] sharded model support the reuse of fp16 shard (#495 ) * sharded model supports reuse fp16 shard * rename variable * polish code * polish code * polish code	2022-03-23 14:59:59 +08:00
HELSON	c9023d4078	[MOE] support PR-MOE (#488 )	2022-03-22 16:48:22 +08:00
ver217	62b0a8d644	[zero] sharded optim support hybrid cpu adam (#486 ) * sharded optim support hybrid cpu adam * update unit test * polish docstring	2022-03-22 14:56:59 +08:00
HELSON	d7ea63992b	[MOE] add FP32LinearGate for MOE in NaiveAMP context (#480 )	2022-03-22 10:50:20 +08:00
Jiarui Fang	65c0f380c2	[format] polish name format for MOE (#481 )	2022-03-21 23:19:47 +08:00
HELSON	7544347145	[MOE] add unitest for MOE experts layout, gradient handler and kernel (#469 )	2022-03-21 13:35:04 +08:00
HELSON	aff9d354f7	[MOE] polish moe_env (#467 )	2022-03-19 15:36:25 +08:00
HELSON	bccbc15861	[MOE] changed parallelmode to dist process group (#460 )	2022-03-19 13:46:29 +08:00
Jiarui Fang	0fcfb1e00d	[test] make zero engine test really work (#447 )	2022-03-17 17:24:25 +08:00
Jiarui Fang	237d08e7ee	[zero] hybrid cpu adam (#445 )	2022-03-17 15:05:41 +08:00
HELSON	dbdc9a7783	added Multiply Jitter and capacity factor eval for MOE (#434 )	2022-03-16 16:47:44 +08:00
HELSON	3f70a2b12f	removed noisy function during evaluation of MoE router (#419 )	2022-03-15 12:06:09 +08:00
Jiang Zhuo	5a4a3b77d9	fix format (#376 )	2022-03-11 15:50:28 +08:00
LuGY	de46450461	Added activation offload (#331 ) * Added activation offload * Fixed the import bug, used the pytest	2022-03-11 15:50:28 +08:00
Kai Wang (Victor Kai)	53bb3bcc0a	fix format (#362 )	2022-03-11 15:50:28 +08:00
Yuer867	4a0f8c2c50	fix format parallel_2p5d (#357 )	2022-03-11 15:50:28 +08:00
Liang Bowen	7eb87f516d	flake8 style (#352 )	2022-03-11 15:50:28 +08:00
xuqifan897	148207048e	Qifan formated file ColossalAI\colossalai\nn\layer\parallel_1d\layers.py (#342 )	2022-03-11 15:50:28 +08:00
DouJS	cbb6436ff0	fix format for dir-[parallel_3d] (#333 )	2022-03-11 15:50:28 +08:00
LuGY	a3269de5c9	[zero] cpu adam kernel (#288 ) * Added CPU Adam * finished the cpu adam * updated the license * delete useless parameters, removed resnet * modified the method off cpu adam unittest * deleted some useless codes * removed useless codes Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: jiaruifang <fangjiarui123@gmail.com>	2022-03-11 15:50:28 +08:00
1SAA	82023779bb	Added TPExpert for special situation	2022-03-11 15:50:28 +08:00
HELSON	36b8477228	Fixed parameter initialization in FFNExpert (#251 )	2022-03-11 15:50:28 +08:00
アマデウス	e13293bb4c	fixed CI dataset directory; fixed import error of 2.5d accuracy (#255 )	2022-03-11 15:50:28 +08:00
1SAA	219df6e685	Optimized MoE layer and fixed some bugs; Decreased moe tests; Added FFNExperts and ViTMoE model	2022-03-11 15:50:28 +08:00
zbian	3dba070580	fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial	2022-03-11 15:50:28 +08:00
アマデウス	9ee197d0e9	moved env variables to global variables; (#215 ) added branch context; added vocab parallel layers; moved split_batch from load_batch to tensor parallel embedding layers; updated gpt model; updated unit test cases; fixed few collective communicator bugs	2022-02-15 11:31:13 +08:00
HELSON	0f8c7f9804	Fixed docstring in colossalai (#171 )	2022-01-21 10:44:30 +08:00
Frank Lee	e2089c5c15	adapted for sequence parallel (#163 )	2022-01-20 13:44:51 +08:00
ver217	f68eddfb3d	refactor kernel (#142 )	2022-01-13 16:47:17 +08:00
BoxiangW	4a3d3446b0	Update layer integration documentations (#108 ) Update the documentations of layer integration Update _log_hook.py Update _operation.py	2022-01-10 18:05:58 +08:00
HELSON	dceae85195	Added MoE parallel (#127 )	2022-01-07 15:08:36 +08:00
ver217	7904baf6e1	fix layers/schedule for hybrid parallelization (#111 ) (#112 )	2022-01-04 20:52:31 +08:00
ver217	96780e6ee4	Optimize pipeline schedule (#94 ) * add pipeline shared module wrapper and update load batch * added model parallel process group for amp and clip grad (#86) * added model parallel process group for amp and clip grad * update amp and clip with model parallel process group * remove pipeline_prev/next group (#88) * micro batch offload * optimize pipeline gpu memory usage * pipeline can receive tensor shape (#93) * optimize pipeline gpu memory usage * fix grad accumulation step counter * rename classes and functions Co-authored-by: Frank Lee <somerlee.9@gmail.com>	2021-12-30 15:56:46 +08:00
アマデウス	01a80cd86d	Hotfix/Colossalai layers (#92 ) * optimized 1d layer apis; reorganized nn.layer modules; fixed tests * fixed 2.5d runtime issue * reworked split batch, now called in trainer.schedule.load_batch Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2021-12-29 23:32:10 +08:00
アマデウス	0fedef4f3c	Layer integration (#83 ) * integrated parallel layers for ease of building models * integrated 2.5d layers * cleaned codes and unit tests * added log metric by step hook; updated imagenet benchmark; fixed some bugs * reworked initialization; cleaned codes Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2021-12-27 15:04:32 +08:00
HELSON	632e622de8	overlap computation and communication in 2d operations (#75 )	2021-12-16 16:05:15 +08:00
Frank Lee	35813ed3c4	update examples and sphnix docs for the new api (#63 )	2021-12-13 22:07:01 +08:00
Frank Lee	da01c234e1	Develop/experiments (#59 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp Fix convergence in cifar10, Imagenet1000 * Integrate 1d tensor parallel in Colossal-AI (#39) * fixed 1D and 2D convergence (#38) * optimized 2D operations * fixed 1D ViT convergence problem * Feature/ddp (#49) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * support torch ddp * fix loss accumulation * add log for ddp * change seed * modify timing hook Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * Feature/pipeline (#40) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * optimize communication of pipeline parallel * fix grad clip for pipeline Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51) * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset * update api for better usability (#58) update api for better usability Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com> Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2021-12-09 15:08:29 +08:00
ver217	dbe62c67b8	add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29 )	2021-11-18 23:45:09 +08:00
Frank Lee	3defa32aee	Support TP-compatible Torch AMP and Update trainer API (#27 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com>	2021-11-18 19:45:06 +08:00
ver217	3c7604ba30	update documentation	2021-10-29 09:29:20 +08:00
zbian	404ecbdcc6	Migrated project	2021-10-28 18:21:23 +02:00


314 Commits (785cd9a9c971aa58e6f8c76575111a4aa4d9513b)