295 Commits (cloud/coati)

Author SHA1 Message Date
LuGY 105c5301c3 [zero]added hybrid adam, removed loss scale in adam (#527) 3 years ago
LuGY 6a3f9fda83 [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497) 3 years ago
Jiarui Fang a445e118cf [polish] polish singleton and global context (#500) 3 years ago
ver217 9ec1ce6ab1 [zero] sharded model support the reuse of fp16 shard (#495) 3 years ago
HELSON c9023d4078 [MOE] support PR-MOE (#488) 3 years ago
ver217 62b0a8d644 [zero] sharded optim support hybrid cpu adam (#486) 3 years ago
HELSON d7ea63992b [MOE] add FP32LinearGate for MOE in NaiveAMP context (#480) 3 years ago
Jiarui Fang 65c0f380c2 [format] polish name format for MOE (#481) 3 years ago
HELSON 7544347145 [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 3 years ago
HELSON aff9d354f7 [MOE] polish moe_env (#467) 3 years ago
HELSON bccbc15861 [MOE] changed parallelmode to dist process group (#460) 3 years ago
Jiarui Fang 0fcfb1e00d [test] make zero engine test really work (#447) 3 years ago
Jiarui Fang 237d08e7ee [zero] hybrid cpu adam (#445) 3 years ago
HELSON dbdc9a7783 added Multiply Jitter and capacity factor eval for MOE (#434) 3 years ago
HELSON 3f70a2b12f removed noisy function during evaluation of MoE router (#419) 3 years ago
Jiang Zhuo 5a4a3b77d9 fix format (#376) 3 years ago
LuGY de46450461 Added activation offload (#331) 3 years ago
Kai Wang (Victor Kai) 53bb3bcc0a fix format (#362) 3 years ago
Yuer867 4a0f8c2c50 fix format parallel_2p5d (#357) 3 years ago
Liang Bowen 7eb87f516d flake8 style (#352) 3 years ago
xuqifan897 148207048e Qifan formated file ColossalAI\colossalai\nn\layer\parallel_1d\layers.py (#342) 3 years ago
DouJS cbb6436ff0 fix format for dir-[parallel_3d] (#333) 3 years ago
LuGY a3269de5c9 [zero] cpu adam kernel (#288) 3 years ago
1SAA 82023779bb Added TPExpert for special situation 3 years ago
HELSON 36b8477228 Fixed parameter initialization in FFNExpert (#251) 3 years ago
アマデウス e13293bb4c fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) 3 years ago
1SAA 219df6e685 Optimized MoE layer and fixed some bugs; 3 years ago
zbian 3dba070580 fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial 3 years ago
アマデウス 9ee197d0e9 moved env variables to global variables; (#215) 3 years ago
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 3 years ago
Frank Lee e2089c5c15 adapted for sequence parallel (#163) 3 years ago
ver217 f68eddfb3d refactor kernel (#142) 3 years ago
BoxiangW 4a3d3446b0 Update layer integration documentations (#108) 3 years ago
HELSON dceae85195 Added MoE parallel (#127) 3 years ago
ver217 7904baf6e1 fix layers/schedule for hybrid parallelization (#111) (#112) 3 years ago
ver217 96780e6ee4 Optimize pipeline schedule (#94) 3 years ago
アマデウス 01a80cd86d Hotfix/Colossalai layers (#92) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago
HELSON 632e622de8 overlap computation and communication in 2d operations (#75) 3 years ago
Frank Lee 35813ed3c4 update examples and sphnix docs for the new api (#63) 3 years ago
Frank Lee da01c234e1 Develop/experiments (#59) 3 years ago
ver217 dbe62c67b8 add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) 3 years ago
Frank Lee 3defa32aee Support TP-compatible Torch AMP and Update trainer API (#27) 3 years ago
ver217 3c7604ba30 update documentation 3 years ago
zbian 404ecbdcc6 Migrated project 3 years ago