295 Commits (cloud/coati)

Author SHA1 Message Date
LuGY 105c5301c3 [zero]added hybrid adam, removed loss scale in adam (#527) 3 years ago
LuGY 6a3f9fda83 [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497) 3 years ago
Jiarui Fang a445e118cf [polish] polish singleton and global context (#500) 3 years ago
ver217 9ec1ce6ab1 [zero] sharded model support the reuse of fp16 shard (#495) 3 years ago
HELSON c9023d4078 [MOE] support PR-MOE (#488) 3 years ago
ver217 62b0a8d644 [zero] sharded optim support hybrid cpu adam (#486) 3 years ago
HELSON d7ea63992b [MOE] add FP32LinearGate for MOE in NaiveAMP context (#480) 3 years ago
Jiarui Fang 65c0f380c2 [format] polish name format for MOE (#481) 3 years ago
HELSON 7544347145 [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 3 years ago
HELSON aff9d354f7 [MOE] polish moe_env (#467) 3 years ago
HELSON bccbc15861 [MOE] changed parallelmode to dist process group (#460) 3 years ago
Jiarui Fang 0fcfb1e00d [test] make zero engine test really work (#447) 3 years ago
Jiarui Fang 237d08e7ee [zero] hybrid cpu adam (#445) 3 years ago
HELSON dbdc9a7783 added Multiply Jitter and capacity factor eval for MOE (#434) 3 years ago
HELSON 3f70a2b12f removed noisy function during evaluation of MoE router (#419) 3 years ago
Jiang Zhuo 5a4a3b77d9 fix format (#376) 3 years ago
LuGY de46450461 Added activation offload (#331) 3 years ago
Kai Wang (Victor Kai) 53bb3bcc0a fix format (#362) 3 years ago
Yuer867 4a0f8c2c50 fix format parallel_2p5d (#357) 3 years ago
Liang Bowen 7eb87f516d flake8 style (#352) 3 years ago
xuqifan897 148207048e Qifan formated file ColossalAI\colossalai\nn\layer\parallel_1d\layers.py (#342) 3 years ago
DouJS cbb6436ff0 fix format for dir-[parallel_3d] (#333) 3 years ago
LuGY a3269de5c9 [zero] cpu adam kernel (#288) 3 years ago
1SAA 82023779bb Added TPExpert for special situation 3 years ago
HELSON 36b8477228 Fixed parameter initialization in FFNExpert (#251) 3 years ago
アマデウス e13293bb4c fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) 3 years ago
1SAA 219df6e685 Optimized MoE layer and fixed some bugs; 3 years ago
zbian 3dba070580 fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial 3 years ago
アマデウス 9ee197d0e9 moved env variables to global variables; (#215) 3 years ago
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 3 years ago
Frank Lee e2089c5c15 adapted for sequence parallel (#163) 3 years ago
ver217 f68eddfb3d refactor kernel (#142) 3 years ago
BoxiangW 4a3d3446b0 Update layer integration documentations (#108) 3 years ago
HELSON dceae85195 Added MoE parallel (#127) 3 years ago
ver217 7904baf6e1 fix layers/schedule for hybrid parallelization (#111) (#112) 3 years ago
ver217 96780e6ee4 Optimize pipeline schedule (#94) 3 years ago
アマデウス 01a80cd86d Hotfix/Colossalai layers (#92) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago
HELSON 632e622de8 overlap computation and communication in 2d operations (#75) 3 years ago
Frank Lee 35813ed3c4 update examples and sphnix docs for the new api (#63) 3 years ago
Frank Lee da01c234e1 Develop/experiments (#59) 3 years ago
ver217 dbe62c67b8 add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) 3 years ago
Frank Lee 3defa32aee Support TP-compatible Torch AMP and Update trainer API (#27) 3 years ago
ver217 3c7604ba30 update documentation 3 years ago
zbian 404ecbdcc6 Migrated project 3 years ago