ColossalAI

Commit Graph

Author	SHA1	Message	Date
Geng Zhang	0e06f62160	[NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style (#1266 )	2 years ago
superhao1995	f660152c73	[NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258 ) Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>	2 years ago
Thunderbeee	9738fb0f78	[NFC] polish colossalai/nn/lr_scheduler/__init__.py (#1255 ) code style	2 years ago
Ofey Chan	2dd4d556fb	[NFC] polish colossalai/nn/init.py code style (#1292 )	2 years ago
HELSON	abba4d84e1	[hotfix] fix bert model test in unitests (#1272 )	2 years ago
oahzxl	0cf8e8e91c	[NFC] polish <colossalai/nn/lr_scheduler/poly.py> code style (#1267 )	2 years ago
Jiarui Fang	1aad903c15	[tensor] redistribute among different process groups (#1247 ) * make it faster * [tensor] rename convert_to_dist -> redistribute * [tensor] ShardSpec and ReplicaSpec * [tensor] redistribute among diff pgs * polish code	2 years ago
Jiarui Fang	9bcd2fd4af	[tensor] a shorter shard and replicate spec (#1245 )	2 years ago
Jiarui Fang	2699dfbbfd	[rename] convert_to_dist -> redistribute (#1243 )	2 years ago
Jiarui Fang	4a76084dc9	[tensor] add zero_like colo op, important for Optimizer (#1236 )	2 years ago
Jiarui Fang	3b500984b1	[tensor] fix some unittests (#1234 )	2 years ago
HELSON	0453776def	[tensor] fix a assertion in colo_tensor cross_entropy (#1232 )	2 years ago
HELSON	42ab36b762	[tensor] add unitest for colo_tensor 1DTP cross_entropy (#1230 )	2 years ago
Yi Zhao	04537bf83e	[checkpoint]support generalized scheduler (#1222 )	2 years ago
Jiarui Fang	a98319f023	[tensor] torch function return colotensor (#1229 )	2 years ago
Jiarui Fang	ae7d3f4927	[refactor] move process group from _DistSpec to ColoTensor. (#1203 )	2 years ago
Jiarui Fang	b5f25eb32a	[Tensor] add cpu group to ddp (#1200 )	2 years ago
Jiarui Fang	060b917daf	[refactor] remove gpc dependency in colotensor's _ops (#1189 )	2 years ago
Jiarui Fang	372f791444	[refactor] move chunk and chunkmgr to directory gemini (#1182 )	2 years ago
ver217	6b2f2ab9bb	[ddp] ColoDDP uses bucket all-reduce (#1177 ) * add reducer * update colo ddp with reducer * polish unit test * polish unit test	2 years ago
Jiarui Fang	1b657f9ce1	[tensor] revert local view back (#1178 )	2 years ago
Jiarui Fang	0dd4e2bbfb	[Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176 )	2 years ago
Ziyue Jiang	dd0420909f	[Tensor] rename parallel_action (#1174 ) * rename parallel_action * polish	2 years ago
Jiarui Fang	aa7bef73d4	[Tensor] distributed view supports inter-process hybrid parallel (#1169 )	2 years ago
Jiarui Fang	4b9bba8116	[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168 )	2 years ago
Jiarui Fang	f4ef224358	[Tensor] remove ParallelAction, use ComputeSpec instread (#1166 )	2 years ago
Jiarui Fang	177c374401	remove gather out in parallel action (#1163 )	2 years ago
Ziyue Jiang	955ac912de	remove log (#1160 )	2 years ago
Jiarui Fang	07f9c781f9	[graph] improve the graph building. (#1157 )	2 years ago
ver217	22717a856f	[tensor] add embedding bag op (#1156 )	2 years ago
ver217	ae86151968	[tensor] add more element-wise ops (#1155 ) * add more element-wise ops * update test_op * polish unit test	2 years ago
ver217	54aabb8da4	[gemini] refactor gemini mgr (#1151 ) * refactor gemini mgr * udpate __init__	2 years ago
ver217	8106d7b8c7	[ddp] refactor ColoDDP and ZeroDDP (#1146 ) * ColoDDP supports overwriting default process group * rename ColoDDPV2 to ZeroDDP * add docstr for ZeroDDP * polish docstr	2 years ago
ver217	ccf3c58c89	embedding op use gather_out (#1143 )	2 years ago
Frank Lee	15aab1476e	[zero] avoid zero hook spam by changing log to debug level (#1137 )	2 years ago
ver217	e4f555f29a	[optim] refactor fused sgd (#1134 )	2 years ago
ver217	d26902645e	[ddp] add save/load state dict for ColoDDP (#1127 ) * add save/load state dict for ColoDDP * add unit test * refactor unit test folder * polish unit test * rename unit test	2 years ago
ver217	f0a954f16d	[ddp] add set_params_to_ignore for ColoDDP (#1122 ) * add set_params_to_ignore for ColoDDP * polish code * fix zero hook v2 * add unit test * polish docstr	3 years ago
ver217	e127b4375b	cast colo ddp v2 inputs/outputs (#1120 )	3 years ago
ver217	7d14b473f0	[gemini] gemini mgr supports "cpu" placement policy (#1118 ) * update gemini mgr * update chunk * add docstr * polish placement policy * update test chunk * update test zero * polish unit test * remove useless unit test	3 years ago
ver217	895c1c5ee7	[tensor] refactor param op hook (#1097 ) * refactor param op hook * add docstr * fix bug	3 years ago
Frank Lee	cb18922c47	[doc] added documentation to chunk and chunk manager (#1094 ) * [doc] added documentation to chunk and chunk manager * polish code * polish code * polish code	3 years ago
ver217	1f894e033f	[gemini] zero supports gemini (#1093 ) * add placement policy * add gemini mgr * update mem stats collector * update zero * update zero optim * fix bugs * zero optim monitor os * polish unit test * polish unit test * add assert	3 years ago
Frank Lee	2b2dc1c86b	[pipeline] refactor the pipeline module (#1087 ) * [pipeline] refactor the pipeline module * polish code	3 years ago
ver217	be01db37c8	[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077 ) * polish chunk manager * polish unit test * impl add_extern_static_tensor for chunk mgr * add mem stats collector v2 * polish code * polish unit test * polish code * polish get chunks	3 years ago
Ziyue Jiang	0653c63eaa	[Tensor] 1d row embedding (#1075 ) * Add CPU 1d row embedding * polish	3 years ago
Ziyue Jiang	4fc748f69b	[Tensor] fix optimizer for CPU parallel (#1069 )	3 years ago
Jiarui Fang	49832b2344	[refactory] add nn.parallel module (#1068 )	3 years ago
Ziyue Jiang	6754f1b77f	fix module utils bug (#1066 )	3 years ago
Jiarui Fang	a00644079e	reorgnize colotensor directory (#1062 ) * reorgnize colotensor directory * polish code	3 years ago
Ziyue Jiang	df9dcbbff6	[Tensor] add hybrid device demo and fix bugs (#1059 )	3 years ago
ver217	51b9a49655	[zero] add zero optimizer for ColoTensor (#1046 ) * add zero optimizer * torch ok * unit test ok * polish code * fix bugs * polish unit test * polish zero optim * polish colo ddp v2 * refactor folder structure * add comment * polish unit test * polish zero optim * polish unit test	3 years ago
ver217	9492a561c3	[tensor] ColoTensor supports ZeRo (#1015 ) * impl chunk manager * impl param op hook * add reduce_chunk * add zero hook v2 * add zero dp * fix TensorInfo * impl load balancing when using zero without chunk * fix zero hook * polish chunk * fix bugs * ddp ok * zero ok * polish code * fix bugs about load balancing * polish code * polish code * add ene-to-end test * polish code * polish code * polish code * fix typo * add test_chunk * fix bugs * fix bugs * polish code	3 years ago
ver217	cefc29ff06	[tensor] impl ColoDDP for ColoTensor (#1009 ) * impl ColoDDP for ColoTensor * polish code	3 years ago
Ziheng Qin	571f12eff3	[NFC] polish colossalai/nn/layer/utils/common.py code style (#983 )	3 years ago
shenggan	18542b47fc	[NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976 )	3 years ago
Zirui Zhu	598cde4a0f	[NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972 )	3 years ago
LuGY	fb5bc6cb28	[NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966 )	3 years ago
ver217	58580b50fe	Revert "[NFC] Hotfix/format (#984 )" (#986 ) This reverts commit `0772828fba`.	3 years ago
binmakeswell	0772828fba	[NFC] Hotfix/format (#984 ) * [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#939) * [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#936) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style (#938) * [NFC] polish moe_cuda_kernel.cu code style (#940) Co-authored-by: Xiao Ye <xiaoye2@illinois.edu> * [NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943) * [NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style (#942) * [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style (#945) * [NFC] polish colossalai/kernel/jit/bias_gelu.py code style (#946) Co-authored-by: jnbai <897086360@qq.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949) Co-authored-by: Jiatong <jiatong.han@u.nus.edu> * [NFC] polish colossalai/builder/pipeline.py code style (#951) * [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style (#952) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style (#953) Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style (#954) * [NFC] polish colossalai/kernel/cuda_native/scaled_softmax.py code style (#955) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style (#956) Co-authored-by: RichardoLuo <14049555596@qq.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style (#957) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#958) * [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style (#962) * 
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style (#963) Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style (#964) * [NFC] polish __init__.py code style (#965) * [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h (#968) code style * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970) * [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972) * [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977) * [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979) * [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980) * [NFC] polish colossalai/nn/layer/utils/common.py code style (#983) Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com> Co-authored-by: yuxuan-lou <83441848+yuxuan-lou@users.noreply.github.com> Co-authored-by: Geng Zhang <34452939+zxgx@users.noreply.github.com> Co-authored-by: Maruyama_Aya <38985202+MaruyamaAya@users.noreply.github.com> Co-authored-by: XYE <92607131+Itok2000u@users.noreply.github.com> Co-authored-by: Xiao Ye <xiaoye2@illinois.edu> Co-authored-by: HaoyuQin <79465534+coder-chin@users.noreply.github.com> Co-authored-by: wky <64853922+wangkuangyi@users.noreply.github.com> Co-authored-by: 
bajiaoyu517 <59548007+bajiaoyu517@users.noreply.github.com> Co-authored-by: luoling-LC <105470086+luoling-LC@users.noreply.github.com> Co-authored-by: jnbai <897086360@qq.com> Co-authored-by: JT.Han <59948448+JThh@users.noreply.github.com> Co-authored-by: Jiatong <jiatong.han@u.nus.edu> Co-authored-by: xyupeng <99191637+xyupeng@users.noreply.github.com> Co-authored-by: Sze-qq <68757353+Sze-qq@users.noreply.github.com> Co-authored-by: Cautiousss <48676630+Cautiousss@users.noreply.github.com> Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local> Co-authored-by: Luxios22 <67457897+Luxios22@users.noreply.github.com> Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com> Co-authored-by: RichardoLuo <50363844+RichardoLuo@users.noreply.github.com> Co-authored-by: RichardoLuo <14049555596@qq.com> Co-authored-by: doubleHU <98150031+huxin711@users.noreply.github.com> Co-authored-by: runluo <68489000+run-qiao@users.noreply.github.com> Co-authored-by: MaxT <854721132@qq.com> Co-authored-by: superhao1995 <804673818@qq.com> Co-authored-by: ziyu huang <huang0ziyu@gmail.com> Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com> Co-authored-by: Yuer867 <62204893+Yuer867@users.noreply.github.com> Co-authored-by: lucasliunju <lucasliunju@gmail.com> Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com> Co-authored-by: ExtremeViscent <zhangyiqi55732@sina.com> Co-authored-by: Xu Kai <xukai16@foxmail.com> Co-authored-by: Zirui Zhu <zhuzr21@gmail.com> Co-authored-by: Ofey Chan <ofey206@gmail.com> Co-authored-by: DouJS <dujiangsu@163.com> Co-authored-by: Jie Zhu <chore.08-protist@icloud.com> Co-authored-by: shenggan <csg19971016@gmail.com> Co-authored-by: Kai Wang (Victor Kai) <37533040+kaiwang960112@users.noreply.github.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: Ziheng Qin <37519855+henryqin1997@users.noreply.github.com>	3 years ago
HELSON	e5ea3fdeef	[gemini] add GeminiMemoryManger (#832 ) * refactor StatefulTensor, tensor utilities * add unitest for GeminiMemoryManager	3 years ago
Ziyue Jiang	4b01da24cd	[TP] change the check assert in split batch 2d (#772 )	3 years ago
アマデウス	b8899e0905	[TP] allow layernorm without bias (#750 )	3 years ago
Frank Lee	eda30a058e	[compatibility] fixed tensor parallel compatibility with torch 1.9 (#700 )	3 years ago
HELSON	a9b8300d54	[zero] improve adaptability for not-shard parameters (#708 ) * adapt post grad hooks for not-shard parameters * adapt optimizer for not-shard parameters * offload gradients for not-replicated parameters	3 years ago
アマデウス	3fc8a204dc	[]Corrected 3d vocab parallel embedding (#707 )	3 years ago
HELSON	b31daed4cf	fix bugs in CPU adam (#633 ) * add cpu adam counter for all cpu adam * fixed updating error in adam kernel	3 years ago
Liang Bowen	828e465622	[hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622 )	3 years ago
アマデウス	77ad24bf94	[model checkpoint] updated saving/loading for 3d layers (#597 )	3 years ago
アマデウス	93089ed708	[model checkpoint] updated saving/loading for 2.5d layers (#596 )	3 years ago
アマデウス	c50bfb807b	[model checkpoint] updated saving/loading for 1d layers (#594 )	3 years ago
アマデウス	7636d518e1	[model checkpoint] updated saving/loading for 2d layers (#595 )	3 years ago
アマデウス	cd13b63832	[model checkpoint] reworked unified layers for ease of save/load states (#593 )	3 years ago
Ziyue Jiang	1c40ee8749	[TP] add assert for tp1d (#621 )	3 years ago
ver217	e619a651fb	polish optimizer docstring (#619 )	3 years ago
ver217	8432dc7080	polish moe docsrting (#618 )	3 years ago
ver217	104cbbb313	[hotfix] add hybrid adam to __init__ (#584 )	3 years ago
HELSON	e6d50ec107	[zero] adapt zero for unsharded parameters (#561 ) * support existing sharded and unsharded parameters in zero * add unitest for moe-zero model init * polish moe gradient handler	3 years ago
Wesley	46c9ba33da	update code format	3 years ago
Wesley	666cfd094a	fix parallel_input flag for Linear1D_Col gather_output	3 years ago
Liang Bowen	2c45efc398	html refactor (#555 )	3 years ago
LuGY	c44d797072	[docs] updatad docs of hybrid adam and cpu adam (#552 )	3 years ago
Ziyue Jiang	763dc325f1	[TP] Add gather_out arg to Linear (#541 )	3 years ago
HELSON	8c90d4df54	[zero] add zero context manager to change config during initialization (#546 )	3 years ago
Liang Bowen	ec5086c49c	Refactored docstring to google style	3 years ago
LuGY	105c5301c3	[zero]added hybrid adam, removed loss scale in adam (#527 ) * [zero]added hybrid adam, removed loss scale of adam * remove useless code	3 years ago
LuGY	6a3f9fda83	[cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497 )	3 years ago
Jiarui Fang	a445e118cf	[polish] polish singleton and global context (#500 )	3 years ago
ver217	9ec1ce6ab1	[zero] sharded model support the reuse of fp16 shard (#495 ) * sharded model supports reuse fp16 shard * rename variable * polish code * polish code * polish code	3 years ago
HELSON	c9023d4078	[MOE] support PR-MOE (#488 )	3 years ago
ver217	62b0a8d644	[zero] sharded optim support hybrid cpu adam (#486 ) * sharded optim support hybrid cpu adam * update unit test * polish docstring	3 years ago
HELSON	d7ea63992b	[MOE] add FP32LinearGate for MOE in NaiveAMP context (#480 )	3 years ago
Jiarui Fang	65c0f380c2	[format] polish name format for MOE (#481 )	3 years ago
HELSON	7544347145	[MOE] add unitest for MOE experts layout, gradient handler and kernel (#469 )	3 years ago
HELSON	aff9d354f7	[MOE] polish moe_env (#467 )	3 years ago
HELSON	bccbc15861	[MOE] changed parallelmode to dist process group (#460 )	3 years ago
Jiarui Fang	0fcfb1e00d	[test] make zero engine test really work (#447 )	3 years ago
Jiarui Fang	237d08e7ee	[zero] hybrid cpu adam (#445 )	3 years ago
HELSON	dbdc9a7783	added Multiply Jitter and capacity factor eval for MOE (#434 )	3 years ago
HELSON	3f70a2b12f	removed noisy function during evaluation of MoE router (#419 )	3 years ago

180 Commits (7d49e7b2dbdb4b966496475654a4154b92aeaa7b)