ColossalAI

Commit Graph

Author	SHA1	Message	Date
Jiarui Fang	c638bec028	[embedding] polish async copy (#1657 )	2 years ago
Jiarui Fang	988570e4a6	[embedding] add more detail profiling (#1656 )	2 years ago
Jiarui Fang	e1f97fd2b8	[embedding] print profiling results (#1654 )	2 years ago
Jiarui Fang	04443605a5	[embedding] non-blocking cpu-gpu copy (#1647 )	2 years ago
CsRic	0767f67a0f	[embedding] isolate cache_op from forward (#1645 ) Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>	2 years ago
Jiarui Fang	c5d39215f6	Revert "[feature] new zero implementation (#1623 )" (#1643 ) This reverts commit `5be118f405`.	2 years ago
HELSON	5be118f405	[feature] new zero implementation (#1623 )	2 years ago
Jiarui Fang	e57df80325	[embeddings] cache option (#1635 )	2 years ago
HELSON	a088022efc	[moe] fix moe bugs (#1633 )	2 years ago
HELSON	f7f2248771	[moe] fix MoE bugs (#1628 ) * remove forced FP32 modules * correct no_shard-contexts' positions	2 years ago
Jiarui Fang	38c68b5b9a	[embedding] rollback for better FAW performance (#1625 )	2 years ago
Jiarui Fang	504ff1d101	[embeddings] use cache_ratio instead of cuda_row_num (#1611 )	2 years ago
Jiarui Fang	a19eb80998	[embedding] updates some default parameters	2 years ago
CsRic	f3403ff98e	[embeddings] add already_split_along_rank flag for tablewise mode (#1584 )	2 years ago
Sze-qq	2144cbae8c	[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572 )	2 years ago
superhao1995	e4bf7ae667	[NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571 ) Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>	2 years ago
Jiatong Han	3263cdf57f	[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570 ) Co-authored-by: JThh <jiatong.han@u.nus.edu>	2 years ago
DouJS	f586887a90	[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568 )	2 years ago
BigOneLiXiaoMing	0c4c9aa6e0	[NFC] polish colossalai/nn/_ops/embedding.py code style (#1561 )	2 years ago
Ofey Chan	7cc052f6c0	[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556 )	2 years ago
yuxuan-lou	413f9c19f4	[NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555 )	2 years ago
shenggan	8edb777cc2	[NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553 )	2 years ago
Maruyama_Aya	bd2d789832	[NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552 )	2 years ago
binmakeswell	73e9eb13b7	[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style	2 years ago
CsRic	a389ac4ec9	[embedding] cache_embedding small improvement (#1564 )	2 years ago
ver217	10dd8226b1	add gather_output for VocabParallelClassifier1D (#1569 )	2 years ago
ver217	ae71036cd2	[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548 ) * refactor parallel layer * broadcast rank0 model after load ckpt	2 years ago
Jiarui Fang	64169f3e8f	[embedding] polish parallel embedding tablewise (#1545 )	2 years ago
CsRic	964123ae0f	[embedding] freq_aware_embedding: add small functions for caller application (#1537 )	2 years ago
Jiarui Fang	521078ffc9	[embedding] fix a bug in table wise sharding (#1538 )	2 years ago
Jiarui Fang	87134524fd	[embedding] tablewise sharding polish (#1535 )	2 years ago
CsRic	5156d5b4f8	[embedding] add tablewise sharding for FAW (#1526 )	2 years ago
Jiarui Fang	4537d39df9	[doc] docstring for FreqAwareEmbeddingBag (#1525 )	2 years ago
Jiarui Fang	9a9ef65313	[FAW] cpu caching operations (#1520 )	2 years ago
Jiarui Fang	af5438caa2	[FAW] refactor reorder() for CachedParamMgr (#1514 )	2 years ago
Jiarui Fang	9feee6d06b	[FAW] LFU initialize with dataset freq (#1513 )	2 years ago
CsRic	1b8fee8e9c	[FAW] shrink freq_cnter size (#1509 )	2 years ago
Jiarui Fang	ba61109b6c	[FAW] remove code related to chunk (#1501 )	2 years ago
Jiarui Fang	d5085bb317	[FAW] add more docs and fix a warning (#1500 )	2 years ago
CsRic	0ed2f46131	[FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494 )	2 years ago
CsRic	b8d0e39eaf	[FAW] LFU cache for the FAW	2 years ago
Jiarui Fang	cde7b8a5b8	[FAW] init an LFU implementation for FAW (#1488 )	2 years ago
Geng Zhang	0aad53c62b	[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462 )	2 years ago
Jiarui Fang	a1476ea882	[NFC] polish doc style for ColoTensor (#1457 )	2 years ago
ver217	367c615818	fix nvme docstring (#1450 )	2 years ago
Geng Zhang	9f3eed66eb	[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448 )	2 years ago
Frank Lee	ae1b58cd16	[tensor] added linear implementation for the new sharding spec (#1416 ) * [tensor] added linear implementation for the new sharding spec * polish code	2 years ago
Jiarui Fang	30b4dd17c0	[FAW] export FAW in _ops (#1438 )	2 years ago
Jiarui Fang	c9427a323f	hotfix #1434 (#1437 )	2 years ago
Jiarui Fang	10b3df65c8	[FAW] move coloparam setting in test code. (#1429 )	2 years ago
Jiarui Fang	cb98cf5558	[FAW] parallel FreqAwareEmbedding (#1424 )	2 years ago
Jiarui Fang	d209aff684	Add FreqAwareEmbeddingBag (#1421 )	2 years ago
Jiarui Fang	504419d261	[FAW] add cache manager for the cached embedding (#1419 )	2 years ago
ver217	12b4887097	[hotfix] fix CPUAdam kernel nullptr (#1410 )	2 years ago
ver217	04c9a86af8	[zero] ZeroDDP supports controlling outputs' dtype (#1399 )	2 years ago
HELSON	4e98e938ce	[zero] alleviate memory usage in ZeRODDP state_dict (#1398 )	2 years ago
HELSON	c7221cb2d4	[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388 )	2 years ago
ver217	83328329dd	[hotfix] fix zero ddp buffer cast (#1376 ) * fix zero ddp buffer cast * fix zero ddp ignore params	2 years ago
ver217	5d5031e946	fix zero ddp state dict (#1378 )	2 years ago
ver217	c415240db6	[nvme] CPUAdam and HybridAdam support NVMe offload (#1360 ) * impl nvme optimizer * update cpu adam * add unit test * update hybrid adam * update docstr * add TODOs * update CI * fix CI * fix CI * fix CI path * fix CI path * fix CI path * fix install tensornvme * fix CI * fix CI path * fix CI env variables * test CI * test CI * fix CI * fix nvme optim __del__ * fix adam __del__ * fix nvme optim * fix CI env variables * fix nvme optim import * test CI * test CI * fix CI	2 years ago
HELSON	87775a0682	[colotensor] use cpu memory to store state_dict (#1367 )	2 years ago
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2 years ago
HELSON	7a8702c06d	[colotensor] add Tensor.view op and its unit test (#1343 ) [colotensor] add megatron initialization for gpt2	2 years ago
ver217	0c51ff2c13	[hotfix] ZeroDDP use new process group (#1333 ) * process group supports getting ranks in group * chunk mgr receives a process group * update unit test * fix unit tests	2 years ago
HELSON	1b41686461	[hotfix] fix unit test test_module_spec (#1321 )	2 years ago
Jiarui Fang	9e4c6449b0	[checkpoint] add ColoOptimizer checkpointing (#1316 )	2 years ago
Jiarui Fang	85f933b58b	[Optimizer] Remove useless ColoOptimizer (#1312 )	2 years ago
Jiarui Fang	9f10524313	[Optimizer] polish the init method of ColoOptimizer (#1310 )	2 years ago
HELSON	260a55804a	[hotfix] fix shape error in backward when using ColoTensor (#1298 )	2 years ago
runluo	f83c4d6597	[NFC] polish colossalai/nn/layer/wrapper/pipeline_wrapper.py code style (#1303 )	2 years ago
XYE	e83b2ce853	[NFC] polish colossalai/nn/layer/vanilla/layers.py code style (#1295 )	2 years ago
Liping233	1000a41fd5	[NFC] polish colossalai/nn/layer/vanilla/__init__.py code style (#1293 )	2 years ago
Wangbo Zhao(黑色枷锁)	552667825b	[NFC] polish colossalai/nn/layer/parallel_1d/layers.py code style (#1290 )	2 years ago
Jiatong Han	38e3ccd1e9	[NFC] polish colossalai/nn/layer/parallel_sequence/layers.py code style (#1280 ) Co-authored-by: JThh <jiatong.han@u.nus.edu>	2 years ago
Boyuan Yao	b414eaa5db	[NFC] polish colossalai/nn/optimizer/lamb.py code style (#1275 )	2 years ago
Super Daniel	52d145a342	[NFC] polish colossalai/nn/lr_scheduler/onecycle.py code style (#1269 )	2 years ago
Geng Zhang	0e06f62160	[NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style (#1266 )	2 years ago
superhao1995	f660152c73	[NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258 ) Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>	2 years ago
Thunderbeee	9738fb0f78	[NFC] polish colossalai/nn/lr_scheduler/__init__.py (#1255 ) code style	2 years ago
Ofey Chan	2dd4d556fb	[NFC] polish colossalai/nn/init.py code style (#1292 )	2 years ago
HELSON	abba4d84e1	[hotfix] fix bert model test in unitests (#1272 )	2 years ago
oahzxl	0cf8e8e91c	[NFC] polish <colossalai/nn/lr_scheduler/poly.py> code style (#1267 )	2 years ago
Jiarui Fang	1aad903c15	[tensor] redistribute among different process groups (#1247 ) * make it faster * [tensor] rename convert_to_dist -> redistribute * [tensor] ShardSpec and ReplicaSpec * [tensor] redistribute among diff pgs * polish code	2 years ago
Jiarui Fang	9bcd2fd4af	[tensor] a shorter shard and replicate spec (#1245 )	2 years ago
Jiarui Fang	2699dfbbfd	[rename] convert_to_dist -> redistribute (#1243 )	2 years ago
Jiarui Fang	4a76084dc9	[tensor] add zero_like colo op, important for Optimizer (#1236 )	2 years ago
Jiarui Fang	3b500984b1	[tensor] fix some unittests (#1234 )	2 years ago
HELSON	0453776def	[tensor] fix a assertion in colo_tensor cross_entropy (#1232 )	2 years ago
HELSON	42ab36b762	[tensor] add unitest for colo_tensor 1DTP cross_entropy (#1230 )	2 years ago
Yi Zhao	04537bf83e	[checkpoint]support generalized scheduler (#1222 )	2 years ago
Jiarui Fang	a98319f023	[tensor] torch function return colotensor (#1229 )	2 years ago
Jiarui Fang	ae7d3f4927	[refactor] move process group from _DistSpec to ColoTensor. (#1203 )	2 years ago
Jiarui Fang	b5f25eb32a	[Tensor] add cpu group to ddp (#1200 )	2 years ago
Jiarui Fang	060b917daf	[refactor] remove gpc dependency in colotensor's _ops (#1189 )	2 years ago
Jiarui Fang	372f791444	[refactor] move chunk and chunkmgr to directory gemini (#1182 )	2 years ago
ver217	6b2f2ab9bb	[ddp] ColoDDP uses bucket all-reduce (#1177 ) * add reducer * update colo ddp with reducer * polish unit test * polish unit test	2 years ago
Jiarui Fang	1b657f9ce1	[tensor] revert local view back (#1178 )	2 years ago
Jiarui Fang	0dd4e2bbfb	[Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176 )	2 years ago
Ziyue Jiang	dd0420909f	[Tensor] rename parallel_action (#1174 ) * rename parallel_action * polish	2 years ago
Jiarui Fang	aa7bef73d4	[Tensor] distributed view supports inter-process hybrid parallel (#1169 )	2 years ago
Jiarui Fang	4b9bba8116	[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168 )	2 years ago
Jiarui Fang	f4ef224358	[Tensor] remove ParallelAction, use ComputeSpec instread (#1166 )	2 years ago
Jiarui Fang	177c374401	remove gather out in parallel action (#1163 )	2 years ago
Ziyue Jiang	955ac912de	remove log (#1160 )	2 years ago
Jiarui Fang	07f9c781f9	[graph] improve the graph building. (#1157 )	2 years ago
ver217	22717a856f	[tensor] add embedding bag op (#1156 )	2 years ago
ver217	ae86151968	[tensor] add more element-wise ops (#1155 ) * add more element-wise ops * update test_op * polish unit test	2 years ago
ver217	54aabb8da4	[gemini] refactor gemini mgr (#1151 ) * refactor gemini mgr * udpate __init__	2 years ago
ver217	8106d7b8c7	[ddp] refactor ColoDDP and ZeroDDP (#1146 ) * ColoDDP supports overwriting default process group * rename ColoDDPV2 to ZeroDDP * add docstr for ZeroDDP * polish docstr	2 years ago
ver217	ccf3c58c89	embedding op use gather_out (#1143 )	2 years ago
Frank Lee	15aab1476e	[zero] avoid zero hook spam by changing log to debug level (#1137 )	2 years ago
ver217	e4f555f29a	[optim] refactor fused sgd (#1134 )	2 years ago
ver217	d26902645e	[ddp] add save/load state dict for ColoDDP (#1127 ) * add save/load state dict for ColoDDP * add unit test * refactor unit test folder * polish unit test * rename unit test	2 years ago
ver217	f0a954f16d	[ddp] add set_params_to_ignore for ColoDDP (#1122 ) * add set_params_to_ignore for ColoDDP * polish code * fix zero hook v2 * add unit test * polish docstr	2 years ago
ver217	e127b4375b	cast colo ddp v2 inputs/outputs (#1120 )	2 years ago
ver217	7d14b473f0	[gemini] gemini mgr supports "cpu" placement policy (#1118 ) * update gemini mgr * update chunk * add docstr * polish placement policy * update test chunk * update test zero * polish unit test * remove useless unit test	2 years ago
ver217	895c1c5ee7	[tensor] refactor param op hook (#1097 ) * refactor param op hook * add docstr * fix bug	2 years ago
Frank Lee	cb18922c47	[doc] added documentation to chunk and chunk manager (#1094 ) * [doc] added documentation to chunk and chunk manager * polish code * polish code * polish code	3 years ago
ver217	1f894e033f	[gemini] zero supports gemini (#1093 ) * add placement policy * add gemini mgr * update mem stats collector * update zero * update zero optim * fix bugs * zero optim monitor os * polish unit test * polish unit test * add assert	3 years ago
Frank Lee	2b2dc1c86b	[pipeline] refactor the pipeline module (#1087 ) * [pipeline] refactor the pipeline module * polish code	3 years ago
ver217	be01db37c8	[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077 ) * polish chunk manager * polish unit test * impl add_extern_static_tensor for chunk mgr * add mem stats collector v2 * polish code * polish unit test * polish code * polish get chunks	3 years ago
Ziyue Jiang	0653c63eaa	[Tensor] 1d row embedding (#1075 ) * Add CPU 1d row embedding * polish	3 years ago
Ziyue Jiang	4fc748f69b	[Tensor] fix optimizer for CPU parallel (#1069 )	3 years ago
Jiarui Fang	49832b2344	[refactory] add nn.parallel module (#1068 )	3 years ago
Ziyue Jiang	6754f1b77f	fix module utils bug (#1066 )	3 years ago
Jiarui Fang	a00644079e	reorgnize colotensor directory (#1062 ) * reorgnize colotensor directory * polish code	3 years ago
Ziyue Jiang	df9dcbbff6	[Tensor] add hybrid device demo and fix bugs (#1059 )	3 years ago
ver217	51b9a49655	[zero] add zero optimizer for ColoTensor (#1046 ) * add zero optimizer * torch ok * unit test ok * polish code * fix bugs * polish unit test * polish zero optim * polish colo ddp v2 * refactor folder structure * add comment * polish unit test * polish zero optim * polish unit test	3 years ago
ver217	9492a561c3	[tensor] ColoTensor supports ZeRo (#1015 ) * impl chunk manager * impl param op hook * add reduce_chunk * add zero hook v2 * add zero dp * fix TensorInfo * impl load balancing when using zero without chunk * fix zero hook * polish chunk * fix bugs * ddp ok * zero ok * polish code * fix bugs about load balancing * polish code * polish code * add ene-to-end test * polish code * polish code * polish code * fix typo * add test_chunk * fix bugs * fix bugs * polish code	3 years ago
ver217	cefc29ff06	[tensor] impl ColoDDP for ColoTensor (#1009 ) * impl ColoDDP for ColoTensor * polish code	3 years ago
Ziheng Qin	571f12eff3	[NFC] polish colossalai/nn/layer/utils/common.py code style (#983 )	3 years ago
shenggan	18542b47fc	[NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976 )	3 years ago
Zirui Zhu	598cde4a0f	[NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972 )	3 years ago
LuGY	fb5bc6cb28	[NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966 )	3 years ago
ver217	58580b50fe	Revert "[NFC] Hotfix/format (#984 )" (#986 ) This reverts commit `0772828fba`.	3 years ago
binmakeswell	0772828fba	[NFC] Hotfix/format (#984 ) * [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#939) * [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#936) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style (#938) * [NFC] polish moe_cuda_kernel.cu code style (#940) Co-authored-by: Xiao Ye <xiaoye2@illinois.edu> * [NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943) * [NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style (#942) * [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style (#945) * [NFC] polish colossalai/kernel/jit/bias_gelu.py code style (#946) Co-authored-by: jnbai <897086360@qq.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949) Co-authored-by: Jiatong <jiatong.han@u.nus.edu> * [NFC] polish colossalai/builder/pipeline.py code style (#951) * [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style (#952) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style (#953) Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style (#954) * [NFC] polish colossalai/kernel/cuda_native/scaled_softmax.py code style (#955) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style (#956) Co-authored-by: RichardoLuo <14049555596@qq.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style (#957) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#958) * [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style (#962) * [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style (#963) Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com> * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style (#964) * [NFC] polish __init__.py code style (#965) * [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h (#968) code style * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970) * [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972) * [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977) * [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976) * [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979) * [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980) * [NFC] polish colossalai/nn/layer/utils/common.py code style (#983) Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com> Co-authored-by: yuxuan-lou <83441848+yuxuan-lou@users.noreply.github.com> Co-authored-by: Geng Zhang <34452939+zxgx@users.noreply.github.com> Co-authored-by: Maruyama_Aya <38985202+MaruyamaAya@users.noreply.github.com> Co-authored-by: XYE <92607131+Itok2000u@users.noreply.github.com> Co-authored-by: Xiao Ye <xiaoye2@illinois.edu> Co-authored-by: HaoyuQin <79465534+coder-chin@users.noreply.github.com> Co-authored-by: wky <64853922+wangkuangyi@users.noreply.github.com> Co-authored-by: bajiaoyu517 <59548007+bajiaoyu517@users.noreply.github.com> Co-authored-by: luoling-LC <105470086+luoling-LC@users.noreply.github.com> Co-authored-by: jnbai <897086360@qq.com> Co-authored-by: JT.Han <59948448+JThh@users.noreply.github.com> Co-authored-by: Jiatong <jiatong.han@u.nus.edu> Co-authored-by: xyupeng <99191637+xyupeng@users.noreply.github.com> Co-authored-by: Sze-qq <68757353+Sze-qq@users.noreply.github.com> Co-authored-by: Cautiousss <48676630+Cautiousss@users.noreply.github.com> Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local> Co-authored-by: Luxios22 <67457897+Luxios22@users.noreply.github.com> Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com> Co-authored-by: RichardoLuo <50363844+RichardoLuo@users.noreply.github.com> Co-authored-by: RichardoLuo <14049555596@qq.com> Co-authored-by: doubleHU <98150031+huxin711@users.noreply.github.com> Co-authored-by: runluo <68489000+run-qiao@users.noreply.github.com> Co-authored-by: MaxT <854721132@qq.com> Co-authored-by: superhao1995 <804673818@qq.com> Co-authored-by: ziyu huang <huang0ziyu@gmail.com> Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com> Co-authored-by: Yuer867 <62204893+Yuer867@users.noreply.github.com> Co-authored-by: lucasliunju <lucasliunju@gmail.com> Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com> Co-authored-by: ExtremeViscent <zhangyiqi55732@sina.com> Co-authored-by: Xu Kai <xukai16@foxmail.com> Co-authored-by: Zirui Zhu <zhuzr21@gmail.com> Co-authored-by: Ofey Chan <ofey206@gmail.com> Co-authored-by: DouJS <dujiangsu@163.com> Co-authored-by: Jie Zhu <chore.08-protist@icloud.com> Co-authored-by: shenggan <csg19971016@gmail.com> Co-authored-by: Kai Wang (Victor Kai) <37533040+kaiwang960112@users.noreply.github.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: Ziheng Qin <37519855+henryqin1997@users.noreply.github.com>	3 years ago
HELSON	e5ea3fdeef	[gemini] add GeminiMemoryManger (#832 ) * refactor StatefulTensor, tensor utilities * add unitest for GeminiMemoryManager	3 years ago
Ziyue Jiang	4b01da24cd	[TP] change the check assert in split batch 2d (#772 )	3 years ago
アマデウス	b8899e0905	[TP] allow layernorm without bias (#750 )	3 years ago
Frank Lee	eda30a058e	[compatibility] fixed tensor parallel compatibility with torch 1.9 (#700 )	3 years ago
HELSON	a9b8300d54	[zero] improve adaptability for not-shard parameters (#708 ) * adapt post grad hooks for not-shard parameters * adapt optimizer for not-shard parameters * offload gradients for not-replicated parameters	3 years ago
アマデウス	3fc8a204dc	[]Corrected 3d vocab parallel embedding (#707 )	3 years ago
HELSON	b31daed4cf	fix bugs in CPU adam (#633 ) * add cpu adam counter for all cpu adam * fixed updating error in adam kernel	3 years ago
Liang Bowen	828e465622	[hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622 )	3 years ago
アマデウス	77ad24bf94	[model checkpoint] updated saving/loading for 3d layers (#597 )	3 years ago
アマデウス	93089ed708	[model checkpoint] updated saving/loading for 2.5d layers (#596 )	3 years ago
アマデウス	c50bfb807b	[model checkpoint] updated saving/loading for 1d layers (#594 )	3 years ago
アマデウス	7636d518e1	[model checkpoint] updated saving/loading for 2d layers (#595 )	3 years ago
アマデウス	cd13b63832	[model checkpoint] reworked unified layers for ease of save/load states (#593 )	3 years ago
Ziyue Jiang	1c40ee8749	[TP] add assert for tp1d (#621 )	3 years ago

306 Commits (10e3c9f923caf4fb68ab61e96c244bd5cca9b9da)