JT.Han
c3e423c8be
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style ( #949 )
...
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
3 years ago
luoling-LC
72c71b67ec
[NFC] polish colossalai/kernel/jit/bias_gelu.py code style ( #946 )
...
Co-authored-by: jnbai <897086360@qq.com>
3 years ago
bajiaoyu517
eb9a81d72a
[NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style ( #945 )
3 years ago
wky
8ffdc38376
[NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style ( #942 )
3 years ago
HaoyuQin
c0f373db5d
[NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style ( #943 )
3 years ago
XYE
5bbefeb06a
[NFC] polish moe_cuda_kernel.cu code style ( #940 )
...
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
3 years ago
Maruyama_Aya
7aa35eae6a
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style ( #938 )
3 years ago
Geng Zhang
b6cc9313ef
[NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style ( #936 )
3 years ago
yuxuan-lou
44b6f8947b
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style ( #939 )
3 years ago
BoxiangW
872aa413c2
[NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. ( #937 )
3 years ago
ver217
58580b50fe
Revert "[NFC] Hotfix/format ( #984 )" ( #986 )
...
This reverts commit 0772828fba.
3 years ago
binmakeswell
0772828fba
[NFC] Hotfix/format ( #984 )
...
* [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#939 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#936 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style (#938 )
* [NFC] polish moe_cuda_kernel.cu code style (#940 )
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
* [NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style (#942 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style (#945 )
* [NFC] polish colossalai/kernel/jit/bias_gelu.py code style (#946 )
Co-authored-by: jnbai <897086360@qq.com>
* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949 )
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
* [NFC] polish colossalai/builder/pipeline.py code style (#951 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style (#952 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style (#953 )
Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style (#954 )
* [NFC] polish colossalai/kernel/cuda_native/scaled_softmax.py code style (#955 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style (#956 )
Co-authored-by: RichardoLuo <14049555596@qq.com>
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style (#957 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#958 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style (#962 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style (#963 )
Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com>
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style (#964 )
* [NFC] polish __init__.py code style (#965 )
* [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h code style (#968 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970 )
* [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977 )
* [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979 )
* [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980 )
* [NFC] polish colossalai/nn/layer/utils/common.py code style (#983 )
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
Co-authored-by: yuxuan-lou <83441848+yuxuan-lou@users.noreply.github.com>
Co-authored-by: Geng Zhang <34452939+zxgx@users.noreply.github.com>
Co-authored-by: Maruyama_Aya <38985202+MaruyamaAya@users.noreply.github.com>
Co-authored-by: XYE <92607131+Itok2000u@users.noreply.github.com>
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
Co-authored-by: HaoyuQin <79465534+coder-chin@users.noreply.github.com>
Co-authored-by: wky <64853922+wangkuangyi@users.noreply.github.com>
Co-authored-by: bajiaoyu517 <59548007+bajiaoyu517@users.noreply.github.com>
Co-authored-by: luoling-LC <105470086+luoling-LC@users.noreply.github.com>
Co-authored-by: jnbai <897086360@qq.com>
Co-authored-by: JT.Han <59948448+JThh@users.noreply.github.com>
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
Co-authored-by: xyupeng <99191637+xyupeng@users.noreply.github.com>
Co-authored-by: Sze-qq <68757353+Sze-qq@users.noreply.github.com>
Co-authored-by: Cautiousss <48676630+Cautiousss@users.noreply.github.com>
Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>
Co-authored-by: Luxios22 <67457897+Luxios22@users.noreply.github.com>
Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
Co-authored-by: RichardoLuo <50363844+RichardoLuo@users.noreply.github.com>
Co-authored-by: RichardoLuo <14049555596@qq.com>
Co-authored-by: doubleHU <98150031+huxin711@users.noreply.github.com>
Co-authored-by: runluo <68489000+run-qiao@users.noreply.github.com>
Co-authored-by: MaxT <854721132@qq.com>
Co-authored-by: superhao1995 <804673818@qq.com>
Co-authored-by: ziyu huang <huang0ziyu@gmail.com>
Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com>
Co-authored-by: Yuer867 <62204893+Yuer867@users.noreply.github.com>
Co-authored-by: lucasliunju <lucasliunju@gmail.com>
Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com>
Co-authored-by: ExtremeViscent <zhangyiqi55732@sina.com>
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Zirui Zhu <zhuzr21@gmail.com>
Co-authored-by: Ofey Chan <ofey206@gmail.com>
Co-authored-by: DouJS <dujiangsu@163.com>
Co-authored-by: Jie Zhu <chore.08-protist@icloud.com>
Co-authored-by: shenggan <csg19971016@gmail.com>
Co-authored-by: Kai Wang (Victor Kai) <37533040+kaiwang960112@users.noreply.github.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: Ziheng Qin <37519855+henryqin1997@users.noreply.github.com>
3 years ago
ver217
c2fdc6a011
[tensor] derive compute pattern from dist spec ( #971 )
...
* derive compute pattern from dist spec
* polish code
3 years ago
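The commit above derives a compute pattern from a tensor's distribution spec. A minimal pure-Python sketch of that idea follows; all names here (`DistPlacement`, `ComputePattern`, `derive_compute_pattern`) are hypothetical stand-ins, not ColossalAI's actual API:

```python
from enum import Enum, auto

class DistPlacement(Enum):
    REPLICATE = auto()   # tensor is fully replicated on every rank
    SHARD_ROW = auto()   # tensor is split along dim 0
    SHARD_COL = auto()   # tensor is split along dim 1

class ComputePattern(Enum):
    TP1D_ROW = auto()    # row-parallel: partial results need an all-reduce
    TP1D_COL = auto()    # column-parallel: outputs are concatenated
    REPLICATED = auto()  # no communication needed

def derive_compute_pattern(weight_spec: DistPlacement) -> ComputePattern:
    """Map a weight's distribution spec to the matching compute pattern."""
    mapping = {
        DistPlacement.SHARD_ROW: ComputePattern.TP1D_ROW,
        DistPlacement.SHARD_COL: ComputePattern.TP1D_COL,
        DistPlacement.REPLICATE: ComputePattern.REPLICATED,
    }
    return mapping[weight_spec]
```

The point of the derivation is that an op need not be told which parallel scheme to use; the placement of its operands already determines it.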
Ziyue Jiang
797a9dc5a9
add DistSpec for loss and test_model ( #947 )
3 years ago
ver217
67c33f57eb
[tensor] design DistSpec and DistSpecManager for ColoTensor ( #934 )
...
* add dist spec
* update linear op
* polish code
* polish code
* update embedding op
* polish unit tests
* polish unit tests
* polish comments
* polish code
* add test_dist_spec_mgr
* polish code
* refactor folder structure
* polish unit tests
* add get_process_group() for TensorSpec
* polish code
3 years ago
Ziyue Jiang
d73c2b1d79
[Tensor] fix init context ( #931 )
...
* change torch.Parameter to ColoParameter
* fix post assignment for init context
* polish
* polish
3 years ago
Ziyue Jiang
dfc88b85ea
[Tensor] simplify named param ( #928 )
...
* simplify ColoModulize
* simplify ColoModulize
* polish
* polish
3 years ago
YuliangLiu0306
32a45cd7ef
[pipelinable]use pipelinable to support GPT model. ( #903 )
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [pipelinable]use pipelinable to support GPT model.
* fix a bug caused by ShardedModel
* polish
* fix front func list
3 years ago
ver217
4ca732349e
[tensor] colo tensor overrides mul ( #927 )
...
* colo tensor overrides mul
* polish code
3 years ago
ver217
45b9124df4
[tensor] hijack addmm for colo tensor ( #923 )
...
* hijack addmm for colo tensor
* fix bugs
* polish unit test
* polish comments
3 years ago
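"Hijacking" an op here means rerouting a framework call to a sharding-aware implementation whenever its arguments are ColoTensors (ColossalAI does this through PyTorch's `__torch_function__` protocol). A framework-free sketch of the dispatch idea, with all names (`MyTensor`, `register_op`, `dispatch`) hypothetical:

```python
# Dispatch-table sketch of overriding an op for a custom tensor type.
_OP_TABLE = {}

def register_op(name):
    """Register an override for the op with the given name."""
    def decorator(fn):
        _OP_TABLE[name] = fn
        return fn
    return decorator

class MyTensor:
    def __init__(self, data):
        self.data = data  # plain scalar standing in for real storage

def dispatch(name, *args, default=None):
    """Route to the registered override when any argument is a MyTensor."""
    if name in _OP_TABLE and any(isinstance(a, MyTensor) for a in args):
        return _OP_TABLE[name](*args)
    return default(*args)

@register_op("addmm")
def my_addmm(bias, a, b):
    # toy scalar "addmm": bias + a * b
    return MyTensor(bias.data + a.data * b.data)
```

Calling `dispatch("addmm", MyTensor(1), MyTensor(2), MyTensor(3))` then yields a `MyTensor` holding `7`, while arguments of ordinary types fall through to the default implementation.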
Ziyue Jiang
c195d2814c
[Tensor] add from_pretrained support and bert pretrained test ( #921 )
...
* add from_pretrained support and test
* polish
* polish
* polish
* polish
3 years ago
Jiarui Fang
845856ea29
[Graph] building computing graph with ColoTensor, Linear only ( #917 )
3 years ago
Ziyue Jiang
75d221918a
[Tensor] add 1d vocab loss ( #918 )
...
* add 1d vocab loss
* polish
3 years ago
Jiarui Fang
ab95ec9aea
[Tensor] init ColoParameter ( #914 )
3 years ago
Ziyue Jiang
f593a5637e
[Tensor] add embedding tp1d row ( #904 )
3 years ago
Ziyue Jiang
2c0d19d755
[Tensor] add ColoTensor TP1Dcol Embedding ( #899 )
3 years ago
Jiarui Fang
d16671da75
[Tensor] initialize the ColoOptimizer ( #898 )
...
* [Tensor] activation is an attr of ColoTensor
* [Tensor] add optimizer
* only detach parameters in context
* polish code
3 years ago
Jiarui Fang
676f191532
[Tensor] activation is an attr of ColoTensor ( #897 )
3 years ago
Ziyue Jiang
cb182da7c5
[tensor] refine linear and add gather for layernorm ( #893 )
...
* refine linear and add function to ColoTensor
* add gather for layernorm
* polish
* polish
3 years ago
Jiarui Fang
26c49639d8
[Tensor] overriding parameters() for Module using ColoTensor ( #889 )
3 years ago
Ziyue Jiang
1d0aba4153
[tensor] add ColoTensor 1Dcol ( #888 )
3 years ago
Jiarui Fang
72cdc06875
[Tensor] make ColoTensor more robust for getattr ( #886 )
...
* [Tensor] make ColoTensor more robust for getattr
* polish
* polish
3 years ago
Ziyue Jiang
9bc5a77c31
[tensor] wrap function in the torch_tensor to ColoTensor ( #881 )
3 years ago
ver217
4df6471f5d
fix import error ( #880 )
3 years ago
Jiarui Fang
7f76517a85
[Tensor] make a simple net works with 1D row TP ( #879 )
3 years ago
ver217
c4d903e64a
[gemini] accelerate adjust_layout() ( #878 )
...
* add lru cache
* polish code
* update unit test
* fix sharded optim
3 years ago
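The first bullet of the commit above speeds up `adjust_layout()` by memoizing a repeated computation with an LRU cache. A sketch using the standard library's `functools.lru_cache`; the cached function here (`plan_layout`) is a hypothetical stand-in for the real layout computation:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def plan_layout(free_cuda_mem: int, chunk_size: int) -> int:
    """Stand-in for an expensive layout computation: how many
    chunks fit in the currently free device memory."""
    return free_cuda_mem // chunk_size

plan_layout(1024, 256)   # computed
plan_layout(1024, 256)   # identical arguments: served from the cache
```

`plan_layout.cache_info()` reports the hit/miss counts, which makes it easy to verify that the repeated call never re-ran the computation.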
Jiarui Fang
909211453b
[Tensor] Add some attributes to ColoTensor ( #877 )
...
* [Tensor] add some function to ColoTensor
* torch.allclose
* rm torch.add
3 years ago
HELSON
425b4a96b8
[gemini] polish stateful_tensor_mgr ( #876 )
3 years ago
Jiarui Fang
e43f83aa5c
[Tensor] get named parameters for model using ColoTensors ( #874 )
3 years ago
Jiarui Fang
96211c2cc8
[tensor] customized op returns ColoTensor ( #875 )
...
* [tensor] customized op returns ColoTensor
* polish
* polish code
3 years ago
Ziyue Jiang
26d4ab8b03
[Tensor] Add function to spec and update linear 1Drow and unit tests ( #869 )
3 years ago
Frank Lee
11f54c7b6b
[doc] improved docstring and assertion messages for the engine module ( #871 )
3 years ago
Frank Lee
1c34382678
[doc] improved assertion messages in trainer ( #873 )
3 years ago
Frank Lee
7a64fae33a
[doc] improved error messages in initialize ( #872 )
3 years ago
Jiarui Fang
1190b2c4a4
[tensor] add cross_entrophy_loss ( #868 )
3 years ago
HELSON
3107817172
[gemini] add stateful tensor container ( #867 )
3 years ago
Jiarui Fang
d01d3b8cb0
colo init context add device attr. ( #866 )
3 years ago
Frank Lee
2238758c2e
[usability] improved error messages in the context module ( #856 )
3 years ago
Frank Lee
9fdebadd69
[doc] improved docstring in the amp module ( #857 )
3 years ago
Frank Lee
b862d89d00
[doc] improved docstring in the logging module ( #861 )
3 years ago
Frank Lee
8004c8e938
[doc] improved docstring in the communication module ( #863 )
3 years ago
Jiarui Fang
8af5f7423d
[tensor] an initial idea of tensor spec ( #865 )
...
* an initial idea of tensor spec
* polish
* polish
3 years ago
Jiarui Fang
126ba573a8
[Tensor] add layer norm Op ( #852 )
3 years ago
Frank Lee
a82da26f7e
[cli] refactored micro-benchmarking cli and added more metrics ( #858 )
3 years ago
Frank Lee
ee222dfbf3
[usability] added assertion message in registry ( #864 )
3 years ago
HELSON
f0e654558f
[gemini] polish code ( #855 )
3 years ago
Jiarui Fang
29159d9b5b
hotfix tensor unittest bugs ( #862 )
3 years ago
YuliangLiu0306
c6930d8ddf
[pipelinable]use ColoTensor to replace dummy tensor. ( #853 )
3 years ago
Ziyue Jiang
bcc8655021
[Tensor ] Add 1Drow weight reshard by spec ( #854 )
3 years ago
ver217
d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data ( #850 )
3 years ago
ver217
232142f402
[utils] refactor profiler ( #837 )
...
* add model data profiler
* add a subclass of torch.profiler.profile
* refactor folder structure
* remove redundant codes
* polish code
* use GeminiMemoryManager
* fix import path
* fix stm profiler ext
* polish comments
* remove useless file
3 years ago
Jiarui Fang
62f059251b
[Tensor] init a tp network training unittest ( #849 )
3 years ago
ver217
0dea140760
[hotfix] add destructor for stateful tensor ( #848 )
...
* add destructor for stateful tensor
* fix colo init context
3 years ago
ver217
0f7ed8c192
fix _post_init_method of zero init ctx ( #847 )
3 years ago
Ziyue Jiang
2a0a427e04
[tensor]add assert for colo_tensor 1Drow ( #846 )
3 years ago
Ziyue Jiang
05023ecfee
[Tensor] TP Linear 1D row ( #843 )
3 years ago
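In a 1D-row tensor-parallel linear layer, the weight is split along its input dimension; each rank multiplies its slice of the input by its weight shard, and the partial results are summed (the all-reduce). A pure-Python sketch of that arithmetic, simulating the ranks in a loop (`matmul` and `row_parallel_linear` are illustrative helpers, not ColossalAI's API):

```python
def matmul(a, b):
    """Plain-Python matrix multiply: a is (m x k), b is (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def row_parallel_linear(x, weight, world_size):
    """1D-row TP: shard the weight along its input (row) dimension,
    pair each shard with the matching slice of x, then sum the
    per-rank partial products (the all-reduce step)."""
    k = len(weight)
    shard = k // world_size
    partials = []
    for rank in range(world_size):          # simulate each rank
        w_shard = weight[rank * shard:(rank + 1) * shard]
        x_shard = [row[rank * shard:(rank + 1) * shard] for row in x]
        partials.append(matmul(x_shard, w_shard))
    m, n = len(partials[0]), len(partials[0][0])
    # all-reduce(sum) across ranks
    return [[sum(p[i][j] for p in partials) for j in range(n)]
            for i in range(m)]
```

Because the shards partition the contraction dimension, the summed partials are exactly the unsharded product, which is what the unit tests in commits like #854 and #869 check.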
Frank Lee
cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching ( #844 )
...
* [cli] fixed multi-node job launching
* [cli] fixed a bug in version comparison
* [cli] support launching with env var
* [cli] fixed multi-node job launching
* [cli] fixed a bug in version comparison
* [cli] support launching with env var
* added docstring
* [cli] added extra launch arguments
* [cli] added default launch rdzv args
* [cli] fixed version comparison
* [cli] added docstring examples and requirement
* polish docstring
* polish code
* polish code
3 years ago
HELSON
e5ea3fdeef
[gemini] add GeminiMemoryManger ( #832 )
...
* refactor StatefulTensor, tensor utilities
* add unitest for GeminiMemoryManager
3 years ago
YuliangLiu0306
35ea6e1023
[pipelinable]use pipelinable context to initialize non-pipeline model ( #816 )
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [pipeline]add module lazy init feature to support large model initialization.
* [pipeline]add to_layer_list and partition method to support arbitrary non-pp model
* refactor the module structure
* polish
* [pipelinable]add unit test for pipelinable
* polish
* polish
* Fix CodeFactor issues.
3 years ago
Jiarui Fang
ea0a2ed25f
[hotfix] the bug of numel() in ColoTensor ( #845 )
3 years ago
LuGY
c1e8d2001e
modified the pp build for ckpt adaptation ( #803 )
3 years ago
Jiarui Fang
8789850eea
Init Context supports lazy model memory allocation ( #842 )
3 years ago
Jiarui Fang
4575a3298b
[hotfix] ColoTensor pin_memory ( #840 )
3 years ago
Frank Lee
01e9f834f5
[dependency] removed torchvision ( #833 )
...
* [dependency] removed torchvision
* fixed transforms
3 years ago
Jiarui Fang
cb5a4778e1
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. ( #831 )" ( #835 )
...
This reverts commit ac88de6dfc.
3 years ago
Jiarui Fang
ac88de6dfc
[WIP] Applying ColoTensor on TP-1D-row Linear. ( #831 )
...
* revert zero tensors back
* [tensor] init row 1d linear
3 years ago
Jiarui Fang
595bedf767
revert zero tensors back ( #829 )
3 years ago
Jiarui Fang
294a6060d0
[tensor] ZeRO use ColoTensor as the base class. ( #828 )
...
* [refactor] moving InsertPostInitMethodToModuleSubClasses to utils.
* [tensor] ZeRO use ColoTensor as the base class.
* polish
3 years ago
Ziyue Jiang
8e6fdb4f29
[tensor]fix test_linear ( #826 )
3 years ago
Ziyue Jiang
1a9e2c2dff
[tensor] fix kwargs in colo_tensor torch_function ( #825 )
3 years ago
Jiarui Fang
eb1b89908c
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. ( #824 )
3 years ago
Jiarui Fang
2ecc3d7a55
[tensor] lazy init ( #823 )
3 years ago
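Lazy init defers the real allocation until a parameter is first used, so a huge model can be described without committing memory up front. A minimal sketch of the placeholder pattern (the `LazyParam` class and its methods are hypothetical, not ColossalAI's implementation):

```python
class LazyParam:
    """Record the shape now; allocate storage only on first access."""
    def __init__(self, shape):
        self.shape = shape
        self._data = None          # nothing allocated yet

    @property
    def materialized(self):
        return self._data is not None

    def materialize(self):
        """Allocate on first use; later calls return the same storage."""
        if self._data is None:
            n = 1
            for d in self.shape:
                n *= d
            self._data = [0.0] * n  # stand-in for the real device allocation
        return self._data

p = LazyParam((2, 3))   # cheap: only metadata is stored
p.materialize()         # storage appears on first use
```

The same shape metadata is what later lets the context decide placement (or sharding) before any memory is actually committed.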
Jiarui Fang
68dcd51d41
[Tensor] update ColoTensor torch_function ( #822 )
...
* Revert "[zero] add ZeroTensorShardStrategy (#793 )"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
* polish code
* add a new tensor structure and override linear for it
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* [tensor] renaming and reorganize directory structure.
* rm useless dir
* polish
* polish
* [tensor] handle the function not wrapped
* polish
3 years ago
Jiarui Fang
0ce8924ceb
[tensor] reorganize files ( #820 )
3 years ago
Jiarui Fang
ab962b9735
[gemini] a new tensor structure ( #818 )
...
* Revert "[zero] add ZeroTensorShardStrategy (#793 )"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
* polish code
* add a new tensor structure and override linear for it
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
3 years ago
FrankLeeeee
70ed11d07e
[cli] added check installation cli
3 years ago
YuliangLiu0306
c7eca40f51
Merge pull request #812 from FrankLeeeee/feature/cli
...
[cli] fixed single-node process launching
3 years ago
Jiarui Fang
3ddbd1bce1
[gemini] collect cpu-gpu moving volume in each iteration ( #813 )
3 years ago
FrankLeeeee
d522cb704e
[cli] fixed single-node process launching
3 years ago
Jiarui Fang
61c20b44bc
[log] local throughput metrics ( #811 )
...
* Revert "[zero] add ZeroTensorShardStrategy (#793 )"
This reverts commit 88759e289e.
* [gemini] set cpu memory capacity
* [log] local throughput collecting
* polish
* polish
* polish
* polish code
* polish
3 years ago
ver217
dd92b90a68
[DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext ( #808 )
...
* init fp16 param directly
* polish code
3 years ago
Jiarui Fang
227d1cd4b3
[gemini] APIs to set cpu memory capacity ( #809 )
3 years ago
FrankLeeeee
f63e91d280
[cli] fixed a bug in user args and refactored the module structure
3 years ago
Jiarui Fang
e761ad2cd7
Revert "[zero] add ZeroTensorShardStrategy ( #793 )" ( #806 )
3 years ago
HELSON
88759e289e
[zero] add ZeroTensorShardStrategy ( #793 )
3 years ago
Jiarui Fang
681addb512
[refactor] moving grad acc logic to engine ( #804 )
3 years ago
Frank Lee
05d9ae5999
[cli] add missing requirement ( #805 )
3 years ago
YuliangLiu0306
de2f581d43
[cli] added micro benchmarking for tp ( #789 )
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [CLI]add cli benchmark feature
* fix CodeFactor issues.
* refactor the module structure.
3 years ago
YuliangLiu0306
cfadc9df8e
[cli] added distributed launcher command ( #791 )
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [CLI]add cli launcher feature
* remove testing message used during developing
* refactor the module structure.
3 years ago
Jiarui Fang
4d9332b4c5
[refactor] moving memtracer to gemini ( #801 )
3 years ago