Jun Gao
dce05da535
fix thrust-transform-reduce error ( #5078 )
2023-11-21 15:09:35 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero ( #5067 )
...
* [npu] setup device utils (#5047 )
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052 )
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065 )
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
Hongxin Liu
4f68b3f10c
[kernel] support pure fp16 for cpu adam and update gemini optim tests ( #4921 )
...
* [kernel] support pure fp16 for cpu adam (#4896 )
* [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919 )
* [kernel] fix cpu adam
* [test] update gemini optim test
2023-10-16 21:56:53 +08:00
Xu Kai
611a5a80ca
[inference] Add smmoothquant for llama ( #4904 )
...
* [inference] add int8 rotary embedding kernel for smoothquant (#4843 )
* [inference] add smoothquant llama attention (#4850 )
* add smoothquant llama attention
* remove uselss code
* remove useless code
* fix import error
* rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (#4853 )
* add silu linear
* update skip condition
* catch smoothquant cuda lib exception
* prcocess exception for tests
* [inference] add llama mlp for smoothquant (#4854 )
* add llama mlp for smoothquant
* fix down out scale
* remove duplicate lines
* add llama mlp check
* delete useless code
* [inference] add smoothquant llama (#4861 )
* add smoothquant llama
* fix attention accuracy
* fix accuracy
* add kv cache and save pretrained
* refactor example
* delete smooth
* refactor code
* [inference] add smooth function and delete useless code for smoothquant (#4895 )
* add smooth function and delete useless code
* update datasets
* remove duplicate import
* delete useless file
* refactor codes (#4902 )
* rafactor code
* add license
* add torch-int and smoothquant license
2023-10-16 11:28:44 +08:00
Camille Zhong
cd6a962e66
[NFC] polish code style ( #4799 )
2023-10-07 13:36:52 +08:00
littsk
eef96e0877
polish code for gptq ( #4793 )
2023-10-07 13:36:52 +08:00
Xu Kai
946ab56c48
[feature] add gptq for inference ( #4754 )
...
* [gptq] add gptq kernel (#4416 )
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rname inferance/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494 )
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538 )
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706 )
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete usless print and update test
* delete usless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete uselss code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
2023-09-22 11:02:50 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu
ae02d4e4f7
[bf16] add bf16 support ( #3882 )
...
* [bf16] add bf16 support for fused adam (#3844 )
* [bf16] fused adam kernel support bf16
* [test] update fused adam kernel test
* [test] update fused adam test
* [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860 )
* [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869 )
* [bf16] add mixed precision mixin
* [bf16] low level zero optim support bf16
* [text] update low level zero test
* [text] fix low level zero grad acc test
* [bf16] add bf16 support for gemini (#3872 )
* [bf16] gemini support bf16
* [test] update gemini bf16 test
* [doc] update gemini docstring
* [bf16] add bf16 support for plugins (#3877 )
* [bf16] add bf16 support for legacy zero (#3879 )
* [zero] init context support bf16
* [zero] legacy zero support bf16
* [test] add zero bf16 test
* [doc] add bf16 related docstring for legacy zero
2023-06-05 15:58:31 +08:00
ver217
823f3b9cf4
[doc] add deepspeed citation and copyright ( #2996 )
...
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
ver217
090f14fd6b
[misc] add reference ( #2930 )
...
* [misc] add reference
* [misc] add license
2023-02-28 18:07:24 +08:00
xcnick
85178a397a
[hotfix] fix error for torch 2.0 ( #2243 )
2022-12-30 23:11:55 +08:00
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
...
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2022-12-12 17:58:57 +08:00
xcnick
e0da01ea71
[hotfix] fix build error when torch version >= 1.13 ( #1803 )
2022-11-08 09:40:24 +08:00
ver217
12b4887097
[hotfix] fix CPUAdam kernel nullptr ( #1410 )
2022-08-05 19:45:45 +08:00
binmakeswell
7696cead8d
Recover kernal files
2022-07-13 12:08:21 +08:00
Maruyama_Aya
87f679aeae
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/kernels.h code style ( #1291 )
2022-07-13 12:08:21 +08:00
doubleHU
d6f5ef8860
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code style ( #1286 )
2022-07-13 12:08:21 +08:00
yuxuan-lou
5f6ab35d25
Hotfix/format ( #1274 )
...
* [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937 )
* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style
* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.cpp code style
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2022-07-13 12:08:21 +08:00
binmakeswell
c95e18cdb9
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.h code style ( #1270 )
2022-07-13 12:08:21 +08:00
DouJS
db13f96333
[NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_apply.cuh code style ( #1264 )
2022-07-13 12:08:21 +08:00
shenggan
5d7366b144
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.h code style ( #1263 )
2022-07-13 12:08:21 +08:00
ziyu huang
f1cafcc73a
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style ( #1261 )
...
Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com>
2022-07-13 12:08:21 +08:00
Sze-qq
f8b9aaef47
[NFC] polish colossalai/kernel/cuda_native/csrc/type_shim.h code style ( #1260 )
2022-07-13 12:08:21 +08:00
ver217
e4f555f29a
[optim] refactor fused sgd ( #1134 )
2022-06-20 11:19:38 +08:00
zhengzangw
ae7c338105
[NFC] polish colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp code style
2022-05-20 23:57:38 +08:00
Frank Lee
533d0c46d8
[kernel] fixed the include bug in dropout kernel ( #999 )
2022-05-18 21:43:18 +08:00
Kai Wang (Victor Kai)
c50c08dcbb
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style ( #979 )
2022-05-17 10:25:06 +08:00
binmakeswell
f28c021376
[NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style ( #978 )
2022-05-17 10:25:06 +08:00
Jie Zhu
b67eebd20f
[NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style ( #977 )
2022-05-17 10:25:06 +08:00
DouJS
52705ec5c5
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style ( #974 )
2022-05-17 10:25:06 +08:00
Ofey Chan
136946422b
[NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style ( #973 )
2022-05-17 10:25:06 +08:00
Xu Kai
632e94abde
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style ( #970 )
2022-05-17 10:25:06 +08:00
ExtremeViscent
22d1df224d
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h ( #968 )
...
code style
2022-05-17 10:25:06 +08:00
Yuer867
7106a399fc
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style ( #964 )
2022-05-17 10:25:06 +08:00
ziyu huang
5bd80b7dd1
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style ( #963 )
...
Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com>
2022-05-17 10:25:06 +08:00
superhao1995
48c4a180c7
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style ( #959 )
2022-05-17 10:25:06 +08:00
MaxT
442a2975ab
[NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style ( #962 )
2022-05-17 10:25:06 +08:00
runluo
89e2767a92
[NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style ( #958 )
2022-05-17 10:25:06 +08:00
doubleHU
1dc1b6fa00
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style ( #957 )
2022-05-17 10:25:06 +08:00
RichardoLuo
0e922da874
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style ( #956 )
...
Co-authored-by: RichardoLuo <14049555596@qq.com>
2022-05-17 10:25:06 +08:00
Luxios22
f6970ef8b1
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style ( #954 )
2022-05-17 10:25:06 +08:00
Cautiousss
0b86a6345e
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style ( #953 )
...
Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>
2022-05-17 10:25:06 +08:00
Sze-qq
d8d07b0e2b
[NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style ( #952 )
2022-05-17 10:25:06 +08:00
JT.Han
c3e423c8be
[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style ( #949 )
...
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
2022-05-17 10:25:06 +08:00
bajiaoyu517
eb9a81d72a
[NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style ( #945 )
2022-05-17 10:25:06 +08:00
wky
8ffdc38376
[NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style ( #942 )
2022-05-17 10:25:06 +08:00
HaoyuQin
c0f373db5d
[NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style ( #943 )
2022-05-17 10:25:06 +08:00
XYE
5bbefeb06a
[NFC] polish moe_cuda_kernel.cu code style ( #940 )
...
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
2022-05-17 10:25:06 +08:00
Maruyama_Aya
7aa35eae6a
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style ( #938 )
2022-05-17 10:25:06 +08:00