| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| zbian | 7bc0afc901 | updated flash attention usage | 2 years ago |
| Frank Lee | 95a36eae63 | [kernel] added kernel loader to softmax autograd function (#3093)<br>• [release] v0.2.6 | 2 years ago |
| ver217 | 823f3b9cf4 | [doc] add deepspeed citation and copyright (#2996) | 2 years ago |
| ver217 | 090f14fd6b | [misc] add reference (#2930)<br>• [misc] add license | 2 years ago |
| Frank Lee | 918bc94b6b | [triton] added copyright information for flash attention (#2835) | 2 years ago |
| Frank Lee | dd14783f75 | [kernel] fixed repeated loading of kernels (#2549) | 2 years ago |
| Frank Lee | 8b7495dd54 | [example] integrate seq-parallel tutorial with CI (#2463) | 2 years ago |
| jiaruifang | 69d9180c4b | [hotfix] issue #2388 | 2 years ago |
| Frank Lee | 40d376c566 | [setup] support pre-build and jit-build of cuda kernels (#2374) | 2 years ago |
| Jiarui Fang | db6eea3583 | [builder] reconfig op_builder for pypi install (#2314) | 2 years ago |
| Jiarui Fang | 16cc8e6aa7 | [builder] MOE builder (#2277) | 2 years ago |
| xcnick | 85178a397a | [hotfix] fix error for torch 2.0 (#2243) | 2 years ago |
| Jiarui Fang | db4cbdc7fb | [builder] builder for scaled_upper_triang_masked_softmax (#2234) | 2 years ago |
| Jiarui Fang | 54de05da5d | [builder] polish builder with better base class (#2216)<br>• remove print | 2 years ago |
| Jiarui Fang | 7675792100 | [builder] raise Error when CUDA_HOME is not set (#2213) | 2 years ago |
| Jiarui Fang | 1cb532ffec | [builder] multihead attn runtime building (#2203)<br>• [hotfix] correct cpu_optim runtime compilation<br>• bug fixes | 2 years ago |
| Jiarui Fang | 5682e6d346 | [hotfix] correct cpu_optim runtime compilation (#2197) | 2 years ago |
| Jiarui Fang | 355ffb386e | [builder] unified cpu_optim fused_optim interface (#2190) | 2 years ago |
| Jiarui Fang | bc0e271e71 | [builder] use builder() for cpu adam and fused optim in setup.py (#2187) | 2 years ago |
| Jiarui Fang | d42afd30f8 | [builder] runtime adam and fused_optim builder (#2184) | 2 years ago |
| アマデウス | 077a66dd81 | updated attention kernel (#2133) | 2 years ago |
| HELSON | e7d3afc9cc | [optimizer] add div_scale for optimizers (#2117)<br>• [zero] use div_scale in zero optimizer<br>• fix testing error | 2 years ago |
| ver217 | f8a7148dec | [kernel] move all symlinks of kernel to `colossalai._C` (#1971) | 2 years ago |
| zbian | 6877121377 | updated flash attention api | 2 years ago |
| アマデウス | 4268ae017b | [kernel] added jit warmup (#1792) | 2 years ago |
| xcnick | e0da01ea71 | [hotfix] fix build error when torch version >= 1.13 (#1803) | 2 years ago |
| oahzxl | 9639ea88fc | [kernel] more flexible flashatt interface (#1804) | 2 years ago |
| oahzxl | 501a9e9cd2 | [hotfix] polish flash attention (#1802) | 2 years ago |
| Jiarui Fang | c248800359 | [kernel] skip tests of flash_attn and triton when they are not available (#1798) | 2 years ago |
| oahzxl | 25952b67d7 | [feat] add flash attention (#1762) | 2 years ago |
| ver217 | 12b4887097 | [hotfix] fix CPUAdam kernel nullptr (#1410) | 2 years ago |
| binmakeswell | 7696cead8d | Recover kernel files | 2 years ago |
| Maruyama_Aya | 87f679aeae | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/kernels.h code style (#1291) | 2 years ago |
| doubleHU | d6f5ef8860 | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code style (#1286) | 2 years ago |
| yuxuan-lou | 5f6ab35d25 | Hotfix/format (#1274)<br>• [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style (#937)<br>• [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style<br>• [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.cpp code style<br>Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com> | 2 years ago |
| binmakeswell | c95e18cdb9 | [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.h code style (#1270) | 2 years ago |
| DouJS | db13f96333 | [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_apply.cuh code style (#1264) | 2 years ago |
| shenggan | 5d7366b144 | [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.h code style (#1263) | 2 years ago |
| ziyu huang | f1cafcc73a | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#1261)<br>Co-authored-by: Arsmart123 <202476410arsmart@gmail.com> | 2 years ago |
| Sze-qq | f8b9aaef47 | [NFC] polish colossalai/kernel/cuda_native/csrc/type_shim.h code style (#1260) | 2 years ago |
| ver217 | e4f555f29a | [optim] refactor fused sgd (#1134) | 2 years ago |
| zhengzangw | ae7c338105 | [NFC] polish colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp code style | 3 years ago |
| Frank Lee | 533d0c46d8 | [kernel] fixed the include bug in dropout kernel (#999) | 3 years ago |
| puck_WCR | bda70b4b66 | [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980) | 3 years ago |
| Kai Wang (Victor Kai) | c50c08dcbb | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979) | 3 years ago |
| binmakeswell | f28c021376 | [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978) | 3 years ago |
| Jie Zhu | b67eebd20f | [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977) | 3 years ago |
| DouJS | 52705ec5c5 | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974) | 3 years ago |
| Ofey Chan | 136946422b | [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973) | 3 years ago |
| Xu Kai | 632e94abde | [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970) | 3 years ago |