ColossalAI/colossalai/kernel/cuda_native/csrc
Xu Kai 946ab56c48
[feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import
2023-09-22 11:02:50 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| gptq | [feature] add gptq for inference (#4754) | 2023-09-22 11:02:50 +08:00 |
| kernels | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| colossal_C_frontend.cpp | [optimizer] add div_scale for optimizers (#2117) | 2022-12-12 17:58:57 +08:00 |
| compat.h | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| cpu_adam.cpp | [hotfix] fix CPUAdam kernel nullptr (#1410) | 2022-08-05 19:45:45 +08:00 |
| cpu_adam.h | [hotfix] fix CPUAdam kernel nullptr (#1410) | 2022-08-05 19:45:45 +08:00 |
| layer_norm_cuda.cpp | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| layer_norm_cuda_kernel.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| moe_cuda.cpp | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| moe_cuda_kernel.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| multi_tensor_adam.cu | [doc] add deepspeed citation and copyright (#2996) | 2023-03-04 20:08:11 +08:00 |
| multi_tensor_apply.cuh | [doc] add deepspeed citation and copyright (#2996) | 2023-03-04 20:08:11 +08:00 |
| multi_tensor_l2norm_kernel.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| multi_tensor_lamb.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| multi_tensor_scale_kernel.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| multi_tensor_sgd_kernel.cu | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| multihead_attention_1d.cpp | [hotfix] fix error for torch 2.0 (#2243) | 2022-12-30 23:11:55 +08:00 |
| multihead_attention_1d.h | [hotfix] fix error for torch 2.0 (#2243) | 2022-12-30 23:11:55 +08:00 |
| scaled_masked_softmax.cpp | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| scaled_masked_softmax.h | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| scaled_masked_softmax_cuda.cu | [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949) | 2022-05-17 10:25:06 +08:00 |
| scaled_upper_triang_masked_softmax.cpp | [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959) | 2022-05-17 10:25:06 +08:00 |
| scaled_upper_triang_masked_softmax.h | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| scaled_upper_triang_masked_softmax_cuda.cu | [NFC] polish pre-commit run --files colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943) | 2022-05-17 10:25:06 +08:00 |
| type_shim.h | [bf16] add bf16 support (#3882) | 2023-06-05 15:58:31 +08:00 |