ColossalAI

History

Cuiqing Li 459a88c806 [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)	2023-10-30 14:04:37 +08:00

* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
..
triton	[Refactor] Integrated some lightllm kernels into token-attention (#4946)	2023-10-19 22:22:47 +08:00
