ColossalAI/examples/inference
Cuiqing Li 459a88c806
[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)
* adding flash-decoding

* clean

* adding kernel

* adding flash-decoding

* add integration

* add

* adding kernel

* adding kernel

* adding triton 2.1.0 features for inference

* update bloom triton kernel

* remove useless vllm kernels

* clean codes

* fix

* adding files

* fix readme

* update llama flash-decoding

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-10-30 14:04:37 +08:00
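The commit above refers to "flash-decoding" for llama token attention. As a rough illustration of that idea (not the repo's actual Triton kernels): during decoding a single query token attends over a long KV cache, so the cache is split into chunks that can be processed independently, and the partial softmax results are merged with a running-max / log-sum-exp correction. The sketch below is a hypothetical NumPy version; all names and the chunking scheme are illustrative assumptions.

```python
# Hypothetical NumPy sketch of chunked (flash-decoding-style) attention.
# Not the ColossalAI implementation; names and chunk size are illustrative.
import numpy as np

def flash_decode_attention(q, K, V, chunk=4):
    """q: (d,), K/V: (seq, d). Computes softmax(q @ K.T / sqrt(d)) @ V chunkwise."""
    d = q.shape[0]
    scale = 1.0 / np.sqrt(d)
    m = -np.inf           # running max of attention scores
    l = 0.0               # running sum of exp(score - m)
    acc = np.zeros(d)     # running weighted sum of V rows
    for start in range(0, K.shape[0], chunk):
        s = (K[start:start + chunk] @ q) * scale   # partial scores for this chunk
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)             # rescale previous partials
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ V[start:start + chunk]
        m = m_new
    return acc / l
```

Each chunk's partial result can in principle be computed by a separate GPU block, which is what makes this useful for long KV caches during token-by-token decoding; the final division by the accumulated normalizer yields exactly the softmax-weighted output.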
serving               [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841)                                         2023-10-02 17:48:38 +08:00
_utils.py             [Inference]ADD Bench Chatglm2 script (#4963)                                                             2023-10-24 13:11:15 +08:00
bench_bloom.py        [Inference]ADD Bench Chatglm2 script (#4963)                                                             2023-10-24 13:11:15 +08:00
bench_chatglm2.py     [Inference]ADD Bench Chatglm2 script (#4963)                                                             2023-10-24 13:11:15 +08:00
bench_llama.py        [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)   2023-10-30 14:04:37 +08:00
gptq_bloom.py         [Inference]ADD Bench Chatglm2 script (#4963)                                                             2023-10-24 13:11:15 +08:00
gptq_llama.py         [Inference]ADD Bench Chatglm2 script (#4963)                                                             2023-10-24 13:11:15 +08:00
smoothquant_llama.py  [inference] Add smmoothquant for llama (#4904)                                                           2023-10-16 11:28:44 +08:00