ColossalAI

Commit Graph

Author	SHA1	Message	Date
Yuanheng Zhao	3da9993b0d	[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301) * fix decoding kernel pytest * revise and add triton context attn benchmark	2024-01-23 17:16:02 +08:00
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	2024-01-18 16:31:14 +08:00
yuehuayingxueluo	86b63f720c	[Inference] Adapted to the triton attn kernels (#5264) * adapted to the triton attn kernels * fix pad input * adapted to copy_kv_to_blocked_cache * fix ci test * update kv memcpy * remove print	2024-01-17 16:03:10 +08:00
Yuanheng Zhao	fa85e02b3b	[kernel] Add KV cache copy kernel during decoding (#5261) * add kv copy triton kernel during decoding stage * add pytest and fix kernel * fix test utilities * revise kernel config * add benchmark for kvcache copy	2024-01-15 17:37:20 +08:00
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	2024-01-11 13:50:45 +00:00
yuehuayingxueluo	fab294c7f4	fix CI bugs	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	2a73e828eb	fix bugs related to processing padding mask	2024-01-11 13:46:14 +00:00
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	47e53eaa1c	fix bugs in attention.py and request_handler.py	2024-01-11 13:44:06 +00:00
Jianghai	bfd9b1b494	[Inference] Pytorch Attention func, pad&nopad input support (#5219) * add attn * add attention test * fix attn forward * fix decoding	2024-01-11 13:44:06 +00:00

10 Commits (3da9993b0d03923755c1fcd6279cc4c7b8d00d1e)