ColossalAI/examples/inference
Latest commit: be396ad6cc by Steve Luo, 2024-04-18 16:45:07 +08:00
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
* feat: add flash decoding for paged attention
* refactor: flash decoding attention implementation
* [pre-commit.ci] auto fixes from pre-commit.com hooks

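The commit above adds a decoding kernel for paged attention that splits the KV sequence into partitions within a single thread block, in the flash-decoding style: each partition computes partial softmax statistics, which are then merged online. Below is a minimal PyTorch sketch of that split-and-merge idea for a single query; all names are illustrative and this is not the repository's actual kernel API.

```python
# Minimal sketch of flash decoding: split the KV sequence into chunks,
# keep running softmax statistics per chunk, and merge them online.
import torch

def flash_decode_single_query(q: torch.Tensor, k: torch.Tensor,
                              v: torch.Tensor, chunk_size: int = 256) -> torch.Tensor:
    """q: (d,); k, v: (seq_len, d). Returns the attention output (d,)."""
    d = q.shape[0]
    scale = d ** -0.5
    m = torch.tensor(float("-inf"))  # running max of attention logits
    l = torch.tensor(0.0)            # running softmax denominator
    acc = torch.zeros(d)             # running numerator (weighted sum of V)
    for start in range(0, k.shape[0], chunk_size):
        # In the real kernel each chunk's K/V would be fetched through a
        # block table, since a paged KV cache is not contiguous in memory.
        k_c, v_c = k[start:start + chunk_size], v[start:start + chunk_size]
        s = (k_c @ q) * scale              # logits for this chunk, (chunk,)
        m_new = torch.maximum(m, s.max())
        alpha = torch.exp(m - m_new)       # rescale old stats to the new max
        p = torch.exp(s - m_new)
        l = l * alpha + p.sum()
        acc = acc * alpha + p @ v_c
        m = m_new
    return acc / l
```

The merged result matches a full softmax over the whole sequence, which is what makes the per-partition split safe:

```python
torch.manual_seed(0)
q, k, v = torch.randn(64), torch.randn(1000, 64), torch.randn(1000, 64)
ref = torch.softmax((k @ q) * 64 ** -0.5, dim=0) @ v
assert torch.allclose(flash_decode_single_query(q, k, v), ref, atol=1e-5)
```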
File                         Last commit                                                                                         Date
benchmark_ops                [Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)   2024-04-18 16:45:07 +08:00
benchmark_llama.py           [inference/model] Adapted to the Baichuan2-7B model (#5591)                                         2024-04-15 16:53:02 +08:00
build_smoothquant_weight.py  [inference] refactor examples and fix schedule (#5077)                                              2023-11-21 10:46:03 +08:00
run_benchmark.sh             Optimized tail-processing code style and macro-definition logic (#5519)                             2024-03-28 10:42:51 +08:00
run_llama_inference.py       [npu] change device to accelerator api (#5239)                                                      2024-01-09 10:20:05 +08:00
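For background on the "paged" part of the kernel that benchmark_ops exercises: a paged KV cache stores keys and values in fixed-size blocks addressed through a per-sequence block table, rather than as one contiguous tensor. A hypothetical sketch of gathering one sequence's keys from such a cache; block_size and all names here are assumptions for illustration, not taken from the repository:

```python
import torch

def gather_paged_keys(k_cache: torch.Tensor, block_table: torch.Tensor,
                      seq_len: int, block_size: int = 16) -> torch.Tensor:
    """k_cache: (num_blocks, block_size, head_dim); block_table: (max_blocks,).
    Returns this sequence's first seq_len keys as one contiguous tensor."""
    n = (seq_len + block_size - 1) // block_size   # number of blocks in use
    blocks = k_cache[block_table[:n].long()]       # (n, block_size, head_dim)
    return blocks.reshape(-1, blocks.shape[-1])[:seq_len]
```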