ColossalAI/colossalai/kernel/triton
Jianghai e0757c31fb
[inference] Dynamic Batching for Single and Multiple GPUs (#4831)
* finish batch manager

* fix dynamic batching

* add llama inference

* finish tests

* support generating sequences of different lengths

* remove debug prints

* fix bugs

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2023-10-11 17:52:52 +08:00
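
The commit above adds dynamic batching to the inference path. As a rough illustration of the idea only (a hypothetical sketch, not ColossalAI's actual batch-manager API): running sequences of different lengths share one decode batch, finished sequences are evicted after each step, and waiting requests are admitted in their place.

```python
# Hypothetical sketch of dynamic (continuous) batching for generation.
# Names (Request, DynamicBatchManager) are illustrative assumptions,
# not ColossalAI identifiers.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    prompt_len: int
    max_new_tokens: int
    generated: int = 0  # tokens produced so far


class DynamicBatchManager:
    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def add_request(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list[int]:
        # Admit waiting requests while the batch has free slots.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        # One decode step: every running sequence emits one token,
        # regardless of how long its prompt or output already is.
        for req in self.running:
            req.generated += 1
        # Evict sequences that reached their generation budget so
        # their slots can be reused on the next step.
        finished = [r.req_id for r in self.running if r.generated >= r.max_new_tokens]
        self.running = [r for r in self.running if r.generated < r.max_new_tokens]
        return finished
```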
__init__.py [inference] Dynamic Batching for Single and Multiple GPUs (#4831) 2023-10-11 17:52:52 +08:00
context_attention.py [inference] chatglm2 infer demo (#4724) 2023-09-22 11:12:50 +08:00
copy_kv_cache_dest.py [inference] Dynamic Batching for Single and Multiple GPUs (#4831) 2023-10-11 17:52:52 +08:00
custom_autotune.py add autotune (#4822) 2023-09-28 13:47:35 +08:00
fused_layernorm.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
gptq_triton.py add autotune (#4822) 2023-09-28 13:47:35 +08:00
qkv_matmul_kernel.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
rms_norm.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
rotary_embedding_kernel.py [inference] chatglm2 infer demo (#4724) 2023-09-22 11:12:50 +08:00
self_attention_nofusion.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
softmax.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
token_attention_kernel.py [inference] chatglm2 infer demo (#4724) 2023-09-22 11:12:50 +08:00
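
For context, the files listed above are Triton GPU kernels backing ColossalAI's inference path. A minimal sketch of an RMSNorm kernel in the same style (assumed contiguous 2D input and one program per row; this is not the repo's rms_norm.py implementation):

```python
# Minimal Triton RMSNorm sketch: y = x / rms(x) * weight, one row per program.
import torch
import triton
import triton.language as tl


@triton.jit
def rms_norm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    # Load one row, compute in float32 for numerical stability.
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    y = x / rms * w
    # Cast back to the output dtype on store.
    tl.store(out_ptr + row * n_cols + cols, y.to(out_ptr.dtype.element_ty), mask=mask)


def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    assert x.is_cuda and x.is_contiguous() and x.dim() == 2
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # BLOCK_SIZE must be a power of two covering the row width.
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    rms_norm_kernel[(n_rows,)](x, weight, out, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out
```

The one-program-per-row layout shown here is the common pattern for normalization kernels; the attention kernels in this directory (context_attention.py, token_attention_kernel.py) instead tile over sequence and head dimensions.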