ColossalAI/tests/test_infer

Latest commit 4f28cb43c0 by yuehuayingxueluo, 2024-01-26 14:00:10 +08:00:
[inference] Optimize the usage of the mid tensors space in flash attn (#5304)

Squashed commits:
* opt flash attn
* opt tmp tensor
* fix benchmark_llama
* fix code style
* fix None logic for output tensor
* fix adapted to get_xine_cache
* add comment
* fix ci bugs
* fix some codes
* rm duplicated codes
* rm duplicated codes
* fix code style
* add _get_dtype in config.py
File                        Last commit                                                                        Date
test_models                 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)   2024-01-11 13:46:14 +00:00
_utils.py                   [Inference] Add the logic of the inference engine (#5173)                          2024-01-11 13:39:56 +00:00
test_config_and_struct.py   [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00
test_inference_engine.py    [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00
test_kvcache_manager.py     [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)   2024-01-11 13:46:14 +00:00
test_request_handler.py     [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00