ColossalAI/tests/test_infer

Latest commit 4f28cb43c0 by yuehuayingxueluo, 2024-01-26 14:00:10 +08:00:
[inference] Optimize the usage of the mid tensors space in flash attn (#5304)

Squashed commits:
* opt flash attn
* opt tmp tensor
* fix benchmark_llama
* fix code style
* fix None logic for output tensor
* fix adapted to get_xine_cache
* add comment
* fix ci bugs
* fix some codes
* rm duplicated codes
* rm duplicated codes
* fix code style
* add _get_dtype in config.py
File                        Last commit                                                                        Date
test_models                 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)   2024-01-11 13:46:14 +00:00
_utils.py                   [Inference] Add the logic of the inference engine (#5173)                          2024-01-11 13:39:56 +00:00
test_config_and_struct.py   [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00
test_inference_engine.py    [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00
test_kvcache_manager.py     [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)   2024-01-11 13:46:14 +00:00
test_request_handler.py     [inference] Optimize the usage of the mid tensors space in flash attn (#5304)     2024-01-26 14:00:10 +08:00