ColossalAI/colossalai/kernel
yuehuayingxueluo 4f28cb43c0
[inference] Optimize the usage of the mid tensors space in flash attn (#5304)
* opt flash attn
* opt tmp tensor
* fix benchmark_llama
* fix code style
* fix None logic for output tensor
* fix adaptation to get_xine_cache
* add comment
* fix CI bugs
* fix some code
* rm duplicated code
* fix code style
* add _get_dtype in config.py
2024-01-26 14:00:10 +08:00
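The commit above describes reusing the intermediate ("mid") tensor space during flash-attention inference rather than allocating a fresh output tensor on every call (see "fix None logic for output tensor"). Below is a minimal sketch of that buffer-reuse idea; the helper name get_output_buffer, its signature, and the module-level workspace cache are illustrative assumptions, not ColossalAI's actual API.

```python
from typing import Optional

import torch

# Sketch only: keep one workspace tensor and hand out views into it, instead
# of calling torch.empty for every attention forward pass. Names and the
# caching strategy are assumptions for illustration.
_workspace: Optional[torch.Tensor] = None


def get_output_buffer(num_tokens: int, num_heads: int, head_dim: int,
                      dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    """Return a (num_tokens, num_heads, head_dim) view into a reusable buffer."""
    global _workspace
    needed = num_tokens * num_heads * head_dim
    if (_workspace is None or _workspace.numel() < needed
            or _workspace.dtype != dtype or _workspace.device != device):
        # Grow (or create) the workspace only when the cached one cannot serve
        # the request; otherwise reuse the existing allocation.
        _workspace = torch.empty(needed, dtype=dtype, device=device)
    return _workspace[:needed].view(num_tokens, num_heads, head_dim)


# Callers that previously passed output=None (triggering a new allocation)
# can instead write attention results into the shared buffer.
out = get_output_buffer(16, 32, 128, torch.float16, torch.device("cpu"))
```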
cuda_native fix thrust-transform-reduce error (#5078) 2023-11-21 15:09:35 +08:00
jit [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
triton [inference] Optimize the usage of the mid tensors space in flash attn (#5304) 2024-01-26 14:00:10 +08:00
__init__.py [hotfix] Fix import error: colossal.kernel without triton installed (#4722) 2023-09-14 18:03:55 +08:00
op_builder [builder] reconfig op_builder for pypi install (#2314) 2023-01-04 16:32:32 +08:00