ColossalAI/colossalai/inference/modeling
Yuanheng Zhao 3da9993b0d
[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)
* fix decoding kernel pytest

* revise and add triton context attn benchmark
2024-01-23 17:16:02 +08:00
layers   [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)   2024-01-23 17:16:02 +08:00
models   [inference] Adapted to Rotary Embedding and RMS Norm (#5283)                      2024-01-22 10:55:34 +08:00
policy   [inference] Adapted to Rotary Embedding and RMS Norm (#5283)                      2024-01-22 10:55:34 +08:00