ColossalAI/colossalai/kernel
Yuanheng Zhao 6e487e7d3c
[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)
* prevent re-creating intermediate tensors

* add singleton class holding intermediate values

* fix triton kernel api

* add benchmark in pytest

* fix kernel api and add benchmark

* revise flash decoding triton kernel in/out shapes

* fix calling of triton kernel in modeling

* fix pytest: extract to util functions
2024-01-19 15:47:16 +08:00
cuda_native  fix thrust-transform-reduce error (#5078)                                           2023-11-21 15:09:35 +08:00
jit          [misc] update pre-commit and run all files (#4752)                                  2023-09-19 14:20:26 +08:00
triton       [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)  2024-01-19 15:47:16 +08:00
__init__.py  [hotfix] Fix import error: colossal.kernel without triton installed (#4722)         2023-09-14 18:03:55 +08:00
op_builder   [builder] reconfig op_builder for pypi install (#2314)                              2023-01-04 16:32:32 +08:00