mirror of https://github.com/hpcaitech/ColossalAI
6e487e7d3c
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
cuda_native
jit
triton
__init__.py
op_builder