mirror of https://github.com/hpcaitech/ColossalAI
Latest commit:

* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions

| Name |
|---|
| .. |
| triton |