ColossalAI/colossalai/inference/engine/modeling
Cuiqing Li (李崔卿) bce919708f
[Kernels]added flash-decoding of triton (#5063)
* added flash-decoding of triton based on lightllm kernel

* add req

* clean

* clean

* delete build.sh

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-11-20 13:58:29 +08:00
__init__.py [inference] Refactor inference architecture (#5057) 2023-11-19 21:05:05 +08:00
_utils.py [inference] Refactor inference architecture (#5057) 2023-11-19 21:05:05 +08:00
bloom.py [inference] Refactor inference architecture (#5057) 2023-11-19 21:05:05 +08:00
chatglm2.py [inference] Refactor inference architecture (#5057) 2023-11-19 21:05:05 +08:00
llama.py [Kernels]added flash-decoding of triton (#5063) 2023-11-20 13:58:29 +08:00