ColossalAI/colossalai/inference/tensor_parallel/modeling
Cuiqing Li (李崔卿) 28052a71fb
[Kernels] Update Triton kernels to 2.1.0 (#5046)
* update flash-context-attention

* add kernels

* fix

* reset

* add build script

* add building process

* add llama2 example

* add colossal-llama2 test

* clean

* fall back test setting

* fix test file

* clean

* clean

* clean

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-11-16 16:43:15 +08:00
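For context on what "updating Triton kernels to 2.1.0" involves, below is a minimal sketch of a kernel written against the Triton 2.1.0 API. This is an illustrative example only, not code from this directory; the names `add_kernel`, `add`, and `BLOCK_SIZE` are hypothetical.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block against out-of-bounds access
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y are assumed to be contiguous CUDA tensors of equal shape.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Kernels in this style are launched with a grid tuple, and masked loads/stores handle the partial block at the end of the tensor; the real kernels referenced by this PR (context attention, flash-decoding) follow the same launch pattern but implement attention math.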
__init__.py [inference] fix import bug and delete useless init (#4830) 2023-10-04 09:18:45 +08:00
_utils.py [inference] add llama2 support (#4898) 2023-10-13 13:09:23 +08:00
bloom.py [Kernels] Update Triton kernels to 2.1.0 and add flash-decoding for llama token attention (#4965) 2023-10-30 14:04:37 +08:00
chatglm2.py [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) 2023-11-07 15:01:50 +08:00
llama.py [Kernels] Update Triton kernels to 2.1.0 (#5046) 2023-11-16 16:43:15 +08:00