ColossalAI/examples/inference
Cuiqing Li (李崔卿) 28052a71fb
[Kernels] Update triton kernels to 2.1.0 (#5046)
* update flash-context-attention

* adding kernels

* fix

* reset

* add build script

* add building process

* add llama2 example

* add colossal-llama2 test

* clean

* fall back test setting

* fix test file

* clean

* clean

* clean

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-11-16 16:43:15 +08:00
serving [hotfix] Support extra_kwargs in ShardConfig (#5031) 2023-11-10 10:49:50 +08:00
_utils.py [Inference]ADD Bench Chatglm2 script (#4963) 2023-10-24 13:11:15 +08:00
bench_bloom.py [hotfix] Support extra_kwargs in ShardConfig (#5031) 2023-11-10 10:49:50 +08:00
bench_chatglm2.py [hotfix] Support extra_kwargs in ShardConfig (#5031) 2023-11-10 10:49:50 +08:00
bench_llama.py [Kernels] Update triton kernels to 2.1.0 (#5046) 2023-11-16 16:43:15 +08:00
colossal_llama2_demo.py [Kernels] Update triton kernels to 2.1.0 (#5046) 2023-11-16 16:43:15 +08:00
gptq_bloom.py [hotfix] Support extra_kwargs in ShardConfig (#5031) 2023-11-10 10:49:50 +08:00
gptq_llama.py [hotfix] Support extra_kwargs in ShardConfig (#5031) 2023-11-10 10:49:50 +08:00
smoothquant_llama.py [inference] Add smoothquant for llama (#4904) 2023-10-16 11:28:44 +08:00