ColossalAI/colossalai/inference/tensor_parallel
Cuiqing Li (李崔卿) 28052a71fb
[Kernels] Update triton kernels to 2.1.0 (#5046)
* update flash-context-attention

* adding kernels

* fix

* reset

* add build script

* add building process

* add llama2 example

* add colossal-llama2 test

* clean

* fall back test setting

* fix test file

* clean

* clean

* clean

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-11-16 16:43:15 +08:00
modeling               [Kernels] Update triton kernels to 2.1.0 (#5046)                      2023-11-16 16:43:15 +08:00
policies               [hotfix] Support extra_kwargs in ShardConfig (#5031)                  2023-11-10 10:49:50 +08:00
__init__.py            [misc] update pre-commit and run all files (#4752)                    2023-09-19 14:20:26 +08:00
batch_infer_state.py   [Pipeline inference] Combine kvcache with pipeline inference (#4938)  2023-10-27 16:19:54 +08:00
engine.py              [hotfix] Support extra_kwargs in ShardConfig (#5031)                  2023-11-10 10:49:50 +08:00
kvcache_manager.py     [Inference] Dynamic Batching Inference, online and offline (#4953)    2023-10-30 10:52:19 +08:00