ColossalAI/examples/inference
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
* Add Streaming LLM

* add some parameters to llama_generation.py

* verify streamingllm config

* add test_streamingllm.py

* modified according to review comments

* add Citation

* convert _block_tables to a list via tolist()
2024-06-05 10:51:19 +08:00
benchmark_ops add paged-attention v2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
client [Inference]Fix readme and example for API server (#5742) 2024-05-24 10:03:05 +08:00
llama [Inference]Add Streaming LLM (#5745) 2024-06-05 10:51:19 +08:00
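The StreamingLLM feature added by #5745 keeps a KV cache bounded by retaining a few initial "attention sink" tokens plus a sliding window of the most recent tokens, evicting everything in between. A minimal sketch of that eviction policy follows; the names `evict`, `sink_size`, and `window_size` are illustrative assumptions, not ColossalAI's actual API.

```python
def evict(cache, sink_size=4, window_size=8):
    """Return the KV-cache positions kept under a sink + sliding-window policy.

    Hypothetical illustration of StreamingLLM-style eviction: the first
    `sink_size` entries (attention sinks) and the last `window_size` entries
    (recent context) survive; middle entries are dropped once the cache
    exceeds sink_size + window_size.
    """
    limit = sink_size + window_size
    if len(cache) <= limit:
        # Cache still fits; nothing to evict.
        return list(cache)
    # Keep the attention sinks and the most recent window, drop the middle.
    return list(cache[:sink_size]) + list(cache[-window_size:])
```

For example, a 20-token cache with the defaults keeps positions 0-3 and 12-19, so the cache never grows beyond 12 entries regardless of sequence length.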