ColossalAI/examples/inference
Jianghai f47f2fbb24
[Inference] Fix API server, test and example (#5712)
* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
2024-05-15 15:47:31 +08:00
benchmark_ops add paged-attetionv2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
client [Inference] Fix API server, test and example (#5712) 2024-05-15 15:47:31 +08:00
llama [Fix] Fix Inference Example, Tests, and Requirements (#5688) 2024-05-08 11:30:15 +08:00