ColossalAI/examples/inference
Yuanheng Zhao 677cbfacf8
[Fix/Example] Fix Llama Inference Loading Data Type (#5763)
* [fix/example] fix llama inference loading dtype

* revise loading dtype of benchmark llama3
2024-05-30 13:48:46 +08:00
benchmark_ops add paged-attentionv2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
client [Inference]Fix readme and example for API server (#5742) 2024-05-24 10:03:05 +08:00
llama [Fix/Example] Fix Llama Inference Loading Data Type (#5763) 2024-05-30 13:48:46 +08:00