ColossalAI/examples/inference

Latest commit: 90cd5227a3 by yuehuayingxueluo
[Fix/Inference] Fix vllm benchmark (#5630)

* Fix OOM bugs when running vllm-0.4.0
* Remove unused params
* Change generation_config
* Change benchmark log file name

2024-04-24 14:51:36 +08:00
benchmark_ops feat baichuan2 rmsnorm whose hidden size equals 5120 (#5611) 2024-04-19 15:34:53 +08:00
benchmark_llama.py [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00
benchmark_llama3.py [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00
build_smoothquant_weight.py [inference] refactor examples and fix schedule (#5077) 2023-11-21 10:46:03 +08:00
llama_generation.py [example] Update Llama Inference example (#5629) 2024-04-23 22:23:07 +08:00
run_benchmark.sh [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00
run_llama_inference.py [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00