ColossalAI/examples/inference
yuehuayingxueluo bfff9254ac
[inference] Adapted to Rotary Embedding and RMS Norm (#5283)
* adapted to rotary_embedding

* adapted to nopad rms norm

* fix bugs in benchmark

* fix flash_decoding.py
2024-01-22 10:55:34 +08:00
benchmark_llama.py            [inference] Adapted to Rotary Embedding and RMS Norm (#5283)  2024-01-22 10:55:34 +08:00
build_smoothquant_weight.py   [inference] refactor examples and fix schedule (#5077)        2023-11-21 10:46:03 +08:00
run_benchmark.sh              [Inference]Adapted to the triton attn kernels (#5264)         2024-01-17 16:03:10 +08:00
run_llama_inference.py        [inference] refactor examples and fix schedule (#5077)        2023-11-21 10:46:03 +08:00