ColossalAI/examples/inference

Latest commit: 934e31afb2 by yuehuayingxueluo, "The writing style of tail processing and the logic related to macro definitions have been optimized." (#5519), 2024-03-28 10:42:51 +08:00
Contents:
- benchmark_ops/: [Inference/kernel] Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418), 2024-03-13 17:20:03 +08:00
- benchmark_llama.py: [Inference] Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461), 2024-03-25 13:40:34 +08:00
- build_smoothquant_weight.py: [inference] refactor examples and fix schedule (#5077), 2023-11-21 10:46:03 +08:00
- run_benchmark.sh: The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519), 2024-03-28 10:42:51 +08:00
- run_llama_inference.py: [npu] change device to accelerator api (#5239), 2024-01-09 10:20:05 +08:00