ColossalAI/examples/inference
yuehuayingxueluo 934e31afb2
The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519)
8 months ago
Name                         Last commit  Updated
benchmark_ops                [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)  9 months ago
benchmark_llama.py           [Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)  8 months ago
build_smoothquant_weight.py  [inference] refactor examples and fix schedule (#5077)  1 year ago
run_benchmark.sh             The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519)  8 months ago
run_llama_inference.py       [npu] change device to accelerator api (#5239)  11 months ago