ColossalAI/examples/inference
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
* opt_view_and_memcopy

* fix bugs in ci

* fix ci bugs

* update benchmark scripts

* fix ci bugs
2024-02-21 13:23:57 +08:00
benchmark_llama.py Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390) 2024-02-21 13:23:57 +08:00
build_smoothquant_weight.py [inference] refactor examples and fix schedule (#5077) 2023-11-21 10:46:03 +08:00
run_benchmark.sh [Inference]Fused kv copy into rotary calculation (#5383) 2024-02-21 11:31:48 +08:00
run_llama_inference.py [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00