yuehuayingxueluo
|
bfff9254ac
|
[inference] Adapted to Rotary Embedding and RMS Norm (#5283)
* adapted to rotary_embedding
* adapted to nopad rms norm
* fix bugs in benchmark
* fix flash_decoding.py
|
2024-01-22 10:55:34 +08:00 |
Yuanheng Zhao
|
0f2b46a41c
|
[kernel] Revise KVCache copy triton kernel API (#5273)
* [kernel/fix] revise kvcache copy kernel api
* fix benchmark
|
2024-01-16 14:41:02 +08:00 |
Yuanheng Zhao
|
fa85e02b3b
|
[kernel] Add KV cache copy kernel during decoding (#5261)
* add kv copy triton kernel during decoding stage
* add pytest and fix kernel
* fix test utilities
* revise kernel config
* add benchmark for kvcache copy
|
2024-01-15 17:37:20 +08:00 |