Yuanheng Zhao
|
d63c469f45
|
[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
* resolve conflicts for revising flash-attn
* adapt kv cache copy kernel for spec-dec
* fix seqlen-n kvcache copy kernel/tests
* test kvcache copy - use torch.equal
* add assertions
* (trivial) comment out
|
2024-04-10 11:07:51 +08:00 |
Jianghai
|
730103819d
|
[Inference] Fused kv copy into rotary calculation (#5383)
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
* fused kv copy
* fused copy
* colossalai/kernel/triton/no_pad_rotary_embedding.py
* del padding llama
* del
|
2024-02-21 11:31:48 +08:00 |
yuehuayingxueluo
|
6fb4bcbb24
|
[Inference/opt] Fused KVCache Memcopy (#5374)
* fused kv memcopy
* add TODO in test_kvcache_copy.py
|
2024-02-07 17:15:42 +08:00 |
Frank Lee
|
8106ede07f
|
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)
This reverts commit 9f4ab2eb92.
|
2024-02-07 14:27:04 +08:00 |
Jianghai
|
9f4ab2eb92
|
[Inference] Adapt to Fused rotary (#5348)
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
|
2024-02-07 11:36:04 +08:00 |
Yuanheng Zhao
|
5f98a9d68a
|
[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
* revise shape of kvcache (context attn kernel)
* revise shape of kvcache (flash decoding kernel)
* revise shape of kvcache (kvcache copy) and attn func
* init of kvcache in kvcache manager
* revise llama modeling
* revise block size retrieval
* use torch for rms_norm benchmarking
* revise block size retrieval
|
2024-01-30 16:06:09 +08:00 |
yuehuayingxueluo
|
bfff9254ac
|
[inference] Adapted to Rotary Embedding and RMS Norm (#5283)
* adapted to rotary_embedding
* adapted to nopad rms norm
* fix bugs in benchmark
* fix flash_decoding.py
|
2024-01-22 10:55:34 +08:00 |
Yuanheng Zhao
|
0f2b46a41c
|
[kernel] Revise KVCache copy triton kernel API (#5273)
* [kernel/fix] revise kvcache copy kernel api
* fix benchmark
|
2024-01-16 14:41:02 +08:00 |
Yuanheng Zhao
|
fa85e02b3b
|
[kernel] Add KV cache copy kernel during decoding (#5261)
* add kv copy triton kernel during decoding stage
* add pytest and fix kernel
* fix test utilities
* revise kernel config
* add benchmark for kvcache copy
|
2024-01-15 17:37:20 +08:00 |