Yuanheng Zhao
|
a37f82629d
|
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
|
2024-04-10 11:07:52 +08:00 |
Yuanheng Zhao
|
d63c469f45
|
[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
* resolve conflicts for revising flash-attn
* adapt kv cache copy kernel for spec-dec
* fix seqlen-n kvcache copy kernel/tests
* test kvcache copy - use torch.equal
* add assertions
* (trivial) comment out
|
2024-04-10 11:07:51 +08:00 |
yuehuayingxueluo
|
0aa27f1961
|
[Inference]Move benchmark-related code to the example directory. (#5408)
* move benchmark-related code to the example directory.
* fix bugs in test_fused_rotary_embedding.py
|
2024-02-28 16:46:03 +08:00 |
Frank Lee
|
e76acbb076
|
[inference] moved ops tests to test_infer (#5354)
|
2024-02-02 13:51:22 +08:00 |