Commit Graph

4 Commits (912e24b2aaf4acda0e2b9a45a7d4327fbfc8bd39)

Author SHA1 Message Date
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification

* add spec-dec

* add test for spec-dec

* revise drafter init

* remove drafter sampling

* retire past kv in drafter

* (trivial) rename attrs

* (trivial) rename arg

* revise how we enable/disable spec-dec
2024-04-10 11:07:52 +08:00
Yuanheng Zhao d63c469f45 [Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)

fix dependency in pytest

* resolve conflicts for revising flash-attn

* adapt kv cache copy kernel for spec-dec

* fix seqlen-n kvcache copy kernel/tests

* test kvcache copy - use torch.equal

* add assertions

* (trivial) comment out
2024-04-10 11:07:51 +08:00
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
* move benchmark-related code to the example directory.

* fix bugs in test_fused_rotary_embedding.py
2024-02-28 16:46:03 +08:00
Frank Lee e76acbb076
[inference] moved ops tests to test_infer (#5354) 2024-02-02 13:51:22 +08:00