Commit Graph

2 Commits (a37f82629d7b9e3c3a0f430b8dd3ff6f38ddf1d4)

Author SHA1 Message Date
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification

* add spec-dec

* add test for spec-dec

* revise drafter init

* remove drafter sampling

* retire past kv in drafter

* (trivial) rename attrs

* (trivial) rename arg

* revise how we enable/disable spec-dec
2024-04-10 11:07:52 +08:00
Yuanheng Zhao 5a9b05f7b2 [Inference/SpecDec] Add Basic Drafter Model Container (#5405)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)

fix dependency in pytest

* add drafter model container (basic ver)
2024-04-10 11:07:51 +08:00