Yuanheng Zhao
|
e1acb58423
|
[doc] Add inference/speculative-decoding README (#5552)
* add README for spec-dec
* update roadmap
|
2024-04-10 11:07:52 +08:00 |
Yuanheng Zhao
|
d85d91435a
|
[Inference/SpecDec] Support GLIDE Drafter Model (#5455)
* add glide-llama policy and modeling
* update glide modeling, compatible with transformers 4.36.2
* revise glide llama modeling/usage
* fix issues of glimpsing large kv
* revise the way of re-loading params for glide drafter
* fix drafter and engine tests
* enable convert to glide strict=False
* revise glide llama modeling
* revise vicuna prompt template
* revise drafter and tests
* apply usage of glide model in engine
|
2024-04-10 11:07:52 +08:00 |
Yuanheng Zhao
|
a37f82629d
|
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
|
2024-04-10 11:07:52 +08:00 |
Yuanheng Zhao
|
5a9b05f7b2
|
[Inference/SpecDec] Add Basic Drafter Model Container (#5405)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
* add drafter model container (basic ver)
|
2024-04-10 11:07:51 +08:00 |