Yuanheng Zhao
7b249c76e5
[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837)
* fix glide llama model
* revise
5 months ago
Yuanheng Zhao
8bcfe360fd
[example] Update Inference Example (#5725)
* [example] update inference example
6 months ago
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
7 months ago
Yuanheng Zhao
e1acb58423
[doc] Add inference/speculative-decoding README (#5552)
* add README for spec-dec
* update roadmap
8 months ago
Yuanheng Zhao
d85d91435a
[Inference/SpecDec] Support GLIDE Drafter Model (#5455)
* add glide-llama policy and modeling
* update glide modeling, compatible with transformers 4.36.2
* revise glide llama modeling/usage
* fix issues of glimpsing large kv
* revise the way re-loading params for glide drafter
* fix drafter and engine tests
* enable convert to glide strict=False
* revise glide llama modeling
* revise vicuna prompt template
* revise drafter and tests
* apply usage of glide model in engine
8 months ago
Yuanheng Zhao
a37f82629d
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
8 months ago
Yuanheng Zhao
5a9b05f7b2
[Inference/SpecDec] Add Basic Drafter Model Container (#5405)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
* add drafter model container (basic ver)
8 months ago