ColossalAI/colossalai/inference/modeling
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification

* add spec-dec

* add test for spec-dec

* revise drafter init

* remove drafter sampling

* retire past kv in drafter

* (trivial) rename attrs

* (trivial) rename arg

* revise how we enable/disable spec-dec
2024-04-10 11:07:52 +08:00
layers [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365) 2024-02-06 19:38:25 +08:00
models [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) 2024-04-10 11:07:52 +08:00
policy [Fix/Inference] Remove unused and non-functional functions (#5543) 2024-04-02 14:16:59 +08:00
__init__.py [doc] updated inference readme (#5343) 2024-02-02 14:31:10 +08:00