28 Commits (main)

Author SHA1 Message Date
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) 7 months ago
yuehuayingxueluo 9c2fe7935f [Inference]Adapt temperature processing logic (#5689) 7 months ago
Yuanheng Zhao 55cc7f3df7 [Fix] Fix Inference Example, Tests, and Requirements (#5688) 7 months ago
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 7 months ago
Yuanheng Zhao 5d4c1fe8f5 [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) 7 months ago
Runyu Lu e37ee2fb65 [Feat]Tensor Model Parallel Support For Inference (#5563) 7 months ago
Yuanheng Zhao d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455) 8 months ago
yuehuayingxueluo f366a5ea1f [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) 8 months ago
Steve Luo ed431de4e4 fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454) 8 months ago
Steve Luo f7aecc0c6b feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 9 months ago
Jianghai 1f8c7e7046 [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) 10 months ago
Frank Lee 58740b5f68 [inference] added inference template (#5375) 10 months ago
yuehuayingxueluo 631862f339 [Inference]Optimize generation process of inference engine (#5356) 10 months ago
Frank Lee f8e456d202 [inference] simplified config verification (#5346) 10 months ago
yuehuayingxueluo 4f28cb43c0 [inference]Optimize the usage of the mid tensors space in flash attn (#5304) 10 months ago
FrankLeeeee 1ded7e81ef [git] fixed rebased files 11 months ago
yuehuayingxueluo fab294c7f4 fix CI bugs 11 months ago
Jianghai e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) 11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 11 months ago
yuehuayingxueluo 47e53eaa1c fix bugs in attention.py and request_handler.py 11 months ago
yuehuayingxueluo bbfebfb9fc fix bugs in sampler 11 months ago
yuehuayingxueluo 02c1bf8b2a add context_attention_unpadded 11 months ago
yuehuayingxueluo 4df8876fca Fixed a writing error 11 months ago
yuehuayingxueluo 9489dc64d8 precision alignment 11 months ago
yuehuayingxueluo 62968588d1 fix bugs in request_handler 11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago