Commit Graph

157 Commits (7f9ec599be461cef555f4da2f796b46a3631d18f)

Author SHA1 Message Date
flybird11111 2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
5 months ago
Li Xingjian 8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
6 months ago
Runyu Lu c0948aff97
[Inference]refactor baichuan (#5791)
6 months ago
char-1ee f5981e808e Remove flash attention backend
6 months ago
char-1ee 5f398fc000 Pass inference model shard configs for module init
6 months ago
char-1ee eec77e5702 Fix tests and naming
6 months ago
char-1ee 04386d9eff Refactor modeling by adding attention backend
6 months ago
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
6 months ago
Yuanheng Zhao 406443200f
[Hotfix] Add missing init file in inference.executor (#5774)
6 months ago
Jianghai 85946d4236
[Inference]Fix readme and example for API server (#5742)
6 months ago
binmakeswell 4647ec28c8
[inference] release (#5747)
6 months ago
Yuanheng Zhao d8b1ea4ac9
[doc] Update Inference Readme (#5736)
6 months ago
Yuanheng Zhao bdf9a001d6
[Fix/Inference] Add unsupported auto-policy error message (#5730)
6 months ago
Yuanheng Zhao 283c407a19
[Inference] Fix Inference Generation Config and Sampling (#5710)
6 months ago
Yuanheng Zhao 8bcfe360fd
[example] Update Inference Example (#5725)
6 months ago
Jianghai f47f2fbb24
[Inference] Fix API server, test and example (#5712)
6 months ago
Runyu Lu 74c47921fa
[Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717)
6 months ago
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707)
7 months ago
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
7 months ago
yuehuayingxueluo de4bf3dedf
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
7 months ago
傅剑寒 bfad39357b
[Inference/Feat] Add quant kvcache interface (#5700)
7 months ago
CjhHa1 bc9063adf1 resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598)
7 months ago
CjhHa1 7bbb28e48b [Inference] resolve rebase conflicts
7 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693)
7 months ago
yuehuayingxueluo 9c2fe7935f
[Inference]Adapt temperature processing logic (#5689)
7 months ago
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
7 months ago
Yuanheng Zhao f9afe0addd
[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695)
7 months ago
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main)
7 months ago
yuehuayingxueluo f79963199c
[inference]Add alibi to flash attn function (#5678)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659)
7 months ago
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614)
7 months ago
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
7 months ago
Yuanheng Zhao 04863a9b14
[example] Update Llama Inference example (#5629)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591)
7 months ago
Yuanheng f8598e3ec5 [Fix] Llama Modeling Control with Spec-Dec (#5580)
8 months ago
Yuanheng Zhao e60d430cf5 [Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557)
8 months ago
Yuanheng Zhao e1acb58423 [doc] Add inference/speculative-decoding README (#5552)
8 months ago
Yuanheng Zhao d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455)
8 months ago
Yuanheng Zhao 912e24b2aa [SpecDec] Fix inputs for speculation and revise past KV trimming (#5449)
8 months ago
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
8 months ago
Yuanheng Zhao 5a9b05f7b2 [Inference/SpecDec] Add Basic Drafter Model Container (#5405)
8 months ago
Yuanheng Zhao 4bb5d8923a
[Fix/Inference] Remove unused and non-functional functions (#5543)
8 months ago