77 Commits (f5c84af0b01bcd2e993d38dc628793f7f0a8ba64)

Author SHA1 Message Date
Runyu Lu bcf0181ecd
[Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895)
4 months ago
Runyu Lu 66abf1c6e8
[HotFix] CI,import,requirements-test for #5838 (#5892)
5 months ago
Runyu Lu cba20525a8
[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838)
5 months ago
pre-commit-ci[bot] 7c2f79fa98
[pre-commit.ci] pre-commit autoupdate (#5572)
5 months ago
Runyu Lu 3c7cda0c9a
[Inference]Lazy Init Support (#5785)
5 months ago
Yuanheng Zhao 7b249c76e5
[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837)
5 months ago
char-1ee 5f398fc000
Pass inference model shard configs for module init
6 months ago
char-1ee eec77e5702
Fix tests and naming
6 months ago
char-1ee 04386d9eff
Refactor modeling by adding attention backend
6 months ago
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
6 months ago
Yuanheng Zhao bdf9a001d6
[Fix/Inference] Add unsupported auto-policy error message (#5730)
6 months ago
Yuanheng Zhao 283c407a19
[Inference] Fix Inference Generation Config and Sampling (#5710)
6 months ago
Jianghai f47f2fbb24
[Inference] Fix API server, test and example (#5712)
6 months ago
Runyu Lu 74c47921fa
[Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717)
7 months ago
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
7 months ago
yuehuayingxueluo de4bf3dedf
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
7 months ago
CjhHa1 bc9063adf1
resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai 61a1b2e798
[Inference] Fix bugs and docs for feat/online-server (#5598)
7 months ago
CjhHa1 7bbb28e48b
[Inference] resolve rebase conflicts
7 months ago
Jianghai de378cd2ab
[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Jianghai 69cd7e069d
[Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693)
7 months ago
yuehuayingxueluo 9c2fe7935f
[Inference]Adapt temperature processing logic (#5689)
7 months ago
yuehuayingxueluo f79963199c
[inference]Add alibi to flash attn function (#5678)
7 months ago
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563)
7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591)
7 months ago
Yuanheng Zhao e60d430cf5
[Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557)
8 months ago
Yuanheng Zhao d85d91435a
[Inference/SpecDec] Support GLIDE Drafter Model (#5455)
8 months ago
Yuanheng Zhao 912e24b2aa
[SpecDec] Fix inputs for speculation and revise past KV trimming (#5449)
8 months ago
Yuanheng Zhao a37f82629d
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
8 months ago
傅剑寒 e6496dd371
[Inference] Optimize request handler of llama (#5512)
8 months ago
Runyu Lu 68e9396bc0
[fix] merge conflicts
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
Runyu Lu ff4998c6f3
[fix] remove unused comment
8 months ago
Runyu Lu 5b017d6324
[fix]
8 months ago
Runyu Lu ae24b4f025
diverse tests
9 months ago
Runyu Lu 1821a6dab0
[fix] pytest and fix dyn grid bug
9 months ago
Runyu Lu 9dec66fad6
[fix] multi graphs capture error
9 months ago
Runyu Lu b2c0d9ff2b
[fix] multi graphs capture error
9 months ago
Runyu Lu cefaeb5fdd
[feat] cuda graph support and refactor non-functional api
9 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
9 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
9 months ago
yuehuayingxueluo 8c69debdc7
[Inference]Support vllm testing in benchmark scripts (#5379)
10 months ago
Frank Lee 9afa52061f
[inference] refactored config (#5376)
10 months ago
Jianghai 1f8c7e7046
[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337)
10 months ago
Frank Lee 58740b5f68
[inference] added inference template (#5375)
10 months ago
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
10 months ago
yuehuayingxueluo 631862f339
[Inference]Optimize generation process of inference engine (#5356)
10 months ago