106 Commits (8fd25d6e09069a8437c6ebee8dd83e1de4c9b83d)

Author SHA1 Message Date
Hongxin Liu b3db1058ec
[release] update version (#6041)
3 months ago
flybird11111 2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
6 months ago
Li Xingjian 8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
6 months ago
char-1ee b303976a27
Fix test import
6 months ago
char-1ee 5f398fc000
Pass inference model shard configs for module init
6 months ago
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
6 months ago
Hongxin Liu 68359ed1e1
[release] update version (#5752)
6 months ago
Yuanheng Zhao b96c6390f4
[inference] Fix running time of test_continuous_batching (#5750)
6 months ago
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707)
7 months ago
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
7 months ago
傅剑寒 50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
7 months ago
CjhHa1 bc9063adf1
resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai 61a1b2e798
[Inference] Fix bugs and docs for feat/online-server (#5598)
7 months ago
Jianghai c064032865
[Online Server] Chat Api for streaming and not streaming response (#5470)
7 months ago
Jianghai de378cd2ab
[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Jianghai 69cd7e069d
[Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
yuehuayingxueluo 9c2fe7935f
[Inference]Adapt temperature processing logic (#5689)
7 months ago
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
7 months ago
Yuanheng Zhao 8754abae24
[Fix] Fix & Update Inference Tests (compatibility w/ main)
7 months ago
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659)
7 months ago
Yuanheng Zhao 5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)
7 months ago
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614)
7 months ago
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
7 months ago
yuehuayingxueluo 12f10d5b0b
[Fix/Inference]Fix CUDA Rotary Rmbedding GQA (#5623)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)
7 months ago
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591)
8 months ago
Yuanheng Zhao e60d430cf5
[Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557)
8 months ago
Yuanheng Zhao d85d91435a
[Inference/SpecDec] Support GLIDE Drafter Model (#5455)
8 months ago
Yuanheng Zhao a37f82629d
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
8 months ago
Yuanheng Zhao 5a9b05f7b2
[Inference/SpecDec] Add Basic Drafter Model Container (#5405)
8 months ago
Yuanheng Zhao d63c469f45
[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)
8 months ago
yuehuayingxueluo 04aca9e55b
[Inference/Kernel]Add get_cos_and_sin Kernel (#5528)
8 months ago
Runyu Lu 68e9396bc0
[fix] merge conflicts
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
Runyu Lu 9fe61b4475
[fix]
8 months ago
Runyu Lu aabc9fb6aa
[feat] add use_cuda_kernel option
8 months ago
Runyu Lu d02e257abd
Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph
9 months ago
Runyu Lu ae24b4f025
diverse tests
9 months ago
Runyu Lu 1821a6dab0
[fix] pytest and fix dyn grid bug
9 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
Steve Luo ed431de4e4
fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454)
9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
9 months ago
xs_courtesy 95c21498d4
add silu_and_mul for infer
9 months ago
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
9 months ago