453 Commits (80c3c8789bdfc095618a0b725a4980cd575c6c6b)

Author SHA1 Message Date
yuehuayingxueluo b45000f839 [Inference]Add Streaming LLM (#5745) 6 months ago
Yuanheng Zhao 677cbfacf8 [Fix/Example] Fix Llama Inference Loading Data Type (#5763) 6 months ago
hxwang 154720ba6e [chore] refactor profiler utils 6 months ago
genghaozhe 87665d7922 correct argument help message 6 months ago
Haze188 4d097def96 [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) 6 months ago
genghaozhe b9269d962d add args.prefetch_num for benchmark 6 months ago
genghaozhe fba04e857b [bugs] fix args.profile=False DummyProfiler errro 6 months ago
hxwang ca674549e0 [chore] remove unnecessary test & changes 6 months ago
hxwang 63c057cd8e [example] add profile util for llama 6 months ago
botbw 2fc85abf43 [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) 6 months ago
Jianghai 85946d4236 [Inference]Fix readme and example for API server (#5742) 6 months ago
genghaozhe a280517dd9 remove unrelated file 6 months ago
genghaozhe 1ec92d29af remove perf log, unrelated file and so on 6 months ago
genghaozhe 5c6c5d6be3 remove comments 6 months ago
genghaozhe df63db7e63 remote comments 6 months ago
Yuanheng Zhao 8bcfe360fd [example] Update Inference Example (#5725) 6 months ago
hxwang 2e68eebdfe [chore] refactor & sync 6 months ago
Jianghai f47f2fbb24 [Inference] Fix API server, test and example (#5712) 6 months ago
Steve Luo 7806842f2d add paged-attetionv2: support seq length split across thread block (#5707) 6 months ago
CjhHa1 5d9a49483d [Inference] Add example test_ci script 7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) 7 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470) 7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) 7 months ago
Yuanheng Zhao 12e7c28d5e [hotfix] fix OpenMOE example import path (#5697) 7 months ago
Yuanheng Zhao 55cc7f3df7 [Fix] Fix Inference Example, Tests, and Requirements (#5688) 7 months ago
Edenzzzz c25f83c85f fix missing pad token (#5690) 7 months ago
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 7 months ago
Yuanheng Zhao 537a3cbc4d [kernel] Support New KCache Layout - Triton Kernel (#5677) 7 months ago
Steve Luo 5cd75ce4c7 [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) 7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
Tong Li 68ec99e946 [hotfix] add soft link to support required files (#5661) 7 months ago
Yuanheng Zhao 5be590b99e [kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658) 7 months ago
Yuanheng Zhao f342a93871 [Fix] Remove obsolete files - inference (#5650) 7 months ago
Hongxin Liu 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646) 7 months ago
Steve Luo a8fd3b0342 [Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643) 7 months ago
yuehuayingxueluo 90cd5227a3 [Fix/Inference]Fix vllm benchmark (#5630) 7 months ago
傅剑寒 279300dc5f [Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613) 7 months ago
Yuanheng Zhao 04863a9b14 [example] Update Llama Inference example (#5629) 7 months ago
binmakeswell f4c5aafe29 [example] llama3 (#5631) 7 months ago
Hongxin Liu 4de4e31818 [exampe] update llama example (#5626) 7 months ago
Steve Luo ccf72797e3 feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611) 7 months ago
Edenzzzz d83c633ca6 [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) 7 months ago
Steve Luo be396ad6cc [Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531) 7 months ago
yuehuayingxueluo 56b222eff8 [inference/model]Adapted to the baichuan2-7B model (#5591) 7 months ago
Hongxin Liu 641b1ee71a [devops] remove post commit ci (#5566) 8 months ago
digger yu 341263df48 [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) 8 months ago
digger yu a799ca343b [fix] fix typo s/muiti-node /multi-node etc. (#5448) 8 months ago
Edenzzzz 15055f9a36 [hotfix] quick fixes to make legacy tutorials runnable (#5559) 8 months ago
Wenhao Chen e614aa34f3 [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508) 8 months ago
Yuanheng Zhao 36c4bb2893 [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) 8 months ago