Commit Graph

426 Commits (56ed09aba5e017fc0c211dac70215c2f83815919)

Author SHA1 Message Date
Yuanheng Zhao 56ed09aba5
[sync] resolve conflicts of merging main
7 months ago
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
7 months ago
Tong Li 68ec99e946
[hotfix] add soft link to support required files (#5661)
7 months ago
Yuanheng Zhao 5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)
7 months ago
Yuanheng Zhao f342a93871
[Fix] Remove obsolete files - inference (#5650)
7 months ago
Hongxin Liu 1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646)
7 months ago
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
7 months ago
yuehuayingxueluo 90cd5227a3
[Fix/Inference]Fix vllm benchmark (#5630)
7 months ago
傅剑寒 279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
7 months ago
Yuanheng Zhao 04863a9b14
[example] Update Llama Inference example (#5629)
7 months ago
binmakeswell f4c5aafe29
[example] llama3 (#5631)
7 months ago
Hongxin Liu 4de4e31818
[exampe] update llama example (#5626)
7 months ago
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)
7 months ago
Edenzzzz d83c633ca6
[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591)
7 months ago
Yuanheng ed5ebd1735
[Fix] resolve conflicts of merging main
8 months ago
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
8 months ago
digger yu 341263df48
[hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548)
8 months ago
digger yu a799ca343b
[fix] fix typo s/muiti-node /multi-node etc. (#5448)
8 months ago
Edenzzzz 15055f9a36
[hotfix] quick fixes to make legacy tutorials runnable (#5559)
8 months ago
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
8 months ago
Yuanheng Zhao 36c4bb2893
[Fix] Grok-1 use tokenizer from the same pretrained path (#5532)
8 months ago
yuehuayingxueluo 934e31afb2
The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519)
8 months ago
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)
8 months ago
Yuanheng Zhao 131f32a076
[fix] fix grok-1 example typo (#5506)
8 months ago
binmakeswell 34e909256c
[release] grok-1 inference benchmark (#5500)
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404)
8 months ago
Yuanheng Zhao 5fcd7795cd
[example] update Grok-1 inference (#5495)
8 months ago
binmakeswell 6df844b8c4
[release] grok-1 314b inference (#5490)
8 months ago
Hongxin Liu 848a574c26
[example] add grok-1 inference (#5485)
8 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
digger yu 385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. (#5429)
9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
9 months ago
Youngon 68f55a709c
[hotfix] fix stable diffusion inference bug. (#5289)
9 months ago
Luo Yihang e239cf9060
[hotfix] fix typo of openmoe model source (#5403)
9 months ago
MickeyCHAN e304e4db35
[hotfix] fix sd vit import error (#5420)
9 months ago
Hongxin Liu 070df689e6
[devops] fix extention building (#5427)
9 months ago
flybird11111 29695cf70c
[example]add gpt2 benchmark example script. (#5295)
9 months ago
FrankLeeeee 0310b76e9d
Merge branch 'main' into sync/main
9 months ago
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
9 months ago
Hongxin Liu d882d18c65
[example] reuse flash attn patch (#5400)
9 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
9 months ago
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
9 months ago
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383)
9 months ago
yuehuayingxueluo 8c69debdc7
[Inference]Support vllm testing in benchmark scripts (#5379)
10 months ago