Commit Graph

3255 Commits (61a1b2e798edcbf91ac35966a4047407ad6aa62d)
 

Author SHA1 Message Date
Yuanheng Zhao af8359c430
[hotfix] fix boundary check in batch (#5306)
10 months ago
Jianghai c647e00e3c
[Inference]Add fused rotary kernel and get cos cache kernel (#5302)
10 months ago
Yuanheng Zhao 3da9993b0d
[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)
10 months ago
Jianghai 8e606ecc7e
[Inference] Benchmarking rotary embedding and add a fetch function (#5277)
10 months ago
Desperado-Jia ddf879e2db
fix bug for mefture (#5299)
10 months ago
yuehuayingxueluo b7853196a0
Merge pull request #5297 from yuehuayingxueluo/fix_rotary_embedding
10 months ago
yuehuayingxueluo cea9c86e45
add utils.py
10 months ago
Hongxin Liu d7f8db8e21
[hotfix] fix 3d plugin test (#5292)
10 months ago
yuehuayingxueluo bfff9254ac
[inference] Adapted to Rotary Embedding and RMS Norm (#5283)
10 months ago
flybird11111 f7e3f82a7e
fix llama pretrain (#5287)
10 months ago
Desperado-Jia 6a56967855
[doc] add llama2-13B disyplay (#5285)
10 months ago
Yuanheng Zhao 6e487e7d3c
[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)
10 months ago
Jianghai 9e2342bde2
[Hotfix] Fix bugs in testing continuous batching (#5270)
10 months ago
Michelle 32cb74493a
fix auto loading gpt2 tokenizer (#5279)
10 months ago
Frank Lee d66e6988bc
Merge pull request #5278 from ver217/sync/npu
10 months ago
ver217 148469348a
Merge branch 'main' into sync/npu
10 months ago
Yaozheng Fang 5ae9099f92
[kernel] Add RMSLayerNorm triton kernel (#5262)
10 months ago
Zhongkai Zhao 5d9a0ae75b
[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230)
10 months ago
yuehuayingxueluo 86b63f720c
[Inference]Adapted to the triton attn kernels (#5264)
10 months ago
flybird11111 46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. (#5246)
10 months ago
flybird11111 2a0558d8ec
[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
10 months ago
Frank Lee d69cd2eb89
[workflow] fixed oom tests (#5275)
10 months ago
Yuanheng Zhao 0f2b46a41c
[kernel] Revise KVCache copy triton kernel API (#5273)
10 months ago
Frank Lee 04244aaaf1
[workflow] fixed incomplete bash command (#5272)
10 months ago
Jianghai d8db500efc
[Inference] Fix request handler and add recycle logic (#5260)
10 months ago
Frank Lee c597678da4
[doc] updated inference readme (#5269)
10 months ago
Yuanheng Zhao fa85e02b3b
[kernel] Add KV cache copy kernel during decoding (#5261)
10 months ago
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268)
11 months ago
FrankLeeeee 1ded7e81ef
[git] fixed rebased files
11 months ago
Yuanheng Zhao 1513f20f4d
[kernel] Add flash decoding triton kernel for blocked kv cache (#5249)
11 months ago
Jianghai fded91d049
[Inference] Kernel: no pad rotary embedding (#5252)
11 months ago
yuehuayingxueluo d40eb26029
fix bugs in request_handler.py and engine.py
11 months ago
yuehuayingxueluo 10e3c9f923
rm torch.cuda.synchronize
11 months ago
yuehuayingxueluo fab294c7f4
fix CI bugs
11 months ago
yuehuayingxueluo 2a73e828eb
fix bugs related to processing padding mask
11 months ago
Jianghai e545a871b8
[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)
11 months ago
yuehuayingxueluo fa4fbdbffb
adapted to pad_context_forward
11 months ago
yuehuayingxueluo 47e53eaa1c
fix bugs in attention.py and request_handler.py
11 months ago
Jianghai bfd9b1b494
[Inference] Pytorch Attention func, pad&nopad input support (#5219)
11 months ago
yuehuayingxueluo 3ad1f3b78b
fix beam_width
11 months ago
yuehuayingxueluo b2eb9cd186
Fixed a typo
11 months ago
yuehuayingxueluo bbfebfb9fc
fix bugs in sampler
11 months ago
yuehuayingxueluo 02c1bf8b2a
add context_attention_unpadded
11 months ago
Yuanheng Zhao 07b5283b6a
[kernel] Add triton kernel for context attention (FAv2) without padding (#5192)
11 months ago
yuehuayingxueluo 4df8876fca
Fixed a writing error
11 months ago
yuehuayingxueluo 9489dc64d8
precision alignment
11 months ago
yuehuayingxueluo 62968588d1
fix bugs in request_handler
11 months ago
yuehuayingxueluo 62fd08ee44
Fixed a bug in the inference frame
11 months ago
yuehuayingxueluo 86853a37d5
Add padding llama model
11 months ago
Jianghai 0e616462a7
[Inference] add logit processor and request handler (#5166)
11 months ago