1183 Commits (37e35230ff4666231dd65435b5f7b2a2fcfaf9e6)

Author SHA1 Message Date
flybird11111 2a0558d8ec [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) 10 months ago
Frank Lee d69cd2eb89 [workflow] fixed oom tests (#5275) 10 months ago
Yuanheng Zhao 0f2b46a41c [kernel] Revise KVCache copy triton kernel API (#5273) 10 months ago
Yuanheng Zhao fa85e02b3b [kernel] Add KV cache copy kernel during decoding (#5261) 10 months ago
Wenhao Chen ef4f0ee854 [hotfix]: add pp sanity check and fix mbs arg (#5268) 10 months ago
FrankLeeeee 1ded7e81ef [git] fixed rebased files 11 months ago
Yuanheng Zhao 1513f20f4d [kernel] Add flash decoding triton kernel for blocked kv cache (#5249) 11 months ago
Jianghai fded91d049 [Inference] Kernel: no pad rotary embedding (#5252) 11 months ago
yuehuayingxueluo fab294c7f4 fix CI bugs 11 months ago
Jianghai e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) 11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 11 months ago
yuehuayingxueluo 47e53eaa1c fix bugs in attention.py and request_handler.py 11 months ago
Jianghai bfd9b1b494 [Inference] Pytorch Attention func, pad&nopad input support (#5219) 11 months ago
yuehuayingxueluo bbfebfb9fc fix bugs in sampler 11 months ago
yuehuayingxueluo 02c1bf8b2a add context_attention_unpadded 11 months ago
Yuanheng Zhao 07b5283b6a [kernel] Add triton kernel for context attention (FAv2) without padding (#5192) 11 months ago
yuehuayingxueluo 4df8876fca Fixed a writing error 11 months ago
yuehuayingxueluo 9489dc64d8 precision alignment 11 months ago
yuehuayingxueluo 62968588d1 fix bugs in request_handler 11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178) 11 months ago
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156) 11 months ago
yuehuayingxueluo fab9b931d9 [Inference]Add BatchInferState, Sequence and InferConfig (#5149) 11 months ago
Yuanheng Zhao 2bb92243d4 [Inference/NFC] Clean outdated inference tests and deprecated kernels (#5159) 11 months ago
flybird11111 e830ef917d [ci] fix shardformer tests. (#5255) 11 months ago
Frank Lee 2b83418719 [ci] fixed ddp test (#5254) 11 months ago
Frank Lee d5eeeb1416 [ci] fixed booster test (#5251) 11 months ago
Frank Lee edf94a35c3 [workflow] fixed build CI (#5240) 11 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
Elsa Granger d565df3821 [pipeline] A more general _communicate in p2p (#5062) 11 months ago
Xuanlei Zhao dd2c28a323 [npu] use extension for op builder (#5172) 11 months ago
Wenhao Chen d799a3088f [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) 11 months ago
Wenhao Chen 4fa689fca1 [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) 11 months ago
flybird11111 79718fae04 [shardformer] llama support DistCrossEntropy (#5176) 12 months ago
flybird11111 21aa5de00b [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) 12 months ago
flybird11111 2a2ec49aa7 [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) 12 months ago
github-actions[bot] d10ee42f68 [format] applied code formatting on changed files in pull request 5088 (#5127) 12 months ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 1 year ago
Zhongkai Zhao 75af66cd81 [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) 1 year ago
Xu Kai fb103cfd6e [inference] update examples and engine (#5073) 1 year ago
Bin Jia 0c7d8bebd5 [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) 1 year ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) 1 year ago
Xu Kai fd6482ad8c [inference] Refactor inference architecture (#5057) 1 year ago
Wenhao Chen 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) 1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043) 1 year ago
Cuiqing Li (李崔卿) 28052a71fb [Kernels]Update triton kernels into 2.1.0 (#5046) 1 year ago
Zhongkai Zhao 70885d707d [hotfix] Suport extra_kwargs in ShardConfig (#5031) 1 year ago
flybird11111 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) 1 year ago