20 Commits (8241c0c054b38a109ed3ce7be1052a1e600b8471)

Author SHA1 Message Date
flybird11111 0c10afd372
[FP8] rebase main (#5963) 4 months ago
Runyu Lu c0948aff97
[Inference]refactor baichuan (#5791) 6 months ago
char-1ee 5f398fc000 Pass inference model shard configs for module init 6 months ago
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705) 6 months ago
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659) 7 months ago
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614) 7 months ago
Yuanheng Zhao 04863a9b14
[example] Update Llama Inference example (#5629) 7 months ago
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563) 7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591) 7 months ago
Yuanheng Zhao d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455) 8 months ago
Yuanheng Zhao 4bb5d8923a
[Fix/Inference] Remove unused and non-functional functions (#5543) 8 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 9 months ago
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390) 9 months ago
Yuanheng Zhao 1dedb57747
[Fix/Infer] Remove unused deps and revise requirements (#5341) 10 months ago
yuehuayingxueluo 21ad4a27f9
[Inference/opt]Optimize the mid tensor of RMS Norm (#5350) 10 months ago
yuehuayingxueluo 249644c23b
[Inference]Repalce Attention layer and MLP layer by shardformer to optimize the weight transpose operation,add fused_qkv and fused linear_add (#5340) 10 months ago
yuehuayingxueluo e8f0642f28
[Inference]Add Nopadding Llama Modeling (#5327) 10 months ago
yuehuayingxueluo bfff9254ac
[inference] Adapted to Rotary Embedding and RMS Norm (#5283) 10 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago