3 Commits (18d67d0e8e79c22bded0745c7d3daf8ca40d445c)

Author | SHA1 | Message | Date
yuehuayingxueluo | 3c91e3f176 | [Inference]Adapt to baichuan2 13B (#5614) | 7 months ago
Jianghai | 1f8c7e7046 | [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) | 10 months ago
yuehuayingxueluo | 4f28cb43c0 | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago
Yuanheng Zhao | 6e487e7d3c | [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) | 10 months ago