4 Commits (colossalchat_upgrade)

Author SHA1 Message Date
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707) 6 months ago
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614) 7 months ago
Jianghai 1f8c7e7046
[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) 10 months ago
yuehuayingxueluo 4f28cb43c0
[inference]Optimize the usage of the mid tensors space in flash attn (#5304) 10 months ago
Yuanheng Zhao 6e487e7d3c
[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) 10 months ago