Commit Graph

4 Commits (8e08c27e19d3f8dcfbae36dffcad0591c0cf9cfc)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Steve Luo | 7806842f2d | add paged-attetionv2: support seq length split across thread block (#5707) | 7 months ago |
| yuehuayingxueluo | 3c91e3f176 | [Inference]Adapt to baichuan2 13B (#5614) | 7 months ago |
| Jianghai | 1f8c7e7046 | [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) | 10 months ago |
| yuehuayingxueluo | 4f28cb43c0 | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago |