Commit Graph

12 Commits (8fd25d6e09069a8437c6ebee8dd83e1de4c9b83d)

Author SHA1 Message Date
Runyu Lu 3c7cda0c9a
[Inference]Lazy Init Support (#5785)
5 months ago
Li Xingjian 8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
6 months ago
Runyu Lu c0948aff97
[Inference]refactor baichuan (#5791)
6 months ago
char-1ee 5f398fc000 Pass inference model shard configs for module init
6 months ago
char-1ee eec77e5702 Fix tests and naming
6 months ago
char-1ee 04386d9eff Refactor modeling by adding attention backend
6 months ago
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707)
7 months ago
yuehuayingxueluo f79963199c
[inference]Add alibi to flash attn function (#5678)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659)
7 months ago
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614)
7 months ago
yuehuayingxueluo 56b222eff8
[inference/model]Adapted to the baichuan2-7B model (#5591)
8 months ago