ColossalAI/colossalai/inference/modeling
Steve Luo 7806842f2d
add paged-attention v2: support seq length split across thread block (#5707)
2024-05-14 12:46:54 +08:00
layers [Inference] Adapt Baichuan2-13B TP (#5659) 2024-04-30 15:47:07 +08:00
models add paged-attention v2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
policy [Feat]Inference RPC Server Support (#5705) 2024-05-14 10:00:55 +08:00
__init__.py [doc] updated inference readme (#5343) 2024-02-02 14:31:10 +08:00