ColossalAI/colossalai/inference/modeling/models
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707)
2024-05-14 12:46:54 +08:00
__init__.py fix bugs in request_handler 2024-01-11 13:39:56 +00:00
glide_llama.py [Inference/SpecDec] Support GLIDE Drafter Model (#5455) 2024-04-10 11:07:52 +08:00
nopadding_baichuan.py add paged-attetionv2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
nopadding_llama.py add paged-attetionv2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00