ColossalAI

Yuanheng Zhao b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)		2024-02-19 17:18:20 +08:00
* add kvcache manager funcs for batching
* add batch bucket for batching
* revise RunningList struct in handler
* add kvcache/batch funcs for compatibility
* use new batching methods
* fix indexing bugs
* revise abort logic
* use cpu seq lengths/block tables
* rm unused attr in Sequence
* fix type conversion/default arg
* add and revise pytests
* revise pytests, rm unused tests
* rm unused statements
* fix pop finished indexing issue
* fix: use index in batch when retrieving inputs/update seqs
* use dict instead of odict in batch struct
* arg type hinting
* fix make compress
* refine comments
* fix: pop_n_seqs to pop the first n seqs
* add check in request handler
* remove redundant conversion
* fix test for request handler
* fix pop method in batch bucket
* fix prefill adding
test_models	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)	2024-01-30 16:06:09 +08:00
test_ops/triton	[Inference/opt] Fused KVCahce Memcopy (#5374)	2024-02-07 17:15:42 +08:00
_utils.py	[Inference] Add the logic of the inference engine (#5173)	2024-01-11 13:39:56 +00:00
test_batch_bucket.py	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)	2024-02-19 17:18:20 +08:00
test_config_and_struct.py	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)	2024-02-19 17:18:20 +08:00
test_inference_engine.py	[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337)	2024-02-07 17:55:48 +08:00
test_kvcache_manager.py	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)	2024-02-19 17:18:20 +08:00
test_request_handler.py	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)	2024-02-19 17:18:20 +08:00