ColossalAI/tests
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
* add kvcache manager funcs for batching

* add batch bucket for batching

* revise RunningList struct in handler

* add kvcache/batch funcs for compatibility

* use new batching methods

* fix indexing bugs

* revise abort logic

* use cpu seq lengths/block tables

* rm unused attr in Sequence

* fix type conversion/default arg

* add and revise pytests

* revise pytests, rm unused tests

* rm unused statements

* fix pop finished indexing issue

* fix: use index in batch when retrieving inputs/update seqs

* use dict instead of odict in batch struct

* arg type hinting

* fix make compress

* refine comments

* fix: pop_n_seqs to pop the first n seqs

* add check in request handler

* remove redundant conversion

* fix test for request handler

* fix pop method in batch bucket

* fix prefill adding
2024-02-19 17:18:20 +08:00
kit [workflow] fixed oom tests (#5275) 2024-01-16 18:55:13 +08:00
test_analyzer [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_auto_parallel [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_autochunk [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_booster [hotfix] fix 3d plugin test (#5292) 2024-01-22 15:19:04 +08:00
test_checkpoint_io [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00
test_cluster [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_config [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_device [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_fx [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_gptq [feature] add gptq for inference (#4754) 2023-09-22 11:02:50 +08:00
test_infer [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 2024-02-19 17:18:20 +08:00
test_lazy [workflow] fixed oom tests (#5275) 2024-01-16 18:55:13 +08:00
test_legacy [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_moe [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_optimizer [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00
test_pipeline Merge branch 'main' into sync/npu 2024-01-18 12:05:21 +08:00
test_shardformer [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) 2024-01-17 17:42:29 +08:00
test_smoothquant [inference] Add smmoothquant for llama (#4904) 2023-10-16 11:28:44 +08:00
test_tensor [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_utils [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00
test_zero [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00