# ColossalAI/colossalai/inference/core/request_handler.py

from typing import List
from colossalai.inference.struct import BatchInfo, Sequence
class RequestHandler:
    """
    RequestHandler is the core for handling existing requests and updating current batch.
    During generation process, we call schedule function each iteration to update current batch.

    Args:
        inference_config: Store the configuration information related to inference.
        model_config: The huggingface model config.
    """

    def __init__(self, inference_config, model_config) -> None:
        self.inference_config = inference_config
        self.model_config = model_config
        self._init_cache()
        # Requests admitted via add_sequence but not yet scheduled.
        self.waiting_list: List["Sequence"] = []
        # Requests currently part of the running batch.
        self.running_list: List["Sequence"] = []
        self.batch = BatchInfo.init_batch()

    def _init_cache(self):
        """
        Initialize the cache manager with cache config.
        """
        # TODO: stub — cache manager initialization is not implemented yet.

    def schedule(self):
        """
        The main logic of request handler: promote waiting requests into the
        running batch.

        Returns:
            The current BatchInfo to run inference on.
        """
        # The code below is only used for testing engine and will be modified.
        if self.waiting_list:
            # FIX: the original did `self.running_list = self.waiting_list`,
            # which (a) aliased both names to one list, so a later
            # add_sequence() mutated the running list too, (b) dropped any
            # previously running sequences, and (c) never drained the waiting
            # list, re-adding the same sequences to the batch on every call.
            # Move the newly admitted sequences instead.
            newly_scheduled = self.waiting_list
            self.waiting_list = []
            self.running_list.extend(newly_scheduled)
            self.batch.add_seqs(newly_scheduled)
        return self.batch

    def add_sequence(self, req_seq: "Sequence"):
        """
        Add the request to waiting list.

        Args:
            req_seq: The sequence to enqueue for scheduling.
        """
        self.waiting_list.append(req_seq)

    def abort_sequence(self, seq_id: str):
        """
        Abort the request. #TODO :implement this
        """
        # TODO: once _find_sequence is implemented, remove the sequence from
        # the waiting/running lists and release its resources. Currently a
        # no-op lookup.
        self._find_sequence(seq_id)
        return

    def _find_sequence(self, seq_id: str) -> "Sequence":
        """
        Find the request by seq_id.
        """
        # TODO: stub — should search waiting_list/running_list; implicitly
        # returns None for now.

    def check_unfinished_seqs(self) -> bool:
        """Return True while any request is still waiting or running."""
        return len(self.waiting_list) != 0 or len(self.running_list) != 0

    def update(self):
        """
        Update the waiting list and running list.

        Returns:
            The sequences that were in the batch before it was cleared.
        """
        # The code below is only used for testing engine and will be modified.
        self.waiting_list = []
        self.running_list = []
        finished_sequences = list(self.batch.sequences_set)
        self.batch.clear_batch()
        return finished_sequences