Commit Graph

64 Commits (23199e34cc0ae6fdd6c9297ef088bf4dc4713b52)

Author SHA1 Message Date
yuehuayingxueluo 86b63f720c
[Inference]Adapted to the triton attn kernels (#5264)
* adapted to the triton attn kernels

* fix pad input

* adapted to copy_kv_to_blocked_cache

* fix ci test

* update kv memcpy

* remove print
2024-01-17 16:03:10 +08:00
yuehuayingxueluo d40eb26029 fix bugs in request_handler.py and engine.py 2024-01-11 13:46:14 +00:00
yuehuayingxueluo fab294c7f4 fix CI bugs 2024-01-11 13:46:14 +00:00
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 2024-01-11 13:44:06 +00:00
yuehuayingxueluo 47e53eaa1c fix bugs in attention.py and request_handler.py 2024-01-11 13:44:06 +00:00
yuehuayingxueluo 02c1bf8b2a add context_attention_unpadded 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 9489dc64d8 precision alignment 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 62968588d1 fix bugs in request_handler 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 86853a37d5 Add padding llama model 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173)
* add infer_struct and infer_config

* update codes

* change InferConfig

* Add hf_model_config to the engine

* rm _get_hf_model_config

* update codes

* made adjustments according to the feedback from the reviewer.

* update codes

* add ci test for config and struct

* Add the logic of the inference engine

* update engine and test

* Recover cache_manager.py

* add logger

* fix conflict

* update codes

* update codes

* update model and tokenizer

* fix add the logic about shardformer

* change kvcache_manager docstring

* add policy

* fix ci bug in test_kvcache_manager.py

* remove codes related o tokenizer and move model_policy

* fix  code style

* add ordered_set to requirements-infer.txt

* Delete extra empty lines

* add ordered_set to requirements-test.txt
2024-01-11 13:39:56 +00:00
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178)
* unify the config setting

* fix test

* fix import

* fix test

* fix

* fix

* add logger

* revise log info

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2024-01-11 13:39:29 +00:00
yuehuayingxueluo fab9b931d9 [Inference]Add BatchInferState, Sequence and InferConfig (#5149)
* add infer_struct and infer_config

* update codes

* change InferConfig

* Add hf_model_config to the engine

* rm _get_hf_model_config

* update codes

* made adjustments according to the feedback from the reviewer.

* update codes

* add ci test for config and struct
2024-01-11 13:39:29 +00:00
Jianghai 4cf4682e70 [Inference] First PR for rebuild colossal-infer (#5143)
* add engine and scheduler

* add dirs

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2024-01-11 13:39:29 +00:00