Commit Graph

7 Commits (361cf63cb001b0e242695c40cb64de904f0f0226)

Author SHA1 Message Date
Zhongkai Zhao 361cf63cb0
[Refactor] refactor policy search and quant type controlling in inference (#5035)
* [Refactor] refactor policy search and quant type controlling in inference
2023-11-14 17:26:59 +08:00
Xu Kai c6295c3381
[Refactor] remove useless inference code (#5022)
* remove useless code

* fix quant model

* fix test import bug

* mv original inference legacy

* fix chatglm2
2023-11-10 14:47:06 +08:00
Bin Jia 81b8f5e76a
[Inference Refactor] Merge chatglm2 with pp and tp (#5023)
merge chatglm with pp and tp
2023-11-09 14:46:19 +08:00
Xu Kai 450115bd0f
[refactor] refactor gptq and smoothquant llama (#5012)
* refactor gptq and smoothquant llama

* fix import error

* fix linear import torch-int

* fix smoothquant llama import error

* fix import accelerate error

* fix bug

* fix import smooth cuda

* fix smoothcuda
2023-11-09 10:12:11 +08:00
Bin Jia 48d0a58d10
add support for bloom (#5008)
2023-11-09 10:12:11 +08:00
Xu Kai f747d13040
[inference] support only TP (#4998)
* support only tp

* enable tp
2023-11-09 10:12:11 +08:00
Bin Jia b6696beb04
[Pipeline Inference] Merge pp with tp (#4993)
* refactor pipeline into new CaiInferEngine

* update llama modeling forward

* merge tp with pp

* update docstring

* optimize test workflow and example

* fix typo

* add assert and todo
2023-11-01 12:46:21 +08:00