Zhongkai Zhao
361cf63cb0
[Refactor] refactor policy search and quant type controlling in inference ( #5035 )
* [Refactor] refactor policy search and quant type controlling in inference
2023-11-14 17:26:59 +08:00
Xu Kai
c6295c3381
[Refactor] remove useless inference code ( #5022 )
* remove useless code
* fix quant model
* fix test import bug
* mv original inference legacy
* fix chatglm2
2023-11-10 14:47:06 +08:00
Bin Jia
81b8f5e76a
[Inference Refactor] Merge chatglm2 with pp and tp ( #5023 )
merge chatglm with pp and tp
2023-11-09 14:46:19 +08:00
Xu Kai
450115bd0f
[refactor] refactor gptq and smoothquant llama ( #5012 )
* refactor gptq and smoothquant llama
* fix import error
* fix linear import torch-int
* fix smoothquant llama import error
* fix import accelerate error
* fix bug
* fix import smooth cuda
* fix smoothcuda
2023-11-09 10:12:11 +08:00
Bin Jia
48d0a58d10
add support for bloom ( #5008 )
2023-11-09 10:12:11 +08:00
Xu Kai
f747d13040
[inference] support only TP ( #4998 )
* support only tp
* enable tp
2023-11-09 10:12:11 +08:00
Bin Jia
b6696beb04
[Pipeline Inference] Merge pp with tp ( #4993 )
* refactor pipeline into new CaiInferEngine
* update llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
2023-11-01 12:46:21 +08:00