ColossalAI/colossalai/inference/tensor_parallel/policies
Jianghai ef4c14a5e2
[Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014)
* fix bug

* fix

* fix multiquery

* fix multiquery

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2023-11-07 15:01:50 +08:00
__init__.py
bloom.py
chatglm2.py [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) 2023-11-07 15:01:50 +08:00
llama.py [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) 2023-10-30 14:04:37 +08:00