ColossalAI/colossalai/kernel/triton
Latest commit: Jianghai ce7ade3882 [inference] chatglm2 infer demo (#4724), 2023-09-22 11:12:50 +08:00
File                          Last commit message                                   Last updated
__init__.py                   [feature] add gptq for inference (#4754)              2023-09-22 11:02:50 +08:00
context_attention.py          [inference] chatglm2 infer demo (#4724)               2023-09-22 11:12:50 +08:00
copy_kv_cache_dest.py         [misc] update pre-commit and run all files (#4752)    2023-09-19 14:20:26 +08:00
fused_layernorm.py            [misc] update pre-commit and run all files (#4752)    2023-09-19 14:20:26 +08:00
gptq_triton.py                [feature] add gptq for inference (#4754)              2023-09-22 11:02:50 +08:00
qkv_matmul_kernel.py          [misc] update pre-commit and run all files (#4752)    2023-09-19 14:20:26 +08:00
rms_norm.py                   [misc] update pre-commit and run all files (#4752)    2023-09-19 14:20:26 +08:00
rotary_embedding_kernel.py    [inference] chatglm2 infer demo (#4724)               2023-09-22 11:12:50 +08:00
self_attention_nofusion.py    [misc] update pre-commit and run all files (#4752)    2023-09-19 14:20:26 +08:00
softmax.py                    [misc] update pre-commit and run all files (#4752)    2023-09-19 14:12:26 +08:00
token_attention_kernel.py     [inference] chatglm2 infer demo (#4724)               2023-09-22 11:12:50 +08:00
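
The files in this directory are standalone Triton kernels used by ColossalAI's inference path (attention, normalization, rotary embedding, softmax, and quantized matmul). As a rough illustration of the kind of kernel these modules contain, below is a minimal, self-contained RMSNorm sketch written directly in Triton. It is not the contents of rms_norm.py; the function names, signatures, and launch parameters are illustrative assumptions only.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _rms_norm_kernel(x_ptr, w_ptr, out_ptr, stride, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program instance normalizes one row of the (n_rows, n_cols) input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # Root-mean-square statistic over the hidden dimension, then scale by the weight.
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * stride + cols, x / rms * w, mask=mask)


def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # x: (n_rows, n_cols) CUDA tensor; weight: (n_cols,) CUDA tensor.
    out = torch.empty_like(x)
    n_rows, n_cols = x.shape
    # Each row is handled by one kernel instance; the block must cover the full row.
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    _rms_norm_kernel[(n_rows,)](x, weight, out, x.stride(0), n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out
```

A typical call would be rms_norm(torch.randn(4, 1024, device="cuda", dtype=torch.float16), torch.ones(1024, device="cuda", dtype=torch.float16)). The per-row program layout mirrors the common pattern in these kernels: one CTA per sequence position, with masking to handle hidden sizes that are not powers of two.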