mirror of https://github.com/hpcaitech/ColossalAI
Latest commit (squashed):

* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding

Co-authored-by: cuiqing.li <lixx336@gmail.com>
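The flash-decoding technique named in the commit log above speeds up decode-time attention by splitting the key/value cache into chunks, computing partial attention over each chunk independently, and then merging the partial results with a softmax rescaling. Below is a minimal single-head NumPy sketch of that split-and-merge idea; it is an illustration only, not the repository's actual Triton kernel, and all names in it (`flash_decoding`, `attention_ref`) are hypothetical:

```python
import numpy as np

def attention_ref(q, k, v):
    """Reference single-query attention: softmax(k @ q) weighted sum of v."""
    s = k @ q                       # (n,) attention scores
    p = np.exp(s - s.max())         # numerically stable softmax numerator
    return (p @ v) / p.sum()

def flash_decoding(q, k, v, num_chunks=4):
    """Split the KV cache into chunks, attend per chunk, merge with rescaling."""
    outs, maxes, sums = [], [], []
    for kc, vc in zip(np.array_split(k, num_chunks),
                      np.array_split(v, num_chunks)):
        s = kc @ q                  # partial scores for this KV chunk
        m = s.max()                 # chunk-local max for stable exponentials
        p = np.exp(s - m)
        outs.append(p @ vc)         # unnormalized partial output
        maxes.append(m)
        sums.append(p.sum())
    # Merge: rescale each chunk's partials to a shared global max,
    # then combine numerators and denominators (log-sum-exp trick).
    m_global = max(maxes)
    scales = [np.exp(m - m_global) for m in maxes]
    denom = sum(sc * s for sc, s in zip(scales, sums))
    num = sum(sc * o for sc, o in zip(scales, outs))
    return num / denom
```

Because the chunked computation rescales every partial to a common maximum before summing, it reproduces the reference softmax attention exactly while letting the per-chunk work run in parallel.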
Files in this directory:

* test_dynamic_batching/
* _utils.py
* test_bloom_infer.py
* test_chatglm2_infer.py
* test_infer_engine.py
* test_kvcache_manager.py
* test_llama2_infer.py
* test_llama_infer.py
* test_pipeline_infer.py