mirror of https://github.com/hpcaitech/ColossalAI
Latest commit (squashed merge):

* add kv cache memory manager
* add stateinfo during inference
* add
* add infer example
* finish
* format
* rename file
* add kv cache test
* revise on BatchInferState
* add inference test for llama
* fix conflict
* feature: add some new features for llama engine
* adapt colossalai triton interface
* Change the parent class of llama policy
* add nvtx
* move llama inference code to tensor_parallel
* fix __init__.py
* rm tensor_parallel
* fix: fix bugs in auto_policy.py
* fix: rm some unused codes
* mv colossalai/tpinference to colossalai/inference/tensor_parallel
* change __init__.py
* save change
* fix engine
* Bug fix: Fix hang
* remove llama_infer_engine.py

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
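The commit log above mentions a kv cache memory manager and a `BatchInferState` used during batched inference. As a rough sketch only — the class name, method names, and buffer layout below are illustrative assumptions, not ColossalAI's actual implementation — such a manager typically preallocates key/value buffers once and hands out token slots to sequences as they generate:

```python
# Minimal sketch of a KV cache memory manager for batched inference.
# All names here are hypothetical; they are NOT ColossalAI's API.
import torch


class KVCacheManager:
    """Preallocates per-layer key/value buffers and manages token slots."""

    def __init__(self, num_layers, num_heads, head_dim, max_tokens, dtype=torch.float16):
        # One contiguous buffer per layer: [max_tokens, num_heads, head_dim].
        self.key_cache = [
            torch.empty(max_tokens, num_heads, head_dim, dtype=dtype)
            for _ in range(num_layers)
        ]
        self.value_cache = [
            torch.empty(max_tokens, num_heads, head_dim, dtype=dtype)
            for _ in range(num_layers)
        ]
        self.free_slots = list(range(max_tokens))  # token-slot free list
        self.seq_slots = {}                        # seq_id -> slots owned by that sequence

    def allocate(self, seq_id, num_tokens):
        """Reserve `num_tokens` cache slots for a sequence; return their indices."""
        if num_tokens > len(self.free_slots):
            raise RuntimeError("KV cache exhausted")
        slots = [self.free_slots.pop() for _ in range(num_tokens)]
        self.seq_slots.setdefault(seq_id, []).extend(slots)
        return slots

    def free(self, seq_id):
        """Return all slots owned by a finished sequence to the free list."""
        self.free_slots.extend(self.seq_slots.pop(seq_id, []))


# Hypothetical usage: prefill a 16-token prompt, decode one step, then release.
mgr = KVCacheManager(num_layers=2, num_heads=8, head_dim=64, max_tokens=1024)
slots = mgr.allocate(seq_id=0, num_tokens=16)
slots += mgr.allocate(seq_id=0, num_tokens=1)
mgr.free(seq_id=0)
```

The point of this pattern is that cache memory is allocated once up front, so per-step decoding only moves slot indices around instead of allocating tensors.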
chatglm2_6b/
__init__.py
bert.py
blip2.py
bloom.py
chatglm.py
gpt2.py
jit.py
llama.py
opt.py
sam.py
t5.py
vit.py
whisper.py