ColossalAI/colossalai/shardformer/modeling
yuehuayingxueluo f0aab7f9a8
Add Inference test for llama (#4508)
* add kv cache memory manager
* add state info during inference
* add
* add infer example
* finish
* finish
* format
* format
* rename file
* add kv cache test
* revise BatchInferState
* add inference test for llama
* fix conflict
* feature: add some new features for llama engine
* adapt colossalai triton interface
* change the parent class of llama policy
* add nvtx
* move llama inference code to tensor_parallel
* fix __init__.py
* rm tensor_parallel
* fix: fix bugs in auto_policy.py
* fix: rm some unused code
* mv colossalai/tpinference to colossalai/inference/tensor_parallel
* change __init__.py
* save change
* fix engine
* Bug fix: Fix hang
* remove llama_infer_engine.py

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2023-08-30 12:10:26 +08:00
Name          Last commit                                                            Date
chatglm2_6b   [pipeline] add chatglm (#4363)                                         2023-08-15 23:25:14 +08:00
__init__.py   [shardformer] added development protocol for standardization (#4149)   2023-07-04 16:05:01 +08:00
bert.py       [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
blip2.py      [shardformer] update shardformer to use flash attention 2 (#4392)      2023-08-15 23:25:14 +08:00
bloom.py      [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
chatglm.py    [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
gpt2.py       [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
jit.py        [Shardformer] Merge flash attention branch to pipeline branch (#4362)  2023-08-15 23:25:14 +08:00
llama.py      Add Inference test for llama (#4508)                                   2023-08-30 12:10:26 +08:00
opt.py        [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
sam.py        [Shardformer] Merge flash attention branch to pipeline branch (#4362)  2023-08-15 23:25:14 +08:00
t5.py         [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
vit.py        [misc] resolve code factor issues (#4433)                              2023-08-15 23:25:14 +08:00
whisper.py    [shardformer] update shardformer to use flash attention 2 (#4392)      2023-08-15 23:25:14 +08:00