Yuanheng Zhao
7b249c76e5
[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837)
* fix glide llama model
* revise
2024-06-19 15:37:53 +08:00
yuehuayingxueluo
b45000f839
[Inference]Add Streaming LLM (#5745)
* Add Streaming LLM
* add some parameters to llama_generation.py
* verify streamingllm config
* add test_streamingllm.py
* modified according to the opinions of review
* add Citation
* change _block_tables tolist
2024-06-05 10:51:19 +08:00
Yuanheng Zhao
677cbfacf8
[Fix/Example] Fix Llama Inference Loading Data Type (#5763)
* [fix/example] fix llama inference loading dtype
* revise loading dtype of benchmark llama3
2024-05-30 13:48:46 +08:00
Yuanheng Zhao
8bcfe360fd
[example] Update Inference Example (#5725)
* [example] update inference example
2024-05-17 11:28:53 +08:00
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
2024-05-08 11:30:15 +08:00