ColossalAI/tests
Yuanheng Zhao d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455)
* add glide-llama policy and modeling

* update glide modeling, compatible with transformers 4.36.2

* revise glide llama modeling/usage

* fix issues of glimpsing large kv

* revise the way re-loading params for glide drafter

* fix drafter and engine tests

* enable convert to glide strict=False

* revise glide llama modeling

* revise vicuna prompt template

* revise drafter and tests

* apply usage of glide model in engine
2024-04-10 11:07:52 +08:00
kit [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_analyzer [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_auto_parallel [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_autochunk [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_booster [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) 2024-03-27 13:57:00 +08:00
test_checkpoint_io [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_cluster [shardformer] Sequence Parallelism Optimization (#5533) 2024-04-03 17:15:47 +08:00
test_config [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_device [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_fx [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_gptq [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_infer [Inference/SpecDec] Support GLIDE Drafter Model (#5455) 2024-04-10 11:07:52 +08:00
test_lazy [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_legacy [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_moe [hotfix] set return_outputs=False in examples and polish code (#5404) 2024-03-25 12:31:09 +08:00
test_optimizer [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_pipeline [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_shardformer [shardformer] Sequence Parallelism Optimization (#5533) 2024-04-03 17:15:47 +08:00
test_smoothquant [inference] Add smoothquant for llama (#4904) 2023-10-16 11:28:44 +08:00
test_tensor fixed layout converter caching and updated tester 2024-03-26 17:22:27 +08:00
test_zero [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00