ColossalAI/tests
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
2024-04-10 11:07:52 +08:00
kit [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_analyzer [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_auto_parallel [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_autochunk [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_booster [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) 2024-03-27 13:57:00 +08:00
test_checkpoint_io [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_cluster [shardformer] Sequence Parallelism Optimization (#5533) 2024-04-03 17:15:47 +08:00
test_config [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_device [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_fx [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_gptq [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_infer [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) 2024-04-10 11:07:52 +08:00
test_lazy [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_legacy [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
test_moe [hotfix] set return_outputs=False in examples and polish code (#5404) 2024-03-25 12:31:09 +08:00
test_optimizer [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_pipeline [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
test_shardformer [shardformer] Sequence Parallelism Optimization (#5533) 2024-04-03 17:15:47 +08:00
test_smoothquant [inference] Add smoothquant for llama (#4904) 2023-10-16 11:28:44 +08:00
test_tensor fixed layout converter caching and updated tester 2024-03-26 17:22:27 +08:00
test_zero [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00