ColossalAI

Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)	2024-04-10 11:07:52 +08:00
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
kit	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_analyzer	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
test_auto_parallel	[npu] change device to accelerator api (#5239)	2024-01-09 10:20:05 +08:00
test_autochunk	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
test_booster	[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)	2024-03-27 13:57:00 +08:00
test_checkpoint_io	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_cluster	[shardformer] Sequence Parallelism Optimization (#5533)	2024-04-03 17:15:47 +08:00
test_config	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
test_device	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
test_fx	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
test_gptq	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_infer	[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)	2024-04-10 11:07:52 +08:00
test_lazy	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_legacy	[npu] change device to accelerator api (#5239)	2024-01-09 10:20:05 +08:00
test_moe	[hotfix] set return_outputs=False in examples and polish code (#5404)	2024-03-25 12:31:09 +08:00
test_optimizer	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_pipeline	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
test_shardformer	[shardformer] Sequence Parallelism Optimization (#5533)	2024-04-03 17:15:47 +08:00
test_smoothquant	[inference] Add smmoothquant for llama (#4904)	2023-10-16 11:28:44 +08:00
test_tensor	fixed layout converter caching and updated tester	2024-03-26 17:22:27 +08:00
test_zero	[npu] change device to accelerator api (#5239)	2024-01-09 10:20:05 +08:00
__init__.py	[zero] Update sharded model v2 using sharded param v2 (#323)	2022-03-11 15:50:28 +08:00