ColossalAI

Yuanheng Zhao 6e487e7d3c [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
cuda_native	fix thrust-transform-reduce error (#5078)	2023-11-21 15:09:35 +08:00
jit	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
triton	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
__init__.py	[hotfix] Fix import error: colossal.kernel without triton installed (#4722)	2023-09-14 18:03:55 +08:00
op_builder	[builder] reconfig op_builder for pypi install (#2314)	2023-01-04 16:32:32 +08:00