ColossalAI/extensions/csrc/common
xs_courtesy 388e043930 add implementation for GetGPULaunchConfig1D 2024-03-14 11:13:40 +08:00
cuda_type_utils.h optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441) 2024-03-12 17:48:02 +08:00
micros.h refactor code 2024-03-08 15:41:14 +08:00
mp_type_traits.h refactor code 2024-03-08 15:41:14 +08:00
target.h add implementation for GetGPULaunchConfig1D 2024-03-14 11:13:40 +08:00
vector_copy_utils.h [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) 2024-03-13 17:20:03 +08:00