ColossalAI/colossalai/kernel/cuda_native/csrc
LuGY 6a3f9fda83
[cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497)
2022-03-25 14:15:53 +08:00
Name                                        Last commit                                                           Date
kernels                                     add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
colossal_C_frontend.cpp                     refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
compat.h                                    refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
cpu_adam.cpp                                [zero] cpu adam kernel (#288)                                         2022-03-11 15:50:28 +08:00
cpu_adam.h                                  [zero] cpu adam kernel (#288)                                         2022-03-11 15:50:28 +08:00
layer_norm_cuda.cpp                         add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
layer_norm_cuda_kernel.cu                   Optimized MoE layer and fixed some bugs;                              2022-03-11 15:50:28 +08:00
moe_cuda.cpp                                Optimized MoE layer and fixed some bugs;                              2022-03-11 15:50:28 +08:00
moe_cuda_kernel.cu                          Optimized MoE layer and fixed some bugs;                              2022-03-11 15:50:28 +08:00
multi_tensor_adam.cu                        [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497)  2022-03-25 14:15:53 +08:00
multi_tensor_apply.cuh                      refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
multi_tensor_l2norm_kernel.cu               refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
multi_tensor_lamb.cu                        refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
multi_tensor_scale_kernel.cu                refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
multi_tensor_sgd_kernel.cu                  refactor kernel (#142)                                                2022-01-13 16:47:17 +08:00
multihead_attention_1d.cpp                  add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
multihead_attention_1d.h                    add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_masked_softmax.cpp                   add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_masked_softmax.h                     add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_masked_softmax_cuda.cu               add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_upper_triang_masked_softmax.cpp      add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_upper_triang_masked_softmax.h        add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
scaled_upper_triang_masked_softmax_cuda.cu  add colossalai kernel module (#55)                                    2021-12-21 12:19:52 +08:00
type_shim.h                                 [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497)  2022-03-25 14:15:53 +08:00