Making large AI models cheaper, faster and more accessible
Latest commit: fd6482ad8c by Xu Kai, "[inference] Refactor inference architecture (#5057)", 1 year ago
File                              Last commit                                               Updated
kernel_utils.py                   [misc] update pre-commit and run all files (#4752)        1 year ago
test_bloom_context_attention.py   [misc] update pre-commit and run all files (#4752)        1 year ago
test_copy_kv_dest.py              [misc] update pre-commit and run all files (#4752)        1 year ago
test_layernorm_triton.py          [misc] update pre-commit and run all files (#4752)        1 year ago
test_llama_act_combine.py         [moe] merge moe into main (#4978)                         1 year ago
test_llama_context_attention.py   [Kernels]Update triton kernels into 2.1.0 (#5046)         1 year ago
test_self_attention_nonfusion.py  [misc] update pre-commit and run all files (#4752)        1 year ago
test_softmax.py                   [misc] update pre-commit and run all files (#4752)        1 year ago
test_token_attn_fwd.py            [inference] Refactor inference architecture (#5057)       1 year ago
test_token_softmax.py             [misc] update pre-commit and run all files (#4752)        1 year ago