yuehuayingxueluo | bfff9254ac | [inference] Adapted to Rotary Embedding and RMS Norm (#5283) | 2024-01-22 10:55:34 +08:00
* adapted to rotary_embedding
* adapted to nopad rms norm
* fix bugs in benchmark
* fix flash_decoding.py
Yuanheng Zhao | 6e487e7d3c | [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) | 2024-01-19 15:47:16 +08:00
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
Jianghai | 9e2342bde2 | [Hotfix] Fix bugs in testing continuous batching (#5270) | 2024-01-18 16:31:14 +08:00
* fix bug
* fix bugs
* fix bugs
* fix bugs and add padding
* add funcs and fix bugs
* fix typos
* fix bugs
* add func
yuehuayingxueluo | 86b63f720c | [Inference] Adapted to the triton attn kernels (#5264) | 2024-01-17 16:03:10 +08:00
* adapted to the triton attn kernels
* fix pad input
* adapted to copy_kv_to_blocked_cache
* fix ci test
* update kv memcpy
* remove print
yuehuayingxueluo | 2a73e828eb | fix bugs related to processing padding mask | 2024-01-11 13:46:14 +00:00
yuehuayingxueluo | fa4fbdbffb | adapted to pad_context_forward | 2024-01-11 13:44:06 +00:00
yuehuayingxueluo | 47e53eaa1c | fix bugs in attention.py and request_handler.py | 2024-01-11 13:44:06 +00:00
yuehuayingxueluo | 3ad1f3b78b | fix beam_width | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | b2eb9cd186 | Fixed a typo | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | 02c1bf8b2a | add context_attention_unpadded | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | 9489dc64d8 | precision alignment | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | 62968588d1 | fix bugs in request_handler | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | 62fd08ee44 | Fixed a bug in the inference framework | 2024-01-11 13:39:56 +00:00
yuehuayingxueluo | 86853a37d5 | Add padding llama model | 2024-01-11 13:39:56 +00:00