Commit Graph

8 Commits (a37f82629d7b9e3c3a0f430b8dd3ff6f38ddf1d4)

Author SHA1 Message Date
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383)
* revise rotary embedding

* remove useless print

* adapt

* fix

* add

* fix

* modeling

* fix

* fix

* fix

* fused kv copy

* fused copy

* colossalai/kernel/triton/no_pad_rotary_embedding.py

* del padding llama

* del
2024-02-21 11:31:48 +08:00
Frank Lee 8106ede07f
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)
This reverts commit 9f4ab2eb92.
2024-02-07 14:27:04 +08:00
Jianghai 9f4ab2eb92
[Inference] Adapt to Fused rotary (#5348)
* revise rotary embedding

* remove useless print

* adapt

* fix

* add

* fix

* modeling

* fix

* fix

* fix
2024-02-07 11:36:04 +08:00
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
* fused the gate and up proj in mlp

* fix code styles

* opt auto_grad

* rollback test_inference_engine.py

* modifications based on the review feedback.

* fix bugs in flash attn

* Change reshape to view

* fix test_rmsnorm_triton.py
2024-02-06 19:38:25 +08:00
Jianghai df0aa49585
[Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336)
* revise rotary embedding

* remove useless print

* adapt
2024-01-31 16:31:29 +08:00
Jianghai 7ddd8b37f0
fix (#5311) 2024-01-26 15:02:12 +08:00
Jianghai c647e00e3c
[Inference]Add fused rotary kernel and get cos cache kernel (#5302)
* add fused rotary and get cos cache func

* staged

* fix bugs

* fix bugs
2024-01-24 16:20:42 +08:00
Jianghai fded91d049 [Inference] Kernel: no pad rotary embedding (#5252)
* fix bugs

* comment

* use more accurate atol

* fix
2024-01-11 13:46:14 +00:00