duanjunwen
c2fe3137e2
[hotfix] fix flash attn window_size err (#6132)
* [fix] fix flash attn
* [hotfix] fix flash-attn version
* [fix] fix flash-attn version
* [fix] fix flash-attn versions
* [fix] fix flash-attn "not enough values to unpack" error
* [fix] fix test_ring_attn
* [fix] fix test ring attn
2024-11-14 17:11:35 +08:00
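Editor's note: the hotfix above tracks flash-attn API drift. The `window_size` kwarg only exists from flash-attn 2.3 onward; older releases reject it and return a different number of values from their internal forward, which is where the "not enough values to unpack" error comes from. A minimal sketch of version-guarding the call, using a hypothetical `guarded_flash_attn` wrapper rather than the actual ColossalAI patch:

```python
# Hedged sketch: only pass `window_size` on flash-attn versions that accept it.
# `guarded_flash_attn` is an illustrative name, not ColossalAI's API.
from packaging import version

import flash_attn
from flash_attn import flash_attn_func

_SUPPORTS_WINDOW = version.parse(flash_attn.__version__) >= version.parse("2.3")

def guarded_flash_attn(q, k, v, causal=True, window_size=(-1, -1)):
    kwargs = {"causal": causal}
    if _SUPPORTS_WINDOW:
        kwargs["window_size"] = window_size  # sliding-window attention (2.3+)
    return flash_attn_func(q, k, v, **kwargs)
```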
Wenxuan Tan
62c13e7969
[Ring Attention] Improve comments (#6085)
* improve comments
* improve comments
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-10-16 11:23:35 +08:00
wangbluo
83cf2f84fb
fix
2024-10-15 14:50:27 +08:00
wangbluo
fd92789af2
fix
2024-10-15 13:26:44 +08:00
wangbluo
6be9862aaf
fix
2024-10-15 11:56:49 +08:00
wangbluo
3dc08c8a5a
fix
2024-10-15 11:01:34 +08:00
wangbluo
fe9208feac
fix
2024-10-14 18:07:56 +08:00
wangbluo
23199e34cc
fix
2024-10-14 18:01:53 +08:00
wangbluo
d891e50617
fix
2024-10-14 14:56:05 +08:00
wangbluo
e1e86f9f1f
fix
2024-10-14 11:45:35 +08:00
wangbluo
1507a7528f
fix
2024-10-11 06:20:34 +00:00
wangbluo
0002ae5956
fix
2024-10-11 14:16:21 +08:00
wangbluo
efe3042bb2
fix
2024-10-10 18:38:47 +08:00
wangbluo
5ecc27e150
fix
2024-10-10 15:35:52 +08:00
wangbluo
f98384aef6
fix
2024-10-10 15:17:06 +08:00
wangbluo
b635dd0669
fix
2024-10-09 14:05:26 +08:00
wangbluo
3532f77b90
fix
2024-10-09 10:57:19 +08:00
wangbluo
3fab92166e
fix
2024-09-26 18:03:09 +08:00
wangbluo
6705dad41b
fix
2024-09-25 19:02:21 +08:00
wangbluo
91ed32c256
fix
2024-09-25 19:00:38 +08:00
wangbluo
6fb1322db1
fix
2024-09-25 18:56:18 +08:00
wangbluo
65c8297710
fix the attn
2024-09-25 18:51:03 +08:00
wangbluo
cfd9eda628
fix the ring attn
2024-09-25 18:34:29 +08:00
wangbluo
10e4f7da72
fix
2024-09-16 13:45:04 +08:00
wangbluo
827ef3ee9a
fix
2024-09-14 10:40:35 +00:00
wangbluo
b582319273
fix
2024-09-13 10:24:41 +00:00
wangbluo
0ad3129cb9
fix
2024-09-13 09:01:26 +00:00
wangbluo
0b14a5512e
fix
2024-09-13 07:06:14 +00:00
wangbluo
dc032172c3
fix
2024-09-13 06:00:58 +00:00
wangbluo
f393867cff
fix
2024-09-13 05:24:52 +00:00
wangbluo
6eb8832366
fix
2024-09-13 05:06:56 +00:00
wangbluo
683179cefd
fix
2024-09-13 03:40:56 +00:00
wangbluo
0a01e2a453
fix the attn
2024-09-13 03:38:35 +00:00
pre-commit-ci[bot]
216d54e374
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-09-13 02:38:40 +00:00
wangbluo
fdd84b9087
fix the sp
2024-09-13 02:32:03 +00:00
Wenxuan Tan
8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* fix typos and remove misc files
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
* adapt chatglm, command-R, qwen
* debug
* add sp_mode to benchmark; fix varlen interface
* add comments
* q1 index only once
* remove events to simplify stream sync
* simplify forward/backward logic
* 2d ring forward passed
* 2d ring backward passed
* fixes
* fix ring attn loss
* 2D ring backward + llama passed
* merge
* update logger
* fix typo
* rebase
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo
* remove typos
* fixes
* support GPT
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-10 12:06:50 +08:00
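Editor's note: the core idea of #5959 is that under sequence parallelism each rank already holds only its shard of the logits and labels, so cross-entropy can be computed per shard and reduced across the SP group instead of gathering full logits. A hedged sketch of that reduction, assuming an initialized process group `sp_group` and `-100` as the ignore index; the unified shardformer function differs in detail:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def split_cross_entropy(local_logits, local_labels, sp_group):
    # Sum (not mean) locally, since ranks may hold different numbers of
    # non-ignored tokens; normalize after the all-reduce.
    loss = F.cross_entropy(
        local_logits.view(-1, local_logits.size(-1)),
        local_labels.view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    num_tokens = (local_labels != -100).sum()
    dist.all_reduce(loss, group=sp_group)
    dist.all_reduce(num_tokens, group=sp_group)
    return loss / torch.clamp(num_tokens, min=1)
```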
Edenzzzz
f1c3266a94
overlap kv comm with output rescale (#6017)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-08-19 14:08:17 +08:00
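Editor's note: #6017's one-line subject names a standard ring-attention optimization: post the next step's KV send/recv before folding the current block into the running output, so the P2P exchange overlaps the rescale. A hedged sketch with naive stand-in kernels (the real code uses flash-attn and fused log-sum-exp rescaling; causal masking is omitted for brevity):

```python
import torch
import torch.distributed as dist

def naive_attn(q, k, v):
    # stand-in for the flash-attn kernel: returns output and log-sum-exp
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    return scores.softmax(dim=-1) @ v, scores.logsumexp(dim=-1)

def merge(out, lse, b_out, b_lse):
    # online-softmax rescale: fold a new block into the running output
    if out is None:
        return b_out, b_lse
    new_lse = torch.logaddexp(lse, b_lse)
    out = (out * (lse - new_lse).exp().unsqueeze(-1)
           + b_out * (b_lse - new_lse).exp().unsqueeze(-1))
    return out, new_lse

def ring_attn(q, k, v):
    rank, world = dist.get_rank(), dist.get_world_size()
    kv = torch.stack([k, v])
    recv = torch.empty_like(kv)
    out = lse = None
    for step in range(world):
        reqs = []
        if step + 1 < world:
            # post the async KV exchange *before* compute, so the transfer
            # overlaps the attention + rescale below
            reqs = dist.batch_isend_irecv([
                dist.P2POp(dist.isend, kv, (rank + 1) % world),
                dist.P2POp(dist.irecv, recv, (rank - 1) % world),
            ])
        b_out, b_lse = naive_attn(q, kv[0], kv[1])
        out, lse = merge(out, lse, b_out, b_lse)  # overlapped with the P2P ops
        for r in reqs:
            r.wait()
        kv, recv = recv, kv  # swap buffers for the next step
    return out
```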
Edenzzzz
f5c84af0b0
[Feature] Zigzag Ring attention (#5905)
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* fix typos and remove misc files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add sp_mode to benchmark; fix varlen interface
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-16 13:56:38 +08:00
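Editor's note: the "zigzag" in #5905 refers to the sequence sharding. Contiguous shards leave later ranks with far more unmasked causal work, so the sequence is cut into `2 * world_size` chunks and rank `i` takes chunks `i` and `2 * world_size - 1 - i`, balancing the causal workload. A minimal sketch of that split:

```python
import torch

def zigzag_split(x: torch.Tensor, rank: int, world_size: int, seq_dim: int = 1):
    # pair the i-th chunk with its mirror so every rank sees a comparable
    # amount of causal attention work
    chunks = x.chunk(2 * world_size, dim=seq_dim)
    return torch.cat([chunks[rank], chunks[2 * world_size - 1 - rank]], dim=seq_dim)

# e.g. with world_size=2 the sequence is cut into 4 chunks [0..3]:
# rank 0 -> chunks (0, 3), rank 1 -> chunks (1, 2)
```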
Hongxin Liu
7b38964e3a
[shardformer] hotfix attn mask (#5947)
2024-07-29 19:10:06 +08:00
Li Xingjian
8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
* Fix torch int32 dtype
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Fix flash-attn import
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add generalized model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Remove exposed path to model
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add default value for use_flash_attn
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Rename model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
---------
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-12 14:13:50 +08:00
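Editor's note: the import fix above suggests the usual guarded-import pattern: treat flash-attn as optional, default `use_flash_attn` from availability, and fall back to torch SDPA. A hedged sketch, not the inference module's actual code (note flash-attn expects a (batch, seq, heads, dim) layout and fp16/bf16 inputs):

```python
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False

def attention(q, k, v, use_flash_attn: bool = HAS_FLASH_ATTN):
    # q, k, v: (batch, heads, seq, dim), the layout SDPA expects
    if use_flash_attn and HAS_FLASH_ATTN:
        # flash-attn wants (batch, seq, heads, dim), hence the transposes
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=True
        )
        return out.transpose(1, 2)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```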
flybird11111
148506c828
[coloattention] modify coloattention (#5627)
* modify coloattention
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
* fix
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 10:47:14 +08:00
Hongxin Liu
19e1a5cf16
[shardformer] update colo attention to support custom mask (#5510)
* [feature] refactor colo attention (#5462)
* [extension] update api
* [feature] add colo attention
* [feature] update sdpa
* [feature] update npu attention
* [feature] update flash-attn
* [test] add flash attn test
* [test] update flash attn test
* [shardformer] update modeling to fit colo attention (#5465)
* [misc] refactor folder structure
* [shardformer] update llama flash-attn
* [shardformer] fix llama policy
* [devops] update tensornvme install
* [test] update llama test
* [shardformer] update colo attn kernel dispatch
* [shardformer] update blip2
* [shardformer] update chatglm
* [shardformer] update gpt2
* [shardformer] update gptj
* [shardformer] update opt
* [shardformer] update vit
* [shardformer] update colo attention mask prep
* [shardformer] update whisper
* [test] fix shardformer tests (#5514)
* [test] fix shardformer tests
* [test] fix shardformer tests
2024-03-27 11:19:32 +08:00
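Editor's note: #5510's custom-mask support comes down to kernel dispatch. Fused flash kernels only understand causal or padding-style masks, so an arbitrary mask must route to a kernel that consumes a dense mask tensor. A hedged sketch of that dispatch; the enum and function names are illustrative, not ColossalAI's:

```python
from enum import Enum, auto

import torch.nn.functional as F

class AttnMaskType(Enum):
    CAUSAL = auto()  # fused flash kernels handle this natively
    CUSTOM = auto()  # arbitrary mask: needs a mask-consuming kernel

def dispatch_attention(q, k, v, mask=None, mask_type=AttnMaskType.CAUSAL):
    if mask_type is AttnMaskType.CUSTOM:
        # flash kernels cannot consume an arbitrary dense mask; fall back
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```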