Edenzzzz
|
f5c84af0b0
|
[Feature] Zigzag Ring attention (#5905)
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* precision tests passed
* fix typos and remove misc files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add sp_mode to benchmark; fix varlen interface
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2024-08-16 13:56:38 +08:00 |
haze188
|
70793ce9ed
|
[misc] fix ci failure: change default value to false in moe plugin
|
2024-08-01 10:06:59 +08:00 |
hxwang
|
cb01c0d5ce
|
[moe] refactor mesh assignment
|
2024-08-01 10:06:59 +08:00 |
haze188
|
034020bd04
|
[misc] remove debug/print code
|
2024-08-01 10:06:59 +08:00 |
hxwang
|
c3dc9b4dba
|
[deepseek] replace attn (a workaround for bug in transformers)
|
2024-08-01 10:06:59 +08:00 |
haze188
|
b2952a5982
|
[moe] deepseek moe sp support
|
2024-08-01 10:06:59 +08:00 |
hxwang
|
803878b2fd
|
[moe] full test for deepseek and mixtral (pp + sp to fix)
|
2024-08-01 10:06:59 +08:00 |
hxwang
|
74eccac0db
|
[moe] test deepseek
|
2024-08-01 10:06:59 +08:00 |
Haze188
|
3420921101
|
[shardformer] DeepseekMoE support (#5871)
* [Feature] deepseek moe expert parallel implement
* [misc] fix typo, remove redundant file (#5867)
* [misc] fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [Feature] deepseek support & unit test
* [misc] remove debug code & useless print
* [misc] fix typos (#5872)
* [Feature] remove modeling file, use auto config. (#5884)
* [misc] fix typos
* [Feature] deepseek support via auto model, remove modeling file
* [misc] delete useless file
* [misc] fix typos
* [Deepseek] remove redundant code (#5888)
* [misc] fix typos
* [Feature] deepseek support via auto model, remove modeling file
* [misc] delete useless file
* [misc] fix typos
* [misc] remove redundant code
* [Feature/deepseek] resolve comment. (#5889)
* [misc] fix typos
* [Feature] deepseek support via auto model, remove modeling file
* [misc] delete useless file
* [misc] fix typos
* [misc] remove redundant code
* [misc] mv module replacement into if branch
* [misc] add some warning message and modify some code in unit test
* [misc] fix typos
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2024-07-05 16:13:58 +08:00 |