duanjunwen
c2fe3137e2
[hotfix] fix flash attn window_size err (#6132)
* [fix] fix flash attn
* [hotfix] fix flash-attn version
* [fix] fix flash_attn version
* [fix] fix flash-attn versions
* [fix] fix flash-attn not enough values to unpack error
* [fix] fix test_ring_attn
* [fix] fix test ring attn
2024-11-14 17:11:35 +08:00
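The "not enough values to unpack" errors fixed above are the usual symptom of flash-attn changing its call signature and return arity between releases. A minimal sketch of a version-tolerant call, assuming only the public `flash_attn_func` entry point (the guards in #6132 itself may differ):

```python
# Version-tolerant flash-attn call; a sketch, not the actual #6132 fix.
import inspect

from flash_attn import flash_attn_func

_FA_PARAMS = inspect.signature(flash_attn_func).parameters

def attention(q, k, v, causal=True, window_size=(-1, -1)):
    kwargs = {"causal": causal, "return_attn_probs": True}
    # Older flash-attn builds predate sliding windows; only pass the kwarg
    # when the installed version actually accepts it.
    if "window_size" in _FA_PARAMS:
        kwargs["window_size"] = window_size
    outs = flash_attn_func(q, k, v, **kwargs)
    # The return arity has changed across releases; index instead of
    # unpacking a fixed number of values.
    out, softmax_lse = outs[0], outs[1]
    return out, softmax_lse
```

Probing the installed signature with `inspect` avoids pinning the code to one flash-attn release, which is what the repeated version bumps in this entry were chasing.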
Wenxuan Tan
62c13e7969
[Ring Attention] Improve comments (#6085)
* improve comments
* improve comments
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-10-16 11:23:35 +08:00
Wang Binluo
dcd41d0973
Merge pull request #6071 from wangbluo/ring_attention
[Ring Attention] fix the 2D ring attn when using multiple machines
2024-10-15 15:17:21 +08:00
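For background on the 2D ring layout this merge fixes: the ring is factored into an inner ring over the GPUs of one machine and an outer ring across machines, so most KV exchange stays on fast intra-node links. A sketch of the process-group construction only, with illustrative names (not ColossalAI's actual code):

```python
# Inner (intra-node) and outer (inter-node) ring groups; illustrative only.
import torch.distributed as dist

def build_2d_ring_groups(world_size: int, gpus_per_node: int):
    rank = dist.get_rank()
    inner_group = outer_group = None
    # Inner rings: consecutive ranks on the same machine. Every rank must
    # call new_group for every group, in the same order.
    for node in range(world_size // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        group = dist.new_group(ranks)
        if rank in ranks:
            inner_group = group
    # Outer rings: ranks sharing the same local GPU index across machines.
    for local in range(gpus_per_node):
        ranks = list(range(local, world_size, gpus_per_node))
        group = dist.new_group(ranks)
        if rank in ranks:
            outer_group = group
    return inner_group, outer_group
```

Multi-machine bugs in this setup typically come from the outer-ring stride or group ordering being wrong on some ranks, which is consistent with the string of small "fix" commits that follow.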
wangbluo
83cf2f84fb
fix
2024-10-15 14:50:27 +08:00
wangbluo
bc7eeade33
fix
2024-10-15 13:28:33 +08:00
wangbluo
fd92789af2
fix
2024-10-15 13:26:44 +08:00
wangbluo
6be9862aaf
fix
2024-10-15 11:56:49 +08:00
wangbluo
3dc08c8a5a
fix
2024-10-15 11:01:34 +08:00
wangbluo
fe9208feac
fix
2024-10-14 18:07:56 +08:00
wangbluo
3201377e94
fix
2024-10-14 18:06:24 +08:00
wangbluo
23199e34cc
fix
2024-10-14 18:01:53 +08:00
wangbluo
d891e50617
fix
2024-10-14 14:56:05 +08:00
wangbluo
e1e86f9f1f
fix
2024-10-14 11:45:35 +08:00
Tong Li
4c8e85ee0d
[Coati] Train DPO using PP (#6054)
* update dpo
* remove unsupported plugin
* update msg
* update dpo
* remove unsupported plugin
* update msg
* update template
* update dataset
* add pp for dpo
* update dpo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add dpo fn
* update dpo
* update dpo
* update dpo
* update dpo
* minor update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update loss
* update help
* polish code
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-10-11 19:32:00 +08:00
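For reference, the DPO objective this trainer optimizes scores the margin between policy and reference log-ratios on chosen versus rejected responses. A minimal sketch of the loss, assuming per-sequence summed token log-probabilities as inputs (not Coati's exact implementation):

```python
# Minimal DPO loss; inputs are per-sequence sums of token log-probs.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # Standard DPO: -log sigmoid(beta * (policy margin - reference margin)).
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

Running this under pipeline parallelism mainly adds plumbing: the chosen/rejected log-probs only materialize on the last stage, which is why the PP support needed a dedicated forward function.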
wangbluo
1507a7528f
fix
2024-10-11 06:20:34 +00:00
wangbluo
0002ae5956
fix
2024-10-11 14:16:21 +08:00
Hongxin Liu
dc2cdaf3e8
[shardformer] optimize seq parallelism (#6086)
* [shardformer] optimize seq parallelism
* [shardformer] fix gpt2 fused linear col
* [plugin] update gemini plugin
* [plugin] update moe hybrid plugin
* [test] update gpt2 fused linear test
* [shardformer] fix gpt2 fused linear reduce
2024-10-11 13:44:40 +08:00
wangbluo
efe3042bb2
fix
2024-10-10 18:38:47 +08:00
wangbluo
5ecc27e150
fix
2024-10-10 15:35:52 +08:00
wangbluo
f98384aef6
fix
2024-10-10 15:17:06 +08:00
Hongxin Liu
646b3c5a90
[shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084)
* [tp] hotfix linear row
* [tp] support uneven split for fused linear
* [tp] support sp for fused linear
* [tp] fix gpt2 mlp policy
* [tp] fix gather fused and add fused linear row
2024-10-10 14:34:45 +08:00
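"Uneven splits for fused qkv linear" means sharding a fused [q; k; v] weight across tensor-parallel ranks without a rank's slice ever crossing a block boundary, even when the q and kv widths differ (e.g. grouped-query attention) or do not divide evenly by the TP size. A sketch of the splitting arithmetic, with illustrative names:

```python
# Sharding a fused QKV weight with uneven per-rank sizes; illustrative only.
import torch

def split_sizes(dim: int, tp_size: int):
    # Uneven division: the first `rem` ranks take one extra row each.
    base, rem = divmod(dim, tp_size)
    return [base + (1 if r < rem else 0) for r in range(tp_size)]

def shard_fused_qkv(weight: torch.Tensor, block_dims, tp_size: int, tp_rank: int):
    """weight: [sum(block_dims), hidden]; block_dims e.g. (q_dim, kv_dim, kv_dim)."""
    shards = []
    for block, dim in zip(torch.split(weight, list(block_dims), dim=0), block_dims):
        # Shard each of q, k, v independently so no slice straddles a boundary.
        shards.append(torch.split(block, split_sizes(dim, tp_size), dim=0)[tp_rank])
    return torch.cat(shards, dim=0)

# e.g. q_dim=10, kv_dim=6, tp_size=4: rank 0 gets 3 + 2 + 2 = 7 rows
w = torch.randn(10 + 6 + 6, 8)
print(shard_fused_qkv(w, (10, 6, 6), tp_size=4, tp_rank=0).shape)  # torch.Size([7, 8])
```

Splitting each block independently is also what lets the gather path ("fix gather fused" above) reassemble q, k and v in the right order on every rank.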
wangbluo
b635dd0669
fix
2024-10-09 14:05:26 +08:00
wangbluo
3532f77b90
fix
2024-10-09 10:57:19 +08:00
wangbluo
3fab92166e
fix
2024-09-26 18:03:09 +08:00
wangbluo
6705dad41b
fix
2024-09-25 19:02:21 +08:00
wangbluo
91ed32c256
fix
2024-09-25 19:00:38 +08:00
wangbluo
6fb1322db1
fix
2024-09-25 18:56:18 +08:00
wangbluo
65c8297710
fix the attn
2024-09-25 18:51:03 +08:00
wangbluo
cfd9eda628
fix the ring attn
2024-09-25 18:34:29 +08:00
botbw
4fa6b9509c
[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063)
2024-09-18 10:09:01 +08:00
wangbluo
10e4f7da72
fix
2024-09-16 13:45:04 +08:00
Wang Binluo
37e35230ff
Merge pull request #6061 from wangbluo/sp_fix
[sp]: fix the attention kernel for sp
2024-09-14 20:54:35 +08:00
wangbluo
827ef3ee9a
fix
2024-09-14 10:40:35 +00:00
Guangyao Zhang
f20b066c59
[fp8] Disable all_gather intranode. Disable redundant all_gather fp8 (#6059)
* all_gather only internode, fix pytest
* fix cuda arch <89 compile pytest error
* fix pytest failure
* disable all_gather_into_tensor_flat_fp8
* fix fp8 format
* fix pytest
* fix conversations
* fix chunk tuple to list
2024-09-14 10:40:01 +08:00
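The rationale behind "disable all_gather intranode" is that fp8 compression only pays off once traffic crosses the node boundary; over intra-node NVLink the cast overhead dominates. A hedged sketch of the pattern (the node-boundary test and per-tensor scaling are simplified, and this is not the PR's actual code; float8 dtypes need torch >= 2.1):

```python
# fp8-compressed all-gather, used only once the group spans multiple machines.
import torch
import torch.distributed as dist

def all_gather_maybe_fp8(t: torch.Tensor, group=None, gpus_per_node=8):
    ws = dist.get_world_size(group)
    out = [torch.empty_like(t) for _ in range(ws)]
    # Simplified node test: a group no larger than one node stays uncompressed.
    if ws <= gpus_per_node:
        dist.all_gather(out, t, group=group)
        return out
    # Per-tensor scale into e4m3's representable range (max magnitude ~448).
    scale = (t.abs().amax().clamp(min=1e-12) / 448.0).reshape(1)
    fp8 = (t / scale).to(torch.float8_e4m3fn)
    # NCCL has no fp8 type, so ship the raw bytes plus the scales.
    bufs = [torch.empty_like(fp8.view(torch.uint8)) for _ in range(ws)]
    dist.all_gather(bufs, fp8.view(torch.uint8), group=group)
    scales = [torch.empty_like(scale) for _ in range(ws)]
    dist.all_gather(scales, scale, group=group)
    return [b.view(torch.float8_e4m3fn).to(t.dtype) * s for b, s in zip(bufs, scales)]
```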
wangbluo
b582319273
fix
2024-09-13 10:24:41 +00:00
wangbluo
0ad3129cb9
fix
2024-09-13 09:01:26 +00:00
wangbluo
0b14a5512e
fix
2024-09-13 07:06:14 +00:00
botbw
696fced0d7
[fp8] fix missing fp8_comm flag in mixtral (#6057)
2024-09-13 14:30:05 +08:00
wangbluo
dc032172c3
fix
2024-09-13 06:00:58 +00:00
wangbluo
f393867cff
fix
2024-09-13 05:24:52 +00:00
wangbluo
6eb8832366
fix
2024-09-13 05:06:56 +00:00
wangbluo
683179cefd
fix
2024-09-13 03:40:56 +00:00
wangbluo
0a01e2a453
fix the attn
2024-09-13 03:38:35 +00:00
pre-commit-ci[bot]
216d54e374
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-09-13 02:38:40 +00:00
wangbluo
fdd84b9087
fix the sp
2024-09-13 02:32:03 +00:00
botbw
c54c4fcd15
[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)
* [example] pass use_fp8_comm flag to all plugins
* [example] add mixtral benchmark
* [moe] refine assertion and check
* [moe] fix mixtral & add more tests
* [moe] consider checking dp * sp group and moe_dp_group
* [mixtral] remove gate tp & add more tests
* [deepseek] fix tp & sp for deepseek
* [mixtral] minor fix
* [deepseek] add deepseek benchmark
2024-09-10 17:30:53 +08:00
Wenxuan Tan
8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* precision tests passed
* fix typos and remove misc files
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
* adapt chatglm, command-R, qwen
* debug
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* precision tests passed
* fix typos and remove misc files
* add sp_mode to benchmark; fix varlen interface
* update softmax_lse shape by new interface
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
* add comments
* q1 index only once
* remove events to simplify stream sync
* simplify forward/backward logic
* 2d ring forward passed
* 2d ring backward passed
* fixes
* fix ring attn loss
* 2D ring backward + llama passed
* merge
* update logger
* fix typo
* rebase
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo
* remove typos
* fixes
* support GPT
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-10 12:06:50 +08:00
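The core of #5959 is that under sequence parallelism each rank holds logits only for its own sequence shard, so cross-entropy must be computed shard-locally and then reduced across the SP group. The general pattern as a sketch (the unified ColossalAI implementation also handles vocab-parallel logits, omitted here):

```python
# Shard-local cross-entropy reduced over the sequence-parallel group.
import torch
import torch.distributed as dist
import torch.nn.functional as F

def sp_cross_entropy(local_logits, local_labels, sp_group, ignore_index=-100):
    # Sum (not mean) locally so shards with different token counts combine
    # correctly after the reduction.
    loss_sum = F.cross_entropy(
        local_logits.float(), local_labels,
        ignore_index=ignore_index, reduction="sum",
    )
    n_tokens = (local_labels != ignore_index).sum()
    # Reduce numerator and denominator separately, then divide once.
    pack = torch.stack([loss_sum, n_tokens.to(loss_sum.dtype)])
    dist.all_reduce(pack, op=dist.ReduceOp.SUM, group=sp_group)
    return pack[0] / pack[1].clamp(min=1)
```

Reducing the loss sum and the token count separately, rather than averaging per shard, is what keeps the result identical to single-device cross-entropy when shards have uneven numbers of unmasked tokens.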
Hongxin Liu
17904cb5bf
Merge pull request #6012 from hpcaitech/feature/fp8_comm
[fp8] support fp8 communication and fp8 training for ColossalAI
2024-08-27 10:09:43 +08:00
wangbluo
dae39999d7
fix
2024-08-26 03:45:42 +00:00
Wenxuan Tan
7cf9df07bc
[Hotfix] Fix llama fwd replacement bug (#6031)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-08-23 15:44:27 +08:00