3874 Commits (dafda0fb7082506ad76b5deff3024b3d5dbb904b)

Author SHA1 Message Date
wangbluo 3fab92166e fix 2024-09-26 18:03:09 +08:00
duanjunwen bb0390c90d [fix] remove duplicate arg; rm comments; 2024-09-26 09:45:44 +00:00
duanjunwen c5503b0d80 [fix] fix test_pipeline_utils ci; 2024-09-26 07:18:16 +00:00
duanjunwen 45f17fc6cc [fix] rm comments; 2024-09-26 06:13:56 +00:00
duanjunwen a92e16719b [fix] fix zerobubble; support shardformer model type; 2024-09-26 06:11:56 +00:00
binmakeswell f4daf04270
add funding news (#6072)
* add funding news

* add funding news

* add funding news
2024-09-26 12:29:27 +08:00
wangbluo 6705dad41b fix 2024-09-25 19:02:21 +08:00
wangbluo 91ed32c256 fix 2024-09-25 19:00:38 +08:00
wangbluo 6fb1322db1 fix 2024-09-25 18:56:18 +08:00
wangbluo 65c8297710 fix the attn 2024-09-25 18:51:03 +08:00
wangbluo cfd9eda628 fix the ring attn 2024-09-25 18:34:29 +08:00
duanjunwen 83163fa70c [fix] fix traverse; traverse dict --> traverse tensor List; 2024-09-25 06:38:11 +00:00
duanjunwen fc8b016887 [fix] fix stage_indices; 2024-09-25 06:15:45 +00:00
binmakeswell cbaa104216
release FP8 news (#6068)
* add FP8 news

* release FP8 news

* release FP8 news
2024-09-25 11:57:16 +08:00
duanjunwen 8501202a35
Merge pull request #6065 from duanjunwen/dev/zero_bubble
[Feat] Support zero bubble with shardformer input
2024-09-24 19:17:37 +08:00
duanjunwen 7e6f793c51 [fix] fix detach_output_obj clone; 2024-09-24 08:08:32 +00:00
duanjunwen 6c1e1550ae [fix] fix dumb clone; 2024-09-23 06:43:49 +00:00
duanjunwen a875212a42 [fix] fix ci --> oom in 4096 hidden dim; 2024-09-23 05:55:16 +00:00
duanjunwen c114d1429a [fix] fix detach clone release order; 2024-09-23 04:00:24 +00:00
duanjunwen da3220f48c [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch; 2024-09-20 09:48:35 +00:00
duanjunwen 1739df423c [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs' 2024-09-20 07:34:43 +00:00
duanjunwen b6616f544e [fix] rm comments; 2024-09-20 07:29:41 +00:00
duanjunwen c6d6ee39bd [fix] use tree_flatten replace dict traverse; 2024-09-20 07:18:49 +00:00
duanjunwen 26783776f1 [fix] fix input_tensors buffer append input_obj(dict) --> Tuple (microbatch, input_obj) , and all bwd b related cal logic; 2024-09-20 06:41:19 +00:00
duanjunwen 4753bf7add [fix] fix mem assert; 2024-09-19 08:27:47 +00:00
duanjunwen a115106f8d [fix] fix bwd w input; 2024-09-19 08:10:05 +00:00
duanjunwen 349272c71f [fix] update bwd b&w input; dict --> list[torch.Tensor] 2024-09-19 07:47:01 +00:00
duanjunwen 6ee9584b9a [fix] fix require_grad & deallocate call; 2024-09-19 05:53:03 +00:00
duanjunwen 1f5c7258aa Merge remote-tracking branch 'upstream/feature/zerobubble' into dev/zero_bubble 2024-09-19 03:52:13 +00:00
Hongxin Liu dabc2e7430 [release] update version (#6062) 2024-09-19 10:45:32 +08:00
Camille Zhong f9546ba0be
[ColossalEval] support for vllm (#6056)
* support vllm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify vllm and update readme

* run pre-commit

* remove duplicated lines and refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine code

* update readme

* refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 17:09:45 +08:00
duanjunwen af2c2f8092 [feat] add more test; 2024-09-18 07:51:54 +00:00
duanjunwen 3dbad102cf [fix] fix zerobubble pp for shardformer type input; 2024-09-18 07:14:34 +00:00
botbw 4fa6b9509c [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) 2024-09-18 10:09:01 +08:00
Wang Binluo 63314ce4e4
Merge pull request #6064 from wangbluo/fix_attn
[sp] : fix the attention kernel for sp
2024-09-18 10:08:15 +08:00
wangbluo 10e4f7da72 fix 2024-09-16 13:45:04 +08:00
Wang Binluo 37e35230ff
Merge pull request #6061 from wangbluo/sp_fix
[sp] : fix the attention kernel for sp
2024-09-14 20:54:35 +08:00
wangbluo 827ef3ee9a fix 2024-09-14 10:40:35 +00:00
Guangyao Zhang bdb125f83f
[doc] FP8 training and communication document (#6050)
* Add FP8 training and communication document

* add fp8 docstring for plugins

* fix typo

* fix typo
2024-09-14 11:01:05 +08:00
Guangyao Zhang f20b066c59
[fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059)
* all_gather only internode, fix pytest

* fix cuda arch <89 compile pytest error

* fix pytest failure

* disable all_gather_into_tensor_flat_fp8

* fix fp8 format

* fix pytest

* fix conversations

* fix chunk tuple to list
2024-09-14 10:40:01 +08:00
wangbluo b582319273 fix 2024-09-13 10:24:41 +00:00
wangbluo 0ad3129cb9 fix 2024-09-13 09:01:26 +00:00
wangbluo 0b14a5512e fix 2024-09-13 07:06:14 +00:00
botbw 696fced0d7 [fp8] fix missing fp8_comm flag in mixtral (#6057) 2024-09-13 14:30:05 +08:00
wangbluo dc032172c3 fix 2024-09-13 06:00:58 +00:00
wangbluo f393867cff fix 2024-09-13 05:24:52 +00:00
wangbluo 6eb8832366 fix 2024-09-13 05:06:56 +00:00
wangbluo 683179cefd fix 2024-09-13 03:40:56 +00:00
wangbluo 0a01e2a453 fix the attn 2024-09-13 03:38:35 +00:00
pre-commit-ci[bot] 216d54e374 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-09-13 02:38:40 +00:00