Commit Graph

3763 Commits (5caad13055e802e2665f1d70593116103a72395a)

Author SHA1 Message Date
梁爽 6b2c506fc5
Update README.md (#6087)
add HPC-AI.COM activity
2024-10-10 17:02:49 +08:00
wangbluo 5ecc27e150 fix 2024-10-10 15:35:52 +08:00
wangbluo f98384aef6 fix 2024-10-10 15:17:06 +08:00
Hongxin Liu 646b3c5a90
[shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084)
* [tp] hotfix linear row

* [tp] support uneven split for fused linear

* [tp] support sp for fused linear

* [tp] fix gpt2 mlp policy

* [tp] fix gather fused and add fused linear row
2024-10-10 14:34:45 +08:00
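
The uneven-split support in the commit above addresses fused QKV layers where Q, K and V have different output widths (e.g. grouped-query attention), so a single even chunk per tensor-parallel rank no longer works: each block must be split separately and the rank-local slices re-fused. A minimal sketch of the idea; the function name and the remainder-distribution policy are assumptions, not ColossalAI's actual implementation:

```python
import torch

def shard_fused_qkv(weight: torch.Tensor, sizes: list[int],
                    rank: int, world_size: int) -> torch.Tensor:
    """Shard a fused [q|k|v] weight along dim 0, splitting each block
    separately so every rank holds a consistent [q|k|v] slice."""
    shards = []
    for block in torch.split(weight, sizes, dim=0):  # unfuse into Q, K, V
        # Uneven division: the first `rem` ranks take one extra row each.
        chunk, rem = divmod(block.shape[0], world_size)
        start = rank * chunk + min(rank, rem)
        end = start + chunk + (1 if rank < rem else 0)
        shards.append(block[start:end])
    return torch.cat(shards, dim=0)

# Example: Q is 8 rows, K and V are 4 rows each, 3 tensor-parallel ranks.
w = torch.randn(16, 8)
local = shard_fused_qkv(w, sizes=[8, 4, 4], rank=0, world_size=3)
```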
wangbluo b635dd0669 fix 2024-10-09 14:05:26 +08:00
wangbluo 3532f77b90 fix 2024-10-09 10:57:19 +08:00
wangbluo 3fab92166e fix 2024-09-26 18:03:09 +08:00
binmakeswell f4daf04270
add funding news (#6072)
* add funding news

* add funding news

* add funding news
2024-09-26 12:29:27 +08:00
wangbluo 6705dad41b fix 2024-09-25 19:02:21 +08:00
wangbluo 91ed32c256 fix 2024-09-25 19:00:38 +08:00
wangbluo 6fb1322db1 fix 2024-09-25 18:56:18 +08:00
wangbluo 65c8297710 fix the attn 2024-09-25 18:51:03 +08:00
wangbluo cfd9eda628 fix the ring attn 2024-09-25 18:34:29 +08:00
binmakeswell cbaa104216
release FP8 news (#6068)
* add FP8 news

* release FP8 news

* release FP8 news
2024-09-25 11:57:16 +08:00
Hongxin Liu dabc2e7430
[release] update version (#6062) 2024-09-19 10:45:32 +08:00
Camille Zhong f9546ba0be
[ColossalEval] support for vllm (#6056)
* support vllm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify vllm and update readme

* run pre-commit

* remove duplicated lines and refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine code

* update readme

* refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 17:09:45 +08:00
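
For context on the vllm backend added here: vLLM's public offline-inference API is small, which is what makes it attractive for an evaluation harness. An illustrative call (the model name is only an example; how ColossalEval wires this up is defined in the PR, not here):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")              # example checkpoint
params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy decoding for eval
outputs = llm.generate(["Q: What is 2 + 2?\nA:"], params)
print(outputs[0].outputs[0].text)
```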
botbw 4fa6b9509c
[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) 2024-09-18 10:09:01 +08:00
Wang Binluo 63314ce4e4
Merge pull request #6064 from wangbluo/fix_attn
[sp]: fix the attention kernel for sp
2024-09-18 10:08:15 +08:00
wangbluo 10e4f7da72 fix 2024-09-16 13:45:04 +08:00
Wang Binluo 37e35230ff
Merge pull request #6061 from wangbluo/sp_fix
[sp]: fix the attention kernel for sp
2024-09-14 20:54:35 +08:00
wangbluo 827ef3ee9a fix 2024-09-14 10:40:35 +00:00
Guangyao Zhang bdb125f83f
[doc] FP8 training and communication document (#6050)
* Add FP8 training and communication document

* add fp8 docstring for plugins

* fix typo

* fix typo
2024-09-14 11:01:05 +08:00
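
The document added here covers how users opt in to FP8. A hedged sketch of the plugin-level switches; the keyword names below are inferred from the commit titles in this log and may not match the released API exactly:

```python
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    use_fp8=True,            # assumed flag: FP8 compute for linear layers
    fp8_communication=True,  # assumed flag: FP8-compressed collectives
)
booster = Booster(plugin=plugin)
```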
Guangyao Zhang f20b066c59
[fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059)
* all_gather only internode, fix pytest

* fix cuda arch <89 compile pytest error

* fix pytest failure

* disable all_gather_into_tensor_flat_fp8

* fix fp8 format

* fix pytest

* fix conversations

* fix chunk tuple to list
2024-09-14 10:40:01 +08:00
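
The intranode/internode distinction in this commit comes down to a bandwidth trade-off: FP8-compressing a collective halves the bytes on the wire but adds cast overhead, which only pays off on the slower cross-node links. A sketch of the heuristic (names are illustrative, not ColossalAI's):

```python
import os
import torch.distributed as dist

def should_use_fp8_all_gather(group=None) -> bool:
    """Use FP8 all-gather only when the group spans more than one node;
    intranode NVLink is fast enough that casting costs dominate."""
    local_world = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))  # set by torchrun
    return dist.get_world_size(group) > local_world
```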
wangbluo b582319273 fix 2024-09-13 10:24:41 +00:00
wangbluo 0ad3129cb9 fix 2024-09-13 09:01:26 +00:00
wangbluo 0b14a5512e fix 2024-09-13 07:06:14 +00:00
botbw 696fced0d7
[fp8] fix missing fp8_comm flag in mixtral (#6057) 2024-09-13 14:30:05 +08:00
wangbluo dc032172c3 fix 2024-09-13 06:00:58 +00:00
wangbluo f393867cff fix 2024-09-13 05:24:52 +00:00
wangbluo 6eb8832366 fix 2024-09-13 05:06:56 +00:00
wangbluo 683179cefd fix 2024-09-13 03:40:56 +00:00
wangbluo 0a01e2a453 fix the attn 2024-09-13 03:38:35 +00:00
pre-commit-ci[bot] 216d54e374 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-09-13 02:38:40 +00:00
wangbluo fdd84b9087 fix the sp 2024-09-13 02:32:03 +00:00
flybird11111 a35a078f08
[doc] update sp doc (#6055)
* update sp doc

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

* fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-11 17:25:14 +08:00
Hongxin Liu 13946c4448
[fp8] hotfix backward hook (#6053)
* [fp8] hotfix backward hook

* [fp8] hotfix pipeline loss accumulation
2024-09-11 16:11:25 +08:00
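
For readers unfamiliar with the mechanism this hotfix touches: a tensor backward hook runs during the backward pass and may rewrite the gradient in flight, which is how per-tensor corrections get applied without changing model code. A toy illustration only, not the FP8 hook itself:

```python
import torch

def scale_grad(t: torch.Tensor, factor: float) -> torch.Tensor:
    if t.requires_grad:
        t.register_hook(lambda g: g * factor)  # runs during backward
    return t

x = torch.randn(3, requires_grad=True)
y = scale_grad(x * 2.0, 0.5)
y.sum().backward()
print(x.grad)  # all ones: the 2.0 from the mul, halved by the hook
```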
botbw c54c4fcd15
[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)
* [example] pass use_fp8_comm flag to all plugins

* [example] add mixtral benchmark

* [moe] refine assertion and check

* [moe] fix mixtral & add more tests

* [moe] consider checking dp * sp group and moe_dp_group

* [mixtral] remove gate tp & add more tests

* [deepseek] fix tp & sp for deepseek

* [mixtral] minor fix

* [deepseek] add deepseek benchmark
2024-09-10 17:30:53 +08:00
Wenxuan Tan 8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
* halfway

* fix cross-PP-stage position id length diff bug

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unified cross entropy func for all shardformer models

* remove redundant lines

* add basic ring attn; debug cross entropy

* fwd bwd logic complete

* fwd bwd logic complete; add experimental triton rescale

* precision tests passed

* precision tests passed

* fix typos and remove misc files

* update softmax_lse shape by new interface

* change tester name

* remove buffer clone; support packed seq layout

* add varlen tests

* fix typo

* all tests passed

* add dkv_group; fix mask

* remove debug statements

* adapt chatglm, command-R, qwen

* debug

* halfway

* fix cross-PP-stage position id length diff bug

* fix typo

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unified cross entropy func for all shardformer models

* remove redundant lines

* add basic ring attn; debug cross entropy

* fwd bwd logic complete

* fwd bwd logic complete; add experimental triton rescale

* precision tests passed

* precision tests passed

* fix typos and remove misc files

* add sp_mode to benchmark; fix varlen interface

* update softmax_lse shape by new interface

* add varlen tests

* fix typo

* all tests passed

* add dkv_group; fix mask

* remove debug statements

* add comments

* q1 index only once

* remove events to simplify stream sync

* simplify forward/backward logic

* 2d ring forward passed

* 2d ring backward passed

* fixes

* fix ring attn loss

* 2D ring backward + llama passed

* merge

* update logger

* fix typo

* rebase

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* remove typos

* fixes

* support GPT

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-10 12:06:50 +08:00
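
The key observation behind splitting cross-entropy under sequence parallelism is that per-token losses are independent, so each rank can evaluate only its own sequence slice and the group reduces a (sum, count) pair instead of gathering full logits. A conceptual sketch of that reduction, not the PR's code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def sp_cross_entropy(local_logits, local_labels, sp_group, ignore_index=-100):
    """local_logits: [T_local, vocab]; local_labels: [T_local]."""
    mask = local_labels != ignore_index
    loss_sum = F.cross_entropy(local_logits[mask], local_labels[mask],
                               reduction="sum")
    stats = torch.stack([loss_sum, mask.sum().to(loss_sum)])
    dist.all_reduce(stats, group=sp_group)  # global (loss sum, token count)
    return stats[0] / stats[1]
```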
Hongxin Liu b3db1058ec
[release] update version (#6041)
* [release] update version

* [devops] update comp test

* [devops] update comp test debug

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test
2024-09-10 10:31:09 +08:00
Hanks 5ce6dd75bf
[fp8] disable all_to_all_fp8 in intranode (#6045)
* enhance all_to_all_fp8 with internode comm control

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable some fp8 ops due to performance issue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-09 13:47:17 +08:00
Hongxin Liu 26e553937b
[fp8] fix linear hook (#6046) 2024-09-03 16:37:16 +08:00
Hongxin Liu c3b5caff0e
[fp8] optimize all-gather (#6043)
* [fp8] optimize all-gather

* [fp8] fix all gather fp8 ring

* [fp8] enable compile

* [fp8] fix all gather fp8 ring
2024-09-03 15:45:17 +08:00
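
The shape of an FP8 all-gather, roughly: quantize locally to one-byte e4m3 with a per-tensor scale, gather payloads and scales, dequantize on receipt. A hedged reference sketch, not the optimized path from this commit (a real implementation would fuse the scale exchange and avoid the Python-level loops):

```python
import torch
import torch.distributed as dist

def all_gather_fp8(t: torch.Tensor, group=None):
    scale = (t.abs().max() / 448.0).clamp(min=1e-12).reshape(1)  # 448 = e4m3 max
    payload = (t / scale).to(torch.float8_e4m3fn).view(torch.uint8)  # NCCL-safe bytes
    world = dist.get_world_size(group)
    bufs = [torch.empty_like(payload) for _ in range(world)]
    scales = [torch.empty_like(scale) for _ in range(world)]
    dist.all_gather(bufs, payload, group=group)
    dist.all_gather(scales, scale, group=group)
    return [b.view(torch.float8_e4m3fn).to(t.dtype) * s
            for b, s in zip(bufs, scales)]
```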
Tong Li c650a906db
[Hotfix] Remove deprecated install (#6042)
* remove deprecated install

* remove unused folder
2024-09-03 10:33:18 +08:00
Gao, Ruiyuan e9032fb0b2
[colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020)
* fix bug in load_state_dict_into_model; format error msg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update utils.py

to support checking missing_keys

* Update general_checkpoint_io.py

fix bug in missing_keys error message

* retrigger tests

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-02 16:56:35 +08:00
Guangyao Zhang e96a0761ea
[FP8] unsqueeze scale to make it compatible with torch.compile (#6040) 2024-08-29 14:49:23 +08:00
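
One plausible reading of this fix: torch.compile handles a 0-dim scalar tensor poorly in some ops (it can be specialized away like a Python constant, or trip kernels that expect at least one dimension), so the scale is kept as a 1-element tensor instead. Illustrative only:

```python
import torch

@torch.compile
def dequant(payload: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return payload.float() * scale  # scale stays a graph input, not a constant

scale = torch.tensor(0.02).unsqueeze(0)  # shape [1] rather than 0-dim
out = dequant(torch.randint(0, 255, (8,), dtype=torch.uint8), scale)
```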
Tong Li 0d3a85d04f
add fused norm (#6038) 2024-08-28 17:12:51 +08:00
Tong Li 4a68efb7da
[Colossal-LLaMA] Refactor latest APIs (#6030)
* refactor latest code

* update api

* add dummy dataset

* update Readme

* add setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update files

* add PP support

* update arguments

* update argument

* reorg folder

* update version

* remove IB info

* update utils

* update readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update save for zero

* update save

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add apex

* update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-28 17:01:58 +08:00
Hongxin Liu cc1b0efc17
[plugin] hotfix zero plugin (#6036)
* [plugin] hotfix zero plugin

* [plugin] hotfix zero plugin
2024-08-28 10:16:48 +08:00
Wenxuan Tan d383449fc4
[CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018)
* remove triton version

* remove torch 2.2

* remove torch 2.1

* debug

* remove 2.1 build tests

* require torch >=2.2

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-08-27 10:12:21 +08:00
Hongxin Liu 17904cb5bf
Merge pull request #6012 from hpcaitech/feature/fp8_comm
[fp8] support fp8 communication and fp8 training for ColossalAI
2024-08-27 10:09:43 +08:00