Wenxuan Tan
8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
* halfway
* fix cross-PP-stage position id length diff bug
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unified cross entropy func for all shardformer models
* remove redundant lines
* add basic ring attn; debug cross entropy
* fwd bwd logic complete
* fwd bwd logic complete; add experimental triton rescale
* precision tests passed
* precision tests passed
* fix typos and remove misc files
* update softmax_lse shape by new interface
* change tester name
* remove buffer clone; support packed seq layout
* add varlen tests
* fix typo
* all tests passed
* add dkv_group; fix mask
* remove debug statements
* adapt chatglm, command-R, qwen
* debug
* add sp_mode to benchmark; fix varlen interface
* add comments
* q1 index only once
* remove events to simplify stream sync
* simplify forward/backward logic
* 2D ring forward passed
* 2D ring backward passed
* fixes
* fix ring attn loss
* 2D ring backward + llama passed
* merge
* update logger
* fix typo
* rebase
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo
* remove typos
* fixes
* support GPT
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
3 months ago
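For context: splitting the cross-entropy computation under sequence parallelism (SP) means each rank evaluates the loss only over its local shard of the sequence, after which the per-shard loss sums and token counts are reduced across the SP group. A minimal sketch of that idea, with illustrative names (sp_cross_entropy, sp_group); this is not the actual shardformer code:

    import torch
    import torch.distributed as dist
    import torch.distributed.nn as dist_nn  # autograd-aware collectives

    def sp_cross_entropy(local_logits, local_labels, sp_group, ignore_index=-100):
        # Summed (unreduced) loss over this rank's shard of the sequence only.
        local_sum = torch.nn.functional.cross_entropy(
            local_logits.float(),  # upcast for numerical stability
            local_labels,
            ignore_index=ignore_index,
            reduction="sum",
        )
        num_tokens = (local_labels != ignore_index).sum()
        dist.all_reduce(num_tokens, group=sp_group)  # global token count
        # Differentiable all-reduce so each rank backprops into its own logits.
        global_sum = dist_nn.all_reduce(local_sum, group=sp_group)
        return global_sum / num_tokens.clamp(min=1)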
Hongxin Liu
9664b1bc19
[shardformer] hotfix attn mask (#5945)
4 months ago
Edenzzzz
fbf33ecd01
[Feature] Enable PP + SP for llama (#5868)
* fix cross-PP-stage position id length diff bug
* fix typo
* fix typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* use one cross entropy func for all shardformer models
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
5 months ago
flybird11111
2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
* [shardformer] upgrade transformers for gpt2/gptj/whisper (#5807)
* [shardformer] fix modeling of gpt2 and gptj
* [shardformer] fix whisper modeling
* [misc] update requirements
---------
Co-authored-by: ver217 <lhx0217@gmail.com>
* [shardformer] upgrade transformers for mistral (#5808)
* upgrade transformers for mistral
* fix
* fix
* [shardformer] upgrade transformers for llama (#5809)
* update transformers
fix
* fix
* fix
* [inference] upgrade transformers (#5810 )
* update transformers
fix
* fix
* fix
* fix
* fix
* [gemini] update transformers for gemini (#5814 )
---------
Co-authored-by: ver217 <lhx0217@gmail.com>
6 months ago
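(The accompanying requirements change is essentially a version pin, presumably along the lines of:)

    transformers==4.39.3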
Haze188
22ce873c3f
[Shardformer] Add parallel output for shardformer models (bloom, falcon) (#5702)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add parallel cross entropy output for falcon model & fix some typos in bloom.py
* fix module name error (self.model -> self.transformer) in the bloom and falcon models
* fix the overflow bug of the distributed cross entropy loss function when training with fp16
* add dtype to parallel cross entropy loss function
* fix dtype-related typos and prettify loss.py
* fix grad dtype and update dtype mismatch error
* fix typo bugs
6 months ago
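The fp16 overflow fix noted above amounts to doing the softmax arithmetic in fp32 even when the model's activations are fp16. A sketch of the forward computation of a vocab-parallel (Megatron-style) cross entropy, with illustrative names; the actual loss.py wraps this in a custom autograd.Function:

    import torch
    import torch.distributed as dist

    def vocab_parallel_ce_forward(local_logits, target, tp_group, vocab_start, vocab_end):
        # Upcast before any exp/sum so fp16 inputs cannot overflow.
        logits = local_logits.float()
        # Global max across vocab shards for a numerically stable softmax.
        logits_max = logits.max(dim=-1).values
        dist.all_reduce(logits_max, op=dist.ReduceOp.MAX, group=tp_group)
        logits = logits - logits_max.unsqueeze(-1)
        # Denominator: sum of exp over the full, sharded vocabulary.
        sum_exp = logits.exp().sum(dim=-1)
        dist.all_reduce(sum_exp, group=tp_group)
        # Numerator: the target logit lives on exactly one vocab shard.
        in_shard = (target >= vocab_start) & (target < vocab_end)
        local_idx = (target - vocab_start).clamp(0, vocab_end - vocab_start - 1)
        picked = logits.gather(-1, local_idx.unsqueeze(-1)).squeeze(-1)
        picked = torch.where(in_shard, picked, torch.zeros_like(picked))
        dist.all_reduce(picked, group=tp_group)
        return sum_exp.log() - picked  # per-token -log softmax(target)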
wangbluo
4e50cce26b
fix the mistral model
7 months ago
wangbluo
2632916329
remove useless code
7 months ago
wangbluo
9efc79ef24
add parallel output for mistral model
7 months ago
Hongxin Liu
1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646)
* [shardformer] refactor pipeline grad ckpt config
* [shardformer] refactor pipeline grad ckpt config
* [pipeline] fix stage manager
7 months ago
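For context, the config being refactored here controls how many layers of each pipeline stage recompute activations in the backward pass. A minimal sketch of that per-stage knob using plain torch.utils.checkpoint (run_stage and num_ckpt_layers are illustrative names, not the shardformer API):

    from torch.utils.checkpoint import checkpoint

    def run_stage(layers, hidden, num_ckpt_layers):
        # Recompute activations for the first `num_ckpt_layers` layers of this
        # stage; run the remaining layers normally. A per-stage list of such
        # counts is what a pipeline grad-ckpt config boils down to.
        for i, layer in enumerate(layers):
            if i < num_ckpt_layers:
                hidden = checkpoint(layer, hidden, use_reentrant=False)
            else:
                hidden = layer(hidden)
        return hidden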
Wang Binluo
0d0a582033
[shardformer] update transformers (#5583)
* flash_attention forward upgrade
* llama_model_forward
* remove useless comment
* update the requirements.txt
* add the transformers version requirements
* remove the LATEST VERSION try
* [shardformer] update bloom model (#5518 )
* update bloom model
* remove the version restriction
* [shardformer] update_falcon (#5520 )
* [shardformer] update mistral model (#5511 )
* [shardformer] update gpt2 (#5502 )
* [shardformer] update gptj model (#5503 )
* [shardformer] update opt (#5522 )
* [shardformer] update t5 model (#5524 )
* [shardformer] update whisper model (#5529 )
* [shardformer] update vit model (#5530 )
* update vit model
* remove the output_hidden_states
* [shardformer] fix llama modeling
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [zero] support multiple (partial) backward passes (#5596 )
* [zero] support multiple (partial) backward passes
* [misc] update requirements
* fix conflicts
* [doc] fix ColossalMoE readme (#5599 )
* fix readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* merge with main
* [hotfix] Fix examples' missing pad token & an auto parallel codegen bug (#5606)
* fix no pad token bug
* fixed some auto parallel codegen bugs, though the fix might not run on torch 2.1
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
* [shardformer] fix pipeline grad ckpt (#5620 )
* [shardformer] fix pipeline grad ckpt
* [shardformer] fix whisper (#5628 )
* [test] fix llama model test
* fix the opt upgrade (#5634 )
* [shardformer] fix attn replacement (#5636 )
* [shardformer] update flashattention replacement (#5637 )
* update transformers
update transformers
fix
fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [test] fix llama test (#5638 )
* [gemini] fix buffer cast (#5639 )
* Fix shardformer upgrade (#5640 )
* fix llama model
* fix the mistral
* fix the shardformer model
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [shardformer] support pipeline parallelism for mistral (#5642)
* [Feature] Support LLaMA-3 CPT and ST (#5619 )
* support LLaMA-3
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [example] update llama example (#5626)
* [plugin] support dp inside for hybrid parallel
* [example] update llama benchmark
* [example] update llama benchmark
* [example] update llama readme
* [example] update llama readme
* [example] llama3 (#5631 )
* release llama3
* [release] llama3
* [release] llama3
* [release] llama3
* [release] llama3
* support pp for mistral
* fix
* fix
fix
fix
* fix
---------
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
---------
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
Co-authored-by: Edenzzzz <wenxuan.tan@wisc.edu>
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
7 months ago
Frank Lee
7cfed5f076
[feat] refactored extension module (#5298)
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
10 months ago
Wenhao Chen
7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883 )
* [shardformer]: fix interleaved pipeline for bert model (#5048 )
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093 )
* Add Mistral support for Shardformer (#5103 )
* [shardformer] add tests to mistral (#5105 )
---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
1 year ago
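Interleaved pipelining, as added here for bert, assigns each stage several non-contiguous model chunks rather than one contiguous block, which shrinks the pipeline bubble. A toy sketch of the resulting layer-to-stage mapping (illustrative only, not the actual schedule code):

    def interleaved_layer_ranges(num_layers, num_stages, num_chunks):
        # With 8 layers, 2 stages, 2 chunks: stage 0 owns layers [0,1] and
        # [4,5]; stage 1 owns [2,3] and [6,7].
        layers_per_chunk = num_layers // (num_stages * num_chunks)
        ranges = {s: [] for s in range(num_stages)}
        for chunk in range(num_chunks):
            for stage in range(num_stages):
                start = (chunk * num_stages + stage) * layers_per_chunk
                ranges[stage].append(range(start, start + layers_per_chunk))
        return ranges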