Frank Lee
|
efef43b53c
|
Merge pull request #5372 from hpcaitech/exp/mixtral
|
10 months ago |
Frank Lee
|
4c03347fc7
|
Merge pull request #5377 from hpcaitech/example/llama-npu
[llama] support npu for Colossal-LLaMA-2
|
10 months ago |
ver217
|
06db94fbc9
|
[moe] fix tests
|
10 months ago |
Hongxin Liu
|
65e5d6baa5
|
[moe] fix mixtral optim checkpoint (#5344)
|
10 months ago |
Hongxin Liu
|
956b561b54
|
[moe] fix mixtral forward default value (#5329)
|
10 months ago |
Hongxin Liu
|
b60be18dcc
|
[moe] fix mixtral checkpoint io (#5314)
|
10 months ago |
Hongxin Liu
|
da39d21b71
|
[moe] support mixtral (#5309)
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plug
* [moe] fix optimizer load
* [moe] fix mixtral layer
|
10 months ago |
Hongxin Liu
|
c904d2ae99
|
[moe] update capacity computing (#5253)
* [moe] top2 allow uneven input
* [moe] update capacity computing
* [moe] remove debug info
* [moe] update capacity computing
* [moe] update capacity computing
|
10 months ago |
Xuanlei Zhao
|
7d8e0338a4
|
[moe] init mixtral impl
|
10 months ago |
Hongxin Liu
|
084c91246c
|
[llama] fix memory issue (#5371)
* [llama] fix memory issue
* [llama] add comment
|
10 months ago |
Hongxin Liu
|
c53ddda88f
|
[lr-scheduler] fix load state dict and add test (#5369)
|
10 months ago |
Hongxin Liu
|
eb4f2d90f9
|
[llama] polish training script and fix optim ckpt (#5368)
|
10 months ago |
Camille Zhong
|
a5756a8720
|
[eval] update llama npu eval (#5366)
|
10 months ago |
Camille Zhong
|
44ca61a22b
|
[llama] fix neftune & pbar with start_step (#5364)
|
10 months ago |
Hongxin Liu
|
a4cec1715b
|
[llama] add flash attn patch for npu (#5362)
|
10 months ago |
Hongxin Liu
|
73f9f23fc6
|
[llama] update training script (#5360)
* [llama] update training script
* [doc] polish docstr
|
10 months ago |
Hongxin Liu
|
6c0fa7b9a8
|
[llama] fix dataloader for hybrid parallel (#5358)
* [plugin] refactor prepare dataloader
* [plugin] update train script
|
10 months ago |
Hongxin Liu
|
2dd01e3a14
|
[gemini] fix param op hook when output is tuple (#5355)
* [gemini] fix param op hook when output is tuple
* [gemini] fix param op hook
|
10 months ago |
Wenhao Chen
|
1c790c0877
|
[fix] remove unnecessary dp_size assert (#5351)
* fix: remove unnecessary assert
* test: add more 3d plugin tests
* fix: add warning
|
10 months ago |
Hongxin Liu
|
ffffc32dc7
|
[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347)
* [checkpointio] fix hybrid parallel optim checkpoint
* [extension] fix cuda extension
* [checkpointio] fix gemini optimizer checkpoint
* polish code
|
10 months ago |
YeAnbang
|
c5239840e6
|
[Chat] fix sft loss nan (#5345)
* fix script
* fix script
* fix chat nan
* fix chat nan
|
10 months ago |
Frank Lee
|
abd8e77ad8
|
[extension] fixed exception catch (#5342)
|
10 months ago |
digger yu
|
71321a07cf
|
fix typo change dosen't to doesn't (#5308)
|
10 months ago |
digger yu
|
6a3086a505
|
fix typo under extensions/ (#5330)
|
10 months ago |
Frank Lee
|
febed23288
|
[doc] added docs for extensions (#5324)
* [doc] added docs for extensions
* polish
* polish
|
10 months ago |
flybird11111
|
388179f966
|
[tests] fix t5 test. (#5322)
* [ci] fix shardformer tests. (#5255)
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* fix t5 test
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
|
10 months ago |
Frank Lee
|
a6709afe66
|
Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api
[accelerator] fixed npu api
|
10 months ago |
FrankLeeeee
|
087d0cb1fc
|
[accelerator] fixed npu api
|
10 months ago |
Frank Lee
|
8823cc4831
|
Merge pull request #5310 from hpcaitech/feature/npu
Feature/npu
|
10 months ago |
Frank Lee
|
73f4dc578e
|
[workflow] updated CI image (#5318)
|
10 months ago |
Frank Lee
|
7cfed5f076
|
[feat] refactored extension module (#5298)
* [feat] refactored extension module
* polish
|
10 months ago |
digger yu
|
bce9499ed3
|
fix some typo (#5307)
|
10 months ago |
李文军
|
ec912b1ba9
|
[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228)
|
10 months ago |
Desperado-Jia
|
ddf879e2db
|
fix bug for neftune (#5299)
|
10 months ago |
Hongxin Liu
|
d7f8db8e21
|
[hotfix] fix 3d plugin test (#5292)
|
10 months ago |
flybird11111
|
f7e3f82a7e
|
fix llama pretrain (#5287)
|
11 months ago |
Desperado-Jia
|
6a56967855
|
[doc] add llama2-13B display (#5285)
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
|
11 months ago |
Michelle
|
32cb74493a
|
fix auto loading gpt2 tokenizer (#5279)
|
11 months ago |
Frank Lee
|
d66e6988bc
|
Merge pull request #5278 from ver217/sync/npu
[sync] sync npu branch with main
|
11 months ago |
ver217
|
148469348a
|
Merge branch 'main' into sync/npu
|
11 months ago |
Zhongkai Zhao
|
5d9a0ae75b
|
[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230)
|
11 months ago |
flybird11111
|
46e091651b
|
[shardformer] hybridparallelplugin support gradients accumulation. (#5246)
* support gradients acc
* fix
|
11 months ago |
flybird11111
|
2a0558d8ec
|
[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
|
11 months ago |
Frank Lee
|
d69cd2eb89
|
[workflow] fixed oom tests (#5275)
* [workflow] fixed oom tests
* polish
* polish
* polish
|
11 months ago |
Frank Lee
|
04244aaaf1
|
[workflow] fixed incomplete bash command (#5272)
|
11 months ago |
Wenhao Chen
|
ef4f0ee854
|
[hotfix]: add pp sanity check and fix mbs arg (#5268)
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
|
11 months ago |
binmakeswell
|
c174c4fc5f
|
[doc] fix doc typo (#5256)
* [doc] fix annotation display
* [doc] fix llama2 doc
|
11 months ago |
flybird11111
|
e830ef917d
|
[ci] fix shardformer tests. (#5255)
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
|
11 months ago |
digger yu
|
756c400ad2
|
fix typo in applications/ColossalEval/README.md (#5250)
|
11 months ago |
Frank Lee
|
2b83418719
|
[ci] fixed ddp test (#5254)
* [ci] fixed ddp test
* polish
|
11 months ago |