Camille Zhong | da885ed540 | fix tensor data update for gemini loss calculation (#5442) | 9 months ago
Hongxin Liu | 8020f42630 | [release] update version (#5411) | 9 months ago
Camille Zhong | 743e7fad2f | [colossal-llama2] add stream chat example for chat version model (#5428) | 9 months ago
* add stream chat for chat version
* remove os.system clear
* modify function name
Youngon | 68f55a709c | [hotfix] fix stable diffusion inference bug (#5289) | 9 months ago
* Update train_ddp.yaml: delete "strategy" to fix the DDP config loading bug in main.py
* Update train_ddp.yaml: fix the config file loading bug when running inference with scripts/txt2img.py
* Update README.md: add pretrained model test code
hugo-syn | c8003d463b | [doc] Fix typo s/infered/inferred/ (#5288) | 9 months ago
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
digger yu | 5e1c93d732 | [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) | 9 months ago
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Dongruixuan Li | a7ae2b5b4c | [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) | 9 months ago
digger yu | 049121d19d | [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) | 9 months ago
digger yu | 16c96d4d8c | [hotfix] fix typo change _descrption to _description (#5331) | 9 months ago
digger yu | 70cce5cbed | [doc] update some translations in README-zh-Hans.md (#5382) | 9 months ago
Luo Yihang | e239cf9060 | [hotfix] fix typo of openmoe model source (#5403) | 9 months ago
MickeyCHAN | e304e4db35 | [hotfix] fix sd vit import error (#5420) | 9 months ago
* fix import error
* Update dpt_depth.py
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Hongxin Liu | 070df689e6 | [devops] fix extension building (#5427) | 9 months ago
binmakeswell | 822241a99c | [doc] sora release (#5425) | 9 months ago
flybird11111 | 29695cf70c | [example] add gpt2 benchmark example script (#5295) | 9 months ago
* benchmark gpt2
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
* [workflow] fixed build CI (#5240)
* [ci] fixed booster test (#5251)
* [ci] fixed ddp test (#5254)
* fix typo in applications/ColossalEval/README.md (#5250)
* [ci] fix shardformer tests: add enable_metadata_cache option, revert p2p, re-enable t5 tests (#5255)
* [doc] fix doc typo: fix annotation display and llama2 doc (#5256)
* [hotfix] add pp and 1f1b sanity checks, fix misleading mbs arg (#5268)
* [workflow] fixed incomplete bash command (#5272)
* [workflow] fixed oom tests (#5275)
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
* [shardformer] hybridparallelplugin support gradients accumulation (#5246)
* [hotfix] fix ShardFormer test execution path when using sequence parallelism (#5230)
* fix auto loading gpt2 tokenizer (#5279)
* [doc] add llama2-13B display (#5285)
* fix llama pretrain (#5287)
* Update shardformer.py
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Camille Zhong | 4b8312c08e | fix sft single turn inference example (#5416) | 9 months ago
binmakeswell | a1c6cdb189 | [doc] fix blog link | 9 months ago
binmakeswell | 5de940de32 | [doc] fix blog link | 9 months ago
Frank Lee | 2461f37886 | [workflow] added pypi channel (#5412) | 9 months ago
Tong Li | a28c971516 | update requirements (#5407) | 9 months ago
flybird11111 | 0a25e16e46 | [shardformer] gather llama logits (#5398) | 9 months ago
* gather llama logits
* fix
Frank Lee | dcdd8a5ef7 | [setup] fixed nightly release (#5388) | 9 months ago
QinLuo | bf34c6fef6 | [fsdp] impl save/load shard model/optimizer (#5357) | 9 months ago
Hongxin Liu | d882d18c65 | [example] reuse flash attn patch (#5400) | 9 months ago
Hongxin Liu | 95c21e3950 | [extension] hotfix jit extension setup (#5402) | 9 months ago
Stephan Kölker | 5d380a1a21 | [hotfix] Fix wrong import in meta_registry (#5392) | 9 months ago
CZYCW | b833153fd5 | [hotfix] fix variable type for top_p (#5313) | 9 months ago
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Frank Lee | 705a62a565 | [doc] updated installation command (#5389) | 9 months ago
yixiaoer | 69e3ad01ed | [doc] Fix typo (#5361) | 9 months ago
Hongxin Liu | 7303801854 | [llama] fix training and inference scripts (#5384) | 9 months ago
* [llama] refactor inference example to fit sft
* [llama] fix training script to fit gemini
* [llama] fix inference script
Hongxin Liu | adae123df3 | [release] update version (#5380) | 10 months ago
Frank Lee | efef43b53c | Merge pull request #5372 from hpcaitech/exp/mixtral | 10 months ago
Frank Lee | 4c03347fc7 | Merge pull request #5377 from hpcaitech/example/llama-npu | 10 months ago
[llama] support npu for Colossal-LLaMA-2
ver217 | 06db94fbc9 | [moe] fix tests | 10 months ago
Hongxin Liu | 65e5d6baa5 | [moe] fix mixtral optim checkpoint (#5344) | 10 months ago
Hongxin Liu | 956b561b54 | [moe] fix mixtral forward default value (#5329) | 10 months ago
Hongxin Liu | b60be18dcc | [moe] fix mixtral checkpoint io (#5314) | 10 months ago
Hongxin Liu | da39d21b71 | [moe] support mixtral (#5309) | 10 months ago
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plugin
* [moe] fix optimizer load
* [moe] fix mixtral layer
Hongxin Liu | c904d2ae99 | [moe] update capacity computing (#5253) | 10 months ago
* [moe] top2 allow uneven input
* [moe] update capacity computing
* [moe] remove debug info
Xuanlei Zhao | 7d8e0338a4 | [moe] init mixtral impl | 10 months ago
Hongxin Liu | 084c91246c | [llama] fix memory issue (#5371) | 10 months ago
* [llama] fix memory issue
* [llama] add comment
Hongxin Liu | c53ddda88f | [lr-scheduler] fix load state dict and add test (#5369) | 10 months ago
Hongxin Liu | eb4f2d90f9 | [llama] polish training script and fix optim ckpt (#5368) | 10 months ago
Camille Zhong | a5756a8720 | [eval] update llama npu eval (#5366) | 10 months ago
Camille Zhong | 44ca61a22b | [llama] fix neftune & pbar with start_step (#5364) | 10 months ago
Hongxin Liu | a4cec1715b | [llama] add flash attn patch for npu (#5362) | 10 months ago
Hongxin Liu | 73f9f23fc6 | [llama] update training script (#5360) | 10 months ago
* [llama] update training script
* [doc] polish docstr
Hongxin Liu | 6c0fa7b9a8 | [llama] fix dataloader for hybrid parallel (#5358) | 10 months ago
* [plugin] refactor prepare dataloader
* [plugin] update train script
Hongxin Liu | 2dd01e3a14 | [gemini] fix param op hook when output is tuple (#5355) | 10 months ago
* [gemini] fix param op hook when output is tuple
* [gemini] fix param op hook
Wenhao Chen | 1c790c0877 | [fix] remove unnecessary dp_size assert (#5351) | 10 months ago
* fix: remove unnecessary assert
* test: add more 3d plugin tests
* fix: add warning