Yuanheng Zhao
36c4bb2893
[Fix] Grok-1 use tokenizer from the same pretrained path ( #5532 )
...
* [fix] use tokenizer from the same pretrained path
* trust remote code
2024-03-28 16:30:04 +08:00
Insu Jang
00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used ( #5189 )
...
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-03-27 13:57:00 +08:00
github-actions[bot]
e6707a6e8d
[format] applied code formatting on changed files in pull request 5510 ( #5517 )
...
Co-authored-by: github-actions <github-actions@github.com>
2024-03-27 11:21:03 +08:00
Hongxin Liu
19e1a5cf16
[shardformer] update colo attention to support custom mask ( #5510 )
...
* [feature] refactor colo attention (#5462 )
* [extension] update api
* [feature] add colo attention
* [feature] update sdpa
* [feature] update npu attention
* [feature] update flash-attn
* [test] add flash attn test
* [test] update flash attn test
* [shardformer] update modeling to fit colo attention (#5465 )
* [misc] refactor folder structure
* [shardformer] update llama flash-attn
* [shardformer] fix llama policy
* [devops] update tensornvme install
* [test] update llama test
* [shardformer] update colo attn kernel dispatch
* [shardformer] update blip2
* [shardformer] update chatglm
* [shardformer] update gpt2
* [shardformer] update gptj
* [shardformer] update opt
* [shardformer] update vit
* [shardformer] update colo attention mask prep
* [shardformer] update whisper
* [test] fix shardformer tests (#5514 )
* [test] fix shardformer tests
* [test] fix shardformer tests
2024-03-27 11:19:32 +08:00
Edenzzzz
9a3321e9f4
Merge pull request #5515 from Edenzzzz/fix_layout_convert
...
Fix layout convertor caching
2024-03-26 19:51:02 +08:00
Edenzzzz
18edcd5368
Empty-Commit
2024-03-26 19:50:41 +08:00
Edenzzzz
61da3fbc52
fixed layout converter caching and updated tester
2024-03-26 17:22:27 +08:00
Rocky Duan
cbe34c557c
Fix ColoTensorSpec for py11 ( #5440 )
2024-03-26 15:56:49 +08:00
Hongxin Liu
a7790a92e8
[devops] fix example test ci ( #5504 )
2024-03-26 15:09:05 +08:00
Yuanheng Zhao
131f32a076
[fix] fix grok-1 example typo ( #5506 )
2024-03-26 10:19:42 +08:00
flybird11111
0688d92e2d
[shardformer]Fix lm parallel. ( #5480 )
...
* fix
* padding vocab_size when using pipeline parallellism
padding vocab_size when using pipeline parallellism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
* fix lm forward distribution
* fix
* test ci
* fix
2024-03-25 17:21:51 +08:00
binmakeswell
34e909256c
[release] grok-1 inference benchmark ( #5500 )
...
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
2024-03-25 14:42:51 +08:00
Wenhao Chen
bb0a668fee
[hotfix] set return_outputs=False in examples and polish code ( #5404 )
...
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
Yuanheng Zhao
5fcd7795cd
[example] update Grok-1 inference ( #5495 )
...
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* trivial
* add tokenizer url
2024-03-24 20:24:11 +08:00
binmakeswell
6df844b8c4
[release] grok-1 314b inference ( #5490 )
...
* [release] grok-1 inference
* [release] grok-1 inference
* [release] grok-1 inference
2024-03-22 15:48:12 +08:00
Hongxin Liu
848a574c26
[example] add grok-1 inference ( #5485 )
...
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [exmaple] add test ci
* [exmaple] update readme
2024-03-21 18:07:22 +08:00
binmakeswell
d158fc0e64
[doc] update open-sora demo ( #5479 )
...
* [doc] update open-sora demo
* [doc] update open-sora demo
* [doc] update open-sora demo
2024-03-20 16:08:41 +08:00
binmakeswell
bd998ced03
[doc] release Open-Sora 1.0 with model weights ( #5468 )
...
* [doc] release Open-Sora 1.0 with model weights
* [doc] release Open-Sora 1.0 with model weights
* [doc] release Open-Sora 1.0 with model weights
2024-03-18 18:31:18 +08:00
flybird11111
5e16bf7980
[shardformer] fix gathering output when using tensor parallelism ( #5431 )
...
* fix
* padding vocab_size when using pipeline parallellism
padding vocab_size when using pipeline parallellism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
2024-03-18 15:55:11 +08:00
Hongxin Liu
f2e8b9ef9f
[devops] fix compatibility ( #5444 )
...
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
2024-03-13 15:24:13 +08:00
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
2024-03-12 11:25:16 +08:00
Camille Zhong
da885ed540
fix tensor data update for gemini loss caluculation ( #5442 )
2024-03-11 13:49:58 +08:00
Hongxin Liu
8020f42630
[release] update version ( #5411 )
2024-03-07 23:36:07 +08:00
Camille Zhong
743e7fad2f
[colossal-llama2] add stream chat examlple for chat version model ( #5428 )
...
* add stream chat for chat version
* remove os.system clear
* modify function name
2024-03-07 14:58:56 +08:00
Youngon
68f55a709c
[hotfix] fix stable diffusion inference bug. ( #5289 )
...
* Update train_ddp.yaml
delete "strategy" to fix DDP config loading bug in "main.py"
* Update train_ddp.yaml
fix inference with scripts/txt2img.py config file load bug.
* Update README.md
add pretrain model test code.
2024-03-05 22:03:40 +08:00
hugo-syn
c8003d463b
[doc] Fix typo s/infered/inferred/ ( #5288 )
...
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2024-03-05 22:02:08 +08:00
digger yu
5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO ( #5335 )
...
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:52:30 +08:00
Dongruixuan Li
a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled ( #5422 )
2024-03-05 21:48:55 +08:00
digger yu
049121d19d
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ ( #5317 )
2024-03-05 21:48:46 +08:00
digger yu
16c96d4d8c
[hotfix] fix typo change _descrption to _description ( #5331 )
2024-03-05 21:47:48 +08:00
digger yu
70cce5cbed
[doc] update some translations with README-zh-Hans.md ( #5382 )
2024-03-05 21:45:55 +08:00
Luo Yihang
e239cf9060
[hotfix] fix typo of openmoe model source ( #5403 )
2024-03-05 21:44:38 +08:00
MickeyCHAN
e304e4db35
[hotfix] fix sd vit import error ( #5420 )
...
* fix import error
* Update dpt_depth.py
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:41:23 +08:00
Hongxin Liu
070df689e6
[devops] fix extention building ( #5427 )
2024-03-05 15:35:54 +08:00
binmakeswell
822241a99c
[doc] sora release ( #5425 )
...
* [doc] sora release
* [doc] sora release
* [doc] sora release
* [doc] sora release
2024-03-05 12:08:58 +08:00
flybird11111
29695cf70c
[example]add gpt2 benchmark example script. ( #5295 )
...
* benchmark gpt2
* fix
fix
fix
fix
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed ddp test (#5254 )
* [ci] fixed ddp test
* polish
* fix typo in applications/ColossalEval/README.md (#5250 )
* [ci] fix shardformer tests. (#5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [doc] fix doc typo (#5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
* [hotfix]: add pp sanity check and fix mbs arg (#5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
* [workflow] fixed incomplete bash command (#5272 )
* [workflow] fixed oom tests (#5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [shardformer] hybridparallelplugin support gradients accumulation. (#5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
* [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )
* fix auto loading gpt2 tokenizer (#5279 )
* [doc] add llama2-13B disyplay (#5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* fix llama pretrain (#5287 )
* fix
* fix
* fix
fix
* fix
fix
fix
* fix
fix
* benchmark gpt2
* fix
fix
fix
fix
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* fix
fix
* fix
fix
fix
* fix
* fix
fix
fix
fix
fix
* fix
* Update shardformer.py
---------
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
2024-03-04 16:18:13 +08:00
Camille Zhong
4b8312c08e
fix sft single turn inference example ( #5416 )
2024-03-01 17:27:50 +08:00
binmakeswell
a1c6cdb189
[doc] fix blog link
2024-02-29 15:01:43 +08:00
binmakeswell
5de940de32
[doc] fix blog link
2024-02-29 15:01:43 +08:00
Frank Lee
2461f37886
[workflow] added pypi channel ( #5412 )
2024-02-29 13:56:55 +08:00
Tong Li
a28c971516
update requirements ( #5407 )
2024-02-28 17:46:27 +08:00
flybird11111
0a25e16e46
[shardformer]gather llama logits ( #5398 )
...
* gather llama logits
* fix
2024-02-27 22:44:07 +08:00
Frank Lee
dcdd8a5ef7
[setup] fixed nightly release ( #5388 )
2024-02-27 15:19:13 +08:00
QinLuo
bf34c6fef6
[fsdp] impl save/load shard model/optimizer ( #5357 )
2024-02-27 13:51:14 +08:00
Hongxin Liu
d882d18c65
[example] reuse flash attn patch ( #5400 )
2024-02-27 11:22:07 +08:00
Hongxin Liu
95c21e3950
[extension] hotfix jit extension setup ( #5402 )
2024-02-26 19:46:58 +08:00
Stephan Kölker
5d380a1a21
[hotfix] Fix wrong import in meta_registry ( #5392 )
2024-02-20 19:24:43 +08:00
CZYCW
b833153fd5
[hotfix] fix variable type for top_p ( #5313 )
...
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-02-19 18:25:44 +08:00
Frank Lee
705a62a565
[doc] updated installation command ( #5389 )
2024-02-19 16:54:03 +08:00
yixiaoer
69e3ad01ed
[doc] Fix typo ( #5361 )
2024-02-19 16:53:28 +08:00