Commit Graph

3572 Commits (colossalchat)

Author SHA1 Message Date
Steve Luo ed431de4e4
fix rmsnorm template function invocation problem (partial specialization of function templates is not allowed in C++) and luckily pass the e2e precision test (#5454) 2024-03-13 16:00:55 +08:00
Hongxin Liu f2e8b9ef9f
[devops] fix compatibility (#5444)
* [devops] fix compatibility

* [hotfix] update compatibility test on pr

* [devops] fix compatibility

* [devops] record duration during comp test

* [test] decrease test duration

* fix falcon
2024-03-13 15:24:13 +08:00
傅剑寒 6fd355a5a6
Merge pull request #5452 from Courtesy-Xs/fix_include_path
fix include path
2024-03-13 11:26:41 +08:00
xs_courtesy c1c45e9d8e fix include path 2024-03-13 11:21:06 +08:00
Steve Luo b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441) 2024-03-12 17:48:02 +08:00
傅剑寒 368a2aa543
Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation
Refactor colossal-infer code arch
2024-03-12 14:14:37 +08:00
digger yu 385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. (#5429) 2024-03-12 11:25:16 +08:00
xs_courtesy 095c070a6e refactor code 2024-03-11 17:06:57 +08:00
Camille Zhong da885ed540
fix tensor data update for gemini loss calculation (#5442) 2024-03-11 13:49:58 +08:00
傅剑寒 21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
Add query and other components
2024-03-11 11:15:29 +08:00
Runyu Lu 633e95b301 [doc] add doc 2024-03-11 10:56:51 +08:00
Runyu Lu 9dec66fad6 [fix] multi graphs capture error 2024-03-11 10:51:16 +08:00
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error 2024-03-11 10:49:31 +08:00
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 2024-03-08 16:21:12 +08:00
xs_courtesy 5eb5ff1464 refactor code 2024-03-08 15:41:14 +08:00
xs_courtesy 01d289d8e5 Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config 2024-03-08 15:04:55 +08:00
xs_courtesy a46598ac59 add reusable utils for cuda 2024-03-08 14:53:29 +08:00
傅剑寒 2b28b54ac6
Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul
[Inference] Add silu_and_mul for infer
2024-03-08 14:44:37 +08:00
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api 2024-03-08 14:19:35 +08:00
Hongxin Liu 8020f42630
[release] update version (#5411) 2024-03-07 23:36:07 +08:00
xs_courtesy 95c21498d4 add silu_and_mul for infer 2024-03-07 16:57:49 +08:00
Camille Zhong 743e7fad2f
[colossal-llama2] add stream chat example for chat version model (#5428)
* add stream chat for chat version

* remove os.system clear

* modify function name
2024-03-07 14:58:56 +08:00
Youngon 68f55a709c
[hotfix] fix stable diffusion inference bug. (#5289)
* Update train_ddp.yaml

delete "strategy" to fix DDP config loading bug in "main.py"

* Update train_ddp.yaml

fix inference with scripts/txt2img.py config file load bug.

* Update README.md

add pretrain model test code.
2024-03-05 22:03:40 +08:00
hugo-syn c8003d463b
[doc] Fix typo s/infered/inferred/ (#5288)
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2024-03-05 22:02:08 +08:00
digger yu 5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335)
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:52:30 +08:00
Dongruixuan Li a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) 2024-03-05 21:48:55 +08:00
digger yu 049121d19d
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) 2024-03-05 21:48:46 +08:00
digger yu 16c96d4d8c
[hotfix] fix typo change _descrption to _description (#5331) 2024-03-05 21:47:48 +08:00
digger yu 70cce5cbed
[doc] update some translations with README-zh-Hans.md (#5382) 2024-03-05 21:45:55 +08:00
Luo Yihang e239cf9060
[hotfix] fix typo of openmoe model source (#5403) 2024-03-05 21:44:38 +08:00
MickeyCHAN e304e4db35
[hotfix] fix sd vit import error (#5420)
* fix import error

* Update dpt_depth.py

---------

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:41:23 +08:00
Hongxin Liu 070df689e6
[devops] fix extension building (#5427) 2024-03-05 15:35:54 +08:00
binmakeswell 822241a99c
[doc] sora release (#5425)
* [doc] sora release

* [doc] sora release

* [doc] sora release

* [doc] sora release
2024-03-05 12:08:58 +08:00
flybird11111 29695cf70c
[example]add gpt2 benchmark example script. (#5295)
* benchmark gpt2

* fix

fix

fix

fix

* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)

* [workflow] fixed build CI (#5240)

* [workflow] fixed build CI

* polish

* polish

* polish

* polish

* polish

* [ci] fixed booster test (#5251)

* [ci] fixed booster test

* [ci] fixed booster test

* [ci] fixed booster test

* [ci] fixed ddp test (#5254)

* [ci] fixed ddp test

* polish

* fix typo in applications/ColossalEval/README.md (#5250)

* [ci] fix shardformer tests. (#5255)

* fix ci

fix

* revert: revert p2p

* feat: add enable_metadata_cache option

* revert: enable t5 tests

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>

* [doc] fix doc typo (#5256)

* [doc] fix annotation display

* [doc] fix llama2 doc

* [hotfix]: add pp sanity check and fix mbs arg (#5268)

* fix: fix misleading mbs arg

* feat: add pp sanity check

* fix: fix 1f1b sanity check

* [workflow] fixed incomplete bash command (#5272)

* [workflow] fixed oom tests (#5275)

* [workflow] fixed oom tests

* polish

* polish

* polish

* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)

* fix ci

fix

* fix test

* revert: revert p2p

* feat: add enable_metadata_cache option

* revert: enable t5 tests

* fix

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>

* [shardformer] hybridparallelplugin support gradients accumulation. (#5246)

* support gradients acc

fix

fix

fix

fix

fix

fix

fix

fix

fix

fix

fix

fix

fix

* fix

fix

* fix

fix

fix

* [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230)

* fix auto loading gpt2 tokenizer (#5279)

* [doc] add llama2-13B display (#5285)

* Update README.md

* fix 13b typo

---------

Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* fix llama pretrain (#5287)

* fix

* fix

* fix

fix

* fix

fix

fix

* fix

fix

* benchmark gpt2

* fix

fix

fix

fix

* [workflow] fixed build CI (#5240)

* [workflow] fixed build CI

* polish

* polish

* polish

* polish

* polish

* [ci] fixed booster test (#5251)

* [ci] fixed booster test

* [ci] fixed booster test

* [ci] fixed booster test

* fix

fix

* fix

fix

fix

* fix

* fix

fix

fix

fix

fix

* fix

* Update shardformer.py

---------

Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
2024-03-04 16:18:13 +08:00
Frank Lee 593a72e4d5
Merge pull request #5424 from FrankLeeeee/sync/main
Sync/main
2024-03-04 10:13:59 +08:00
FrankLeeeee 0310b76e9d Merge branch 'main' into sync/main 2024-03-04 10:09:36 +08:00
Camille Zhong 4b8312c08e
fix sft single turn inference example (#5416) 2024-03-01 17:27:50 +08:00
binmakeswell a1c6cdb189 [doc] fix blog link 2024-02-29 15:01:43 +08:00
binmakeswell 5de940de32 [doc] fix blog link 2024-02-29 15:01:43 +08:00
Frank Lee 2461f37886
[workflow] added pypi channel (#5412) 2024-02-29 13:56:55 +08:00
Tong Li a28c971516
update requirements (#5407) 2024-02-28 17:46:27 +08:00
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
* move benchmark-related code to the example directory.

* fix bugs in test_fused_rotary_embedding.py
2024-02-28 16:46:03 +08:00
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
* add cuda KVCache kernel

* annotation benchmark_kvcache_copy

* add use cuda

* fix import path

* move benchmark scripts to example/

* rm benchmark codes in test_kv_cache_memcpy.py

* rm redundancy codes

* rm redundancy codes

* pr was modified according to the review
2024-02-28 14:36:50 +08:00
flybird11111 0a25e16e46
[shardformer]gather llama logits (#5398)
* gather llama logits

* fix
2024-02-27 22:44:07 +08:00
Frank Lee dcdd8a5ef7
[setup] fixed nightly release (#5388) 2024-02-27 15:19:13 +08:00
QinLuo bf34c6fef6
[fsdp] impl save/load shard model/optimizer (#5357) 2024-02-27 13:51:14 +08:00
Hongxin Liu d882d18c65
[example] reuse flash attn patch (#5400) 2024-02-27 11:22:07 +08:00
Hongxin Liu 95c21e3950
[extension] hotfix jit extension setup (#5402) 2024-02-26 19:46:58 +08:00
Yuanheng Zhao 19061188c3
[Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
2024-02-26 16:17:47 +08:00
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
* Fix bugs in inference_engine

* fix bugs in engine.py

* rm CUDA_VISIBLE_DEVICES

* add request_ids in generate

* fix bug in engine.py

* add logger.debug for BatchBucket
2024-02-23 10:51:35 +08:00