Hongxin Liu
7f8b16635b
[misc] refactor launch API and tensor constructor ( #5666 )
...
* [misc] remove config arg from initialize
* [misc] remove old tensor constructor
* [plugin] add npu support for ddp
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [devops] fix doc test ci
* [test] fix test launch
* [doc] update launch doc
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
Tong Li
68ec99e946
[hotfix] add soft link to support required files ( #5661 )
2024-04-26 21:12:04 +08:00
Hongxin Liu
1b387ca9fe
[shardformer] refactor pipeline grad ckpt config ( #5646 )
...
* [shardformer] refactor pipeline grad ckpt config
* [shardformer] refactor pipeline grad ckpt config
* [pipeline] fix stage manager
2024-04-25 15:19:30 +08:00
binmakeswell
f4c5aafe29
[example] llama3 ( #5631 )
...
* release llama3
* [release] llama3
* [release] llama3
* [release] llama3
* [release] llama3
2024-04-23 18:48:07 +08:00
Hongxin Liu
4de4e31818
[example] update llama example ( #5626 )
...
* [plugin] support dp inside for hybrid parallel
* [example] update llama benchmark
* [example] update llama benchmark
* [example] update llama readme
* [example] update llama readme
2024-04-23 14:12:20 +08:00
Edenzzzz
d83c633ca6
[hotfix] Fix examples no pad token & auto parallel codegen bug; ( #5606 )
...
* fix no pad token bug
* fixed some auto parallel codegen bug, but might not run on torch 2.1
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-04-18 18:15:50 +08:00
Hongxin Liu
641b1ee71a
[devops] remove post commit ci ( #5566 )
...
* [devops] remove post commit ci
* [misc] run pre-commit on all files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-08 15:09:40 +08:00
digger yu
341263df48
[hotfix] fix typo s/get_defualt_parser /get_default_parser ( #5548 )
2024-04-07 19:04:58 +08:00
digger yu
a799ca343b
[fix] fix typo s/muiti-node /multi-node etc. ( #5448 )
2024-04-07 18:42:15 +08:00
Edenzzzz
15055f9a36
[hotfix] quick fixes to make legacy tutorials runnable ( #5559 )
...
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-04-07 12:06:27 +08:00
Wenhao Chen
e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama ( #5508 )
...
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
* feat: apply `GradientCheckpointConfig` to policy and llama_forward
* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
* fix: add optional args for `distribute_layer` and `get_stage_index`
* fix: fix changed API calls
* test: update llama tests
* style: polish `GradientCheckpointConfig`
* fix: fix pipeline utils tests
2024-04-01 11:34:58 +08:00
Yuanheng Zhao
36c4bb2893
[Fix] Grok-1 use tokenizer from the same pretrained path ( #5532 )
...
* [fix] use tokenizer from the same pretrained path
* trust remote code
2024-03-28 16:30:04 +08:00
Insu Jang
00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used ( #5189 )
...
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-03-27 13:57:00 +08:00
Yuanheng Zhao
131f32a076
[fix] fix grok-1 example typo ( #5506 )
2024-03-26 10:19:42 +08:00
binmakeswell
34e909256c
[release] grok-1 inference benchmark ( #5500 )
...
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
2024-03-25 14:42:51 +08:00
Wenhao Chen
bb0a668fee
[hotfix] set return_outputs=False in examples and polish code ( #5404 )
...
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
Yuanheng Zhao
5fcd7795cd
[example] update Grok-1 inference ( #5495 )
...
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* trivial
* add tokenizer url
2024-03-24 20:24:11 +08:00
binmakeswell
6df844b8c4
[release] grok-1 314b inference ( #5490 )
...
* [release] grok-1 inference
* [release] grok-1 inference
* [release] grok-1 inference
2024-03-22 15:48:12 +08:00
Hongxin Liu
848a574c26
[example] add grok-1 inference ( #5485 )
...
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme
2024-03-21 18:07:22 +08:00
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
2024-03-12 11:25:16 +08:00
Youngon
68f55a709c
[hotfix] fix stable diffusion inference bug. ( #5289 )
...
* Update train_ddp.yaml
delete "strategy" to fix DDP config loading bug in "main.py"
* Update train_ddp.yaml
fix inference with scripts/txt2img.py config file load bug.
* Update README.md
add pretrain model test code.
2024-03-05 22:03:40 +08:00
Luo Yihang
e239cf9060
[hotfix] fix typo of openmoe model source ( #5403 )
2024-03-05 21:44:38 +08:00
MickeyCHAN
e304e4db35
[hotfix] fix sd vit import error ( #5420 )
...
* fix import error
* Update dpt_depth.py
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:41:23 +08:00
Hongxin Liu
070df689e6
[devops] fix extension building ( #5427 )
2024-03-05 15:35:54 +08:00
flybird11111
29695cf70c
[example] add gpt2 benchmark example script. ( #5295 )
...
* benchmark gpt2
* fix
fix
fix
fix
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed ddp test (#5254 )
* [ci] fixed ddp test
* polish
* fix typo in applications/ColossalEval/README.md (#5250 )
* [ci] fix shardformer tests. (#5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [doc] fix doc typo (#5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
* [hotfix]: add pp sanity check and fix mbs arg (#5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
* [workflow] fixed incomplete bash command (#5272 )
* [workflow] fixed oom tests (#5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [shardformer] hybridparallelplugin support gradients accumulation. (#5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
* [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )
* fix auto loading gpt2 tokenizer (#5279 )
* [doc] add llama2-13B display (#5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* fix llama pretrain (#5287 )
* fix
* fix
* fix
fix
* fix
fix
fix
* fix
fix
* benchmark gpt2
* fix
fix
fix
fix
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* fix
fix
* fix
fix
fix
* fix
* fix
fix
fix
fix
fix
* fix
* Update shardformer.py
---------
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
2024-03-04 16:18:13 +08:00
Hongxin Liu
d882d18c65
[example] reuse flash attn patch ( #5400 )
2024-02-27 11:22:07 +08:00
digger yu
71321a07cf
fix typo change dosen't to doesn't ( #5308 )
2024-01-30 09:57:38 +08:00
Frank Lee
8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
...
Feature/npu
2024-01-29 13:49:39 +08:00
Frank Lee
7cfed5f076
[feat] refactored extension module ( #5298 )
...
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
2024-01-25 17:01:48 +08:00
digger yu
bce9499ed3
fix some typo ( #5307 )
2024-01-25 13:56:27 +08:00
flybird11111
f7e3f82a7e
fix llama pretrain ( #5287 )
2024-01-19 17:49:02 +08:00
ver217
148469348a
Merge branch 'main' into sync/npu
2024-01-18 12:05:21 +08:00
Wenhao Chen
ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg ( #5268 )
...
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
2024-01-15 15:57:40 +08:00
binmakeswell
c174c4fc5f
[doc] fix doc typo ( #5256 )
...
* [doc] fix annotation display
* [doc] fix llama2 doc
2024-01-11 21:01:11 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
...
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Xuanlei Zhao
dd2c28a323
[npu] use extension for op builder ( #5172 )
...
* update extension
* update cpu adam
* update is
* add doc for cpu adam
* update kernel
* update commit
* update flash
* update memory efficient
* update flash attn
* update flash attention loader
* update api
* fix
* update doc
* update example time limit
* reverse change
* fix doc
* remove useless kernel
* fix
* not use warning
* update
* update
2024-01-08 11:39:16 +08:00
Wenhao Chen
3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode ( #5201 )
...
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
2024-01-02 23:41:12 +08:00
Wenhao Chen
4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
...
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2023-12-22 10:44:00 +08:00
flybird11111
21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel ( #5150 )
...
* fix
aaa
fix
fix
fix
* fix
* fix
* test ci
* fix ci
fix
2023-12-08 11:10:51 +08:00
binmakeswell
177c79f2d1
[doc] add moe news ( #5128 )
...
* [doc] add moe news
* [doc] add moe news
* [doc] add moe news
2023-11-28 17:44:06 +08:00
Wenhao Chen
7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert ( #5088 )
...
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883 )
* [shardformer]: fix interleaved pipeline for bert model (#5048 )
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093 )
* Add Mistral support for Shardformer (#5103 )
* [shardformer] add tests to mistral (#5105 )
---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
2023-11-28 16:54:42 +08:00
digger yu
d5661f0f25
[nfc] fix typo change directoty to directory ( #5111 )
2023-11-27 18:25:53 +08:00
Xuanlei Zhao
3acbf6d496
[npu] add npu support for hybrid plugin and llama ( #5090 )
...
* llama 3d
* update
* fix autocast
2023-11-22 19:23:21 +08:00
flybird11111
aae496631c
[shardformer] fix flash attention, when mask is causal, just don't unpad it ( #5084 )
...
* fix flash attn
* fix
fix
2023-11-22 16:00:07 +08:00
Hongxin Liu
1cd7efc520
[inference] refactor examples and fix schedule ( #5077 )
...
* [setup] refactor infer setup
* [hotfix] fix inference behavior on 1 1 gpu
* [example] refactor inference examples
2023-11-21 10:46:03 +08:00
Bin Jia
4e3959d316
[hotfix/hybridengine] Fix init model with random parameters in benchmark ( #5074 )
...
* fix init model with random parameters
* fix example
2023-11-20 20:15:25 +08:00
github-actions[bot]
8921a73c90
[format] applied code formatting on changed files in pull request 5067 ( #5072 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-11-20 19:46:43 +08:00
Xu Kai
fb103cfd6e
[inference] update examples and engine ( #5073 )
...
* update examples and engine
* fix choices
* update example
2023-11-20 19:44:52 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero ( #5067 )
...
* [npu] setup device utils (#5047 )
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052 )
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065 )
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
Cuiqing Li (李崔卿)
bce919708f
[Kernels] added flash-decoding of triton ( #5063 )
...
* added flash-decoding of triton based on lightllm kernel
* add req
* clean
* clean
* delete build.sh
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
2023-11-20 13:58:29 +08:00