FrankLeeeee
087d0cb1fc
[accelerator] fixed npu api
10 months ago
Frank Lee
8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
Feature/npu
10 months ago
Frank Lee
73f4dc578e
[workflow] updated CI image ( #5318 )
10 months ago
Frank Lee
7cfed5f076
[feat] refactored extension module ( #5298 )
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
10 months ago
digger yu
bce9499ed3
fix some typos ( #5307 )
10 months ago
李文军
ec912b1ba9
[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style ( #5228 )
10 months ago
Desperado-Jia
ddf879e2db
fix bug for mefture ( #5299 )
10 months ago
Hongxin Liu
d7f8db8e21
[hotfix] fix 3d plugin test ( #5292 )
10 months ago
flybird11111
f7e3f82a7e
fix llama pretrain ( #5287 )
10 months ago
Desperado-Jia
6a56967855
[doc] add llama2-13B display ( #5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
10 months ago
Michelle
32cb74493a
fix auto loading gpt2 tokenizer ( #5279 )
10 months ago
Frank Lee
d66e6988bc
Merge pull request #5278 from ver217/sync/npu
[sync] sync npu branch with main
10 months ago
ver217
148469348a
Merge branch 'main' into sync/npu
10 months ago
Zhongkai Zhao
5d9a0ae75b
[hotfix] Fix ShardFormer test execution path when using sequence parallelism ( #5230 )
10 months ago
flybird11111
46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. ( #5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
10 months ago
flybird11111
2a0558d8ec
[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py ( #5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
10 months ago
Frank Lee
d69cd2eb89
[workflow] fixed oom tests ( #5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
10 months ago
Frank Lee
04244aaaf1
[workflow] fixed incomplete bash command ( #5272 )
10 months ago
Wenhao Chen
ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg ( #5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
10 months ago
binmakeswell
c174c4fc5f
[doc] fix doc typo ( #5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
11 months ago
flybird11111
e830ef917d
[ci] fix shardformer tests. ( #5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
11 months ago
digger yu
756c400ad2
fix typo in applications/ColossalEval/README.md ( #5250 )
11 months ago
Frank Lee
2b83418719
[ci] fixed ddp test ( #5254 )
* [ci] fixed ddp test
* polish
11 months ago
Frank Lee
d5eeeb1416
[ci] fixed booster test ( #5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
11 months ago
Frank Lee
edf94a35c3
[workflow] fixed build CI ( #5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
11 months ago
digger yu
41e52c1c6e
[doc] fix typo in Colossal-LLaMA-2/README.md ( #5247 )
11 months ago
Frank Lee
9102d655ab
[hotfix] removed unused flag ( #5242 )
11 months ago
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
11 months ago
Elsa Granger
d565df3821
[pipeline] A more general _communicate in p2p ( #5062 )
* A more general _communicate
* feat: finish tree_flatten version p2p
* fix: update p2p api calls
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
11 months ago
Xuanlei Zhao
dd2c28a323
[npu] use extension for op builder ( #5172 )
* update extension
* update cpu adam
* update is
* add doc for cpu adam
* update kernel
* update commit
* update flash
* update memory efficient
* update flash attn
* update flash attention loader
* update api
* fix
* update doc
* update example time limit
* reverse change
* fix doc
* remove useless kernel
* fix
* not use warning
* update
* update
11 months ago
binmakeswell
7bc6969ce6
[doc] SwiftInfer release ( #5236 )
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
11 months ago
github-actions[bot]
4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 ( #5235 )
Co-authored-by: github-actions <github-actions@github.com>
11 months ago
binmakeswell
b9b32b15e6
[doc] add Colossal-LLaMA-2-13B ( #5234 )
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
11 months ago
JIMMY ZHAO
ce651270f1
[doc] Make leaderboard format more uniform and good-looking ( #5231 )
* Make leaderboard format more unified and good-looking
* Update README.md
* Update README.md
11 months ago
Camille Zhong
915b4652f3
[doc] Update README.md of Colossal-LLAMA2 ( #5233 )
* Update README.md
* Update README.md
11 months ago
Tong Li
d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model ( #5224 )
* update readme
* update readme
* update link
* update
* update readme
* update
* update
* update
* update title
* update example
* update example
* fix content
* add conclusion
* add license
* update
* update
* update version
* fix minor
11 months ago
digger yu
b0b53a171c
[nfc] fix typo colossalai/shardformer/ ( #5133 )
11 months ago
flybird11111
451e9142b8
fix flash attn ( #5209 )
11 months ago
flybird11111
365671be10
fix-test ( #5210 )
fix-test
fix-test
11 months ago
Hongxin Liu
7f3400b560
[devops] update torch version in ci ( #5217 )
11 months ago
Wenhao Chen
d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock ( #5214 )
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test
11 months ago
Wenhao Chen
3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode ( #5201 )
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
11 months ago
flybird11111
02d2328a04
support linear accumulation fusion ( #5199 )
support linear accumulation fusion
support linear accumulation fusion
fix
11 months ago
Zhongkai Zhao
64519eb830
[doc] Update required third-party library list for testing and torch compatibility checking ( #5207 )
* doc/update requirements-test.txt
* update torch-cuda compatibility check
11 months ago
Yuanchen
eae01b6740
Improve logic for selecting metrics ( #5196 )
Co-authored-by: Xu <yuanchen.xu00@gmail.com>
11 months ago
Wenhao Chen
4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
11 months ago
BlueRum
af952673f7
polish readme in application/chat ( #5194 )
11 months ago
flybird11111
681d9b12ef
[doc] update pytorch version in documents. ( #5177 )
* fix
aaa
fix
fix
fix
* fix
* fix
* test ci
* fix ci
fix
* update pytorch version in documents
11 months ago
Yuanchen
3ff60d13b0
Fix ColossalEval ( #5186 )
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
11 months ago
flybird11111
79718fae04
[shardformer] llama support DistCrossEntropy ( #5176 )
* fix
aaa
fix
fix
fix
* fix
* fix
* test ci
* fix ci
fix
* llama support dist-cross
fix
fix
fix
fix
fix
fix
fix
fix
* fix
* fix
* fix
fix
* test ci
* test ci
* fix
* [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878 )
* Add finetuning Colossal-Llama-2 example
* Add finetuning Colossal-Llama-2 example 2
* Add finetuning Colossal-Llama-2 example and support NEFTuning
* Add inference example and refine neftune
* Modify readme file
* update the imports
---------
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
* llama support dist-cross
fix
fix
fix
fix
fix
fix
fix
fix
* fix
* fix
* fix
fix
* test ci
* test ci
* fix
* fix ci
* fix ci
---------
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
12 months ago