Hongxin Liu
a4cec1715b
[llama] add flash attn patch for npu ( #5362 )
2024-02-05 16:48:34 +08:00
Hongxin Liu
73f9f23fc6
[llama] update training script ( #5360 )
* [llama] update training script
* [doc] polish docstr
2024-02-05 16:33:18 +08:00
Hongxin Liu
6c0fa7b9a8
[llama] fix dataloader for hybrid parallel ( #5358 )
* [plugin] refactor prepare dataloader
* [plugin] update train script
2024-02-05 15:14:56 +08:00
Hongxin Liu
2dd01e3a14
[gemini] fix param op hook when output is tuple ( #5355 )
* [gemini] fix param op hook when output is tuple
* [gemini] fix param op hook
2024-02-04 11:58:26 +08:00
Wenhao Chen
1c790c0877
[fix] remove unnecessary dp_size assert ( #5351 )
* fix: remove unnecessary assert
* test: add more 3d plugin tests
* fix: add warning
2024-02-02 14:40:20 +08:00
Hongxin Liu
ffffc32dc7
[checkpointio] fix gemini and hybrid parallel optim checkpoint ( #5347 )
* [checkpointio] fix hybrid parallel optim checkpoint
* [extension] fix cuda extension
* [checkpointio] fix gemini optimizer checkpoint
* polish code
2024-02-01 16:13:06 +08:00
YeAnbang
c5239840e6
[Chat] fix sft loss nan ( #5345 )
* fix script
* fix script
* fix chat nan
* fix chat nan
2024-02-01 14:25:16 +08:00
Frank Lee
abd8e77ad8
[extension] fixed exception catch ( #5342 )
2024-01-31 18:09:49 +08:00
digger yu
71321a07cf
fix typo change dosen't to doesn't ( #5308 )
2024-01-30 09:57:38 +08:00
digger yu
6a3086a505
fix typo under extensions/ ( #5330 )
2024-01-30 09:55:16 +08:00
Frank Lee
febed23288
[doc] added docs for extensions ( #5324 )
* [doc] added docs for extensions
* polish
* polish
2024-01-29 17:39:23 +08:00
flybird11111
388179f966
[tests] fix t5 test. ( #5322 )
* [ci] fix shardformer tests. (#5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* fix t5 test
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-29 17:38:46 +08:00
Frank Lee
a6709afe66
Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api
[accelerator] fixed npu api
2024-01-29 14:29:58 +08:00
FrankLeeeee
087d0cb1fc
[accelerator] fixed npu api
2024-01-29 14:27:52 +08:00
Frank Lee
8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
Feature/npu
2024-01-29 13:49:39 +08:00
Frank Lee
73f4dc578e
[workflow] updated CI image ( #5318 )
2024-01-29 11:53:07 +08:00
Frank Lee
7cfed5f076
[feat] refactored extension module ( #5298 )
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
2024-01-25 17:01:48 +08:00
digger yu
bce9499ed3
fix some typo ( #5307 )
2024-01-25 13:56:27 +08:00
李文军
ec912b1ba9
[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style ( #5228 )
2024-01-25 13:14:48 +08:00
Desperado-Jia
ddf879e2db
fix bug for mefture ( #5299 )
2024-01-22 22:17:54 +08:00
Hongxin Liu
d7f8db8e21
[hotfix] fix 3d plugin test ( #5292 )
2024-01-22 15:19:04 +08:00
flybird11111
f7e3f82a7e
fix llama pretrain ( #5287 )
2024-01-19 17:49:02 +08:00
Desperado-Jia
6a56967855
[doc] add llama2-13B display ( #5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-01-19 16:04:08 +08:00
Michelle
32cb74493a
fix auto loading gpt2 tokenizer ( #5279 )
2024-01-18 14:08:29 +08:00
Frank Lee
d66e6988bc
Merge pull request #5278 from ver217/sync/npu
[sync] sync npu branch with main
2024-01-18 13:11:45 +08:00
ver217
148469348a
Merge branch 'main' into sync/npu
2024-01-18 12:05:21 +08:00
Zhongkai Zhao
5d9a0ae75b
[hotfix] Fix ShardFormer test execution path when using sequence parallelism ( #5230 )
2024-01-17 17:42:29 +08:00
flybird11111
46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. ( #5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
2024-01-17 15:22:33 +08:00
flybird11111
2a0558d8ec
[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py ( #5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-17 13:38:55 +08:00
Frank Lee
d69cd2eb89
[workflow] fixed oom tests ( #5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
2024-01-16 18:55:13 +08:00
Frank Lee
04244aaaf1
[workflow] fixed incomplete bash command ( #5272 )
2024-01-16 11:54:44 +08:00
Wenhao Chen
ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg ( #5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
2024-01-15 15:57:40 +08:00
binmakeswell
c174c4fc5f
[doc] fix doc typo ( #5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
2024-01-11 21:01:11 +08:00
flybird11111
e830ef917d
[ci] fix shardformer tests. ( #5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-11 19:07:45 +08:00
digger yu
756c400ad2
fix typo in applications/ColossalEval/README.md ( #5250 )
2024-01-11 17:58:38 +08:00
Frank Lee
2b83418719
[ci] fixed ddp test ( #5254 )
* [ci] fixed ddp test
* polish
2024-01-11 17:16:32 +08:00
Frank Lee
d5eeeb1416
[ci] fixed booster test ( #5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
2024-01-11 16:04:45 +08:00
Frank Lee
edf94a35c3
[workflow] fixed build CI ( #5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
2024-01-10 22:34:16 +08:00
digger yu
41e52c1c6e
[doc] fix typo in Colossal-LLaMA-2/README.md ( #5247 )
2024-01-10 19:24:56 +08:00
Frank Lee
9102d655ab
[hotfix] removed unused flag ( #5242 )
2024-01-09 14:57:07 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Elsa Granger
d565df3821
[pipeline] A more general _communicate in p2p ( #5062 )
* A more general _communicate
* feat: finish tree_flatten version p2p
* fix: update p2p api calls
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-08 15:37:27 +08:00
Xuanlei Zhao
dd2c28a323
[npu] use extension for op builder ( #5172 )
* update extension
* update cpu adam
* update is
* add doc for cpu adam
* update kernel
* update commit
* update flash
* update memory efficient
* update flash attn
* update flash attention loader
* update api
* fix
* update doc
* update example time limit
* reverse change
* fix doc
* remove useless kernel
* fix
* not use warning
* update
* update
2024-01-08 11:39:16 +08:00
binmakeswell
7bc6969ce6
[doc] SwiftInfer release ( #5236 )
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
2024-01-08 09:55:12 +08:00
github-actions[bot]
4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 ( #5235 )
Co-authored-by: github-actions <github-actions@github.com>
2024-01-07 20:55:34 +08:00
binmakeswell
b9b32b15e6
[doc] add Colossal-LLaMA-2-13B ( #5234 )
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
2024-01-07 20:53:12 +08:00
JIMMY ZHAO
ce651270f1
[doc] Make leaderboard format more uniform and good-looking ( #5231 )
* Make leaderboard format more unified and good-looking
* Update README.md
* Update README.md
2024-01-06 17:12:29 +08:00
Camille Zhong
915b4652f3
[doc] Update README.md of Colossal-LLAMA2 ( #5233 )
* Update README.md
* Update README.md
2024-01-06 17:06:41 +08:00
Tong Li
d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model ( #5224 )
* update readme
* update readme
* update link
* update
* update readme
* update
* update
* update
* update title
* update example
* update example
* fix content
* add conclusion
* add license
* update
* update
* update version
* fix minor
2024-01-05 17:24:26 +08:00
digger yu
b0b53a171c
[nfc] fix typo colossalai/shardformer/ ( #5133 )
2024-01-04 16:21:55 +08:00