Tong Li
1d96a562bb
update
11 months ago
Tong Li
dac240563c
minor update
11 months ago
Tong Li
ea088b5f75
update train code
11 months ago
Tong Li
4b7f273022
add moe
11 months ago
ver217
63ee6fffe6
Merge branch 'main' into exp/mixtral
11 months ago
ver217
ce1cff26bd
Merge branch 'main' into exp/mixtral
11 months ago
Elsa Granger
d565df3821
[pipeline] A more general _communicate in p2p ( #5062 )
* A more general _communicate
* feat: finish tree_flatten version p2p
* fix: update p2p api calls
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
11 months ago
binmakeswell
7bc6969ce6
[doc] SwiftInfer release ( #5236 )
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
11 months ago
github-actions[bot]
4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 ( #5235 )
Co-authored-by: github-actions <github-actions@github.com>
11 months ago
binmakeswell
b9b32b15e6
[doc] add Colossal-LLaMA-2-13B ( #5234 )
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
11 months ago
JIMMY ZHAO
ce651270f1
[doc] Make leaderboard format more uniform and good-looking ( #5231 )
* Make leaderboard format more unified and good-looking
* Update README.md
* Update README.md
11 months ago
Camille Zhong
915b4652f3
[doc] Update README.md of Colossal-LLAMA2 ( #5233 )
* Update README.md
* Update README.md
11 months ago
Tong Li
d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model ( #5224 )
* update readme
* update readme
* update link
* update
* update readme
* update
* update
* update
* update title
* update example
* update example
* fix content
* add conclusion
* add license
* update
* update
* update version
* fix minor
11 months ago
Wenhao Chen
196b85368b
[pipeline]: add p2p fallback order and fix interleaved pp deadlock ( #5214 )
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test
11 months ago
Wenhao Chen
931d0e0731
[pipeline]: support arbitrary batch size in forward_only mode ( #5201 )
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
11 months ago
Wenhao Chen
1810b9100f
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
11 months ago
digger yu
b0b53a171c
[nfc] fix typo colossalai/shardformer/ ( #5133 )
11 months ago
Xuanlei Zhao
6b69f3085b
update
11 months ago
flybird11111
451e9142b8
fix flash attn ( #5209 )
11 months ago
flybird11111
365671be10
fix-test ( #5210 )
fix-test
fix-test
11 months ago
Xuanlei Zhao
8ca8cf8ec3
update optim
11 months ago
Hongxin Liu
7f3400b560
[devops] update torch version in ci ( #5217 )
11 months ago
Wenhao Chen
d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock ( #5214 )
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test
11 months ago
Wenhao Chen
3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode ( #5201 )
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
11 months ago
Xuanlei Zhao
f037583bd2
update train
11 months ago
flybird11111
02d2328a04
support linear accumulation fusion ( #5199 )
support linear accumulation fusion
support linear accumulation fusion
fix
11 months ago
Xuanlei Zhao
0b8c33f474
update
11 months ago
Xuanlei Zhao
c1c6af6368
update
11 months ago
Xuanlei Zhao
0bb317d9e6
update
11 months ago
Xuanlei Zhao
ccad7014c6
update optim
11 months ago
Xuanlei Zhao
44014faa67
fix optim
11 months ago
Xuanlei Zhao
0a3aae509b
update utils and fwd bwd
11 months ago
Xuanlei Zhao
a5580e6289
update test
11 months ago
Xuanlei Zhao
73aa406b96
update
11 months ago
Zhongkai Zhao
64519eb830
[doc] Update required third-party library list for testing and torch compatibility checking ( #5207 )
* doc/update requirements-test.txt
* update torch-cuda compatibility check
11 months ago
Xuanlei Zhao
570f5cd693
update pytest
11 months ago
Xuanlei Zhao
54b197cc02
update readme
11 months ago
Xuanlei Zhao
4922641098
script
11 months ago
Xuanlei Zhao
d660a41850
update
11 months ago
Xuanlei Zhao
b8fadb68a7
add pad
11 months ago
Xuanlei Zhao
23341687ed
update
11 months ago
Xuanlei Zhao
aa2e091dc6
update
11 months ago
Yuanchen
eae01b6740
Improve logic for selecting metrics ( #5196 )
Co-authored-by: Xu <yuanchen.xu00@gmail.com>
11 months ago
Wenhao Chen
4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
11 months ago
BlueRum
af952673f7
polish readme in application/chat ( #5194 )
11 months ago
Xuanlei Zhao
7c5b1a585f
update
11 months ago
flybird11111
681d9b12ef
[doc] update pytorch version in documents. ( #5177 )
* fix
aaa
fix
fix
fix
* fix
* fix
* test ci
* fix ci
fix
* update pytorch version in documents
11 months ago
Xuanlei Zhao
ebd8cc579a
update script
11 months ago
Xuanlei Zhao
f66469e209
update
11 months ago
Yuanchen
3ff60d13b0
Fix ColossalEval ( #5186 )
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
11 months ago