Commit Graph

2967 Commits (feat/moe)

Author SHA1 Message Date
Tong Li 1d96a562bb update 2024-01-11 14:05:44 +08:00
Tong Li dac240563c minor update 2024-01-10 11:12:09 +08:00
Tong Li ea088b5f75 update train code 2024-01-10 10:42:37 +08:00
Tong Li 4b7f273022 add moe 2024-01-09 11:59:38 +08:00
ver217 63ee6fffe6 Merge branch 'main' into exp/mixtral 2024-01-08 16:43:54 +08:00
ver217 ce1cff26bd Merge branch 'main' into exp/mixtral 2024-01-08 16:42:00 +08:00
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062)
* A more general _communicate

* feat: finish tree_flatten version p2p

* fix: update p2p api calls

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-08 15:37:27 +08:00
binmakeswell 7bc6969ce6
[doc] SwiftInfer release (#5236)
* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release
2024-01-08 09:55:12 +08:00
github-actions[bot] 4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 (#5235)
Co-authored-by: github-actions <github-actions@github.com>
2024-01-07 20:55:34 +08:00
binmakeswell b9b32b15e6
[doc] add Colossal-LLaMA-2-13B (#5234)
* [doc] add Colossal-LLaMA-2-13B

* [doc] add Colossal-LLaMA-2-13B

* [doc] add Colossal-LLaMA-2-13B
2024-01-07 20:53:12 +08:00
JIMMY ZHAO ce651270f1
[doc] Make leaderboard format more uniform and good-looking (#5231)
* Make leaderboard format more unified and good-looking

* Update README.md

* Update README.md
2024-01-06 17:12:29 +08:00
Camille Zhong 915b4652f3
[doc] Update README.md of Colossal-LLAMA2 (#5233)
* Update README.md

* Update README.md
2024-01-06 17:06:41 +08:00
Tong Li d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224)
* update readme

* update readme

* update link

* update

* update readme

* update

* update

* update

* update title

* update example

* update example

* fix content

* add conclusion

* add license

* update

* update

* update version

* fix minor
2024-01-05 17:24:26 +08:00
Wenhao Chen 196b85368b [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
* fix: add fallback order option and update 1f1b

* fix: fix deadlock comm in interleaved pp

* test: modify p2p test
2024-01-05 14:01:54 +08:00
Wenhao Chen 931d0e0731 [pipeline]: support arbitrary batch size in forward_only mode (#5201)
* fix: remove drop last in val & test dataloader

* feat: add run_forward_only, support arbitrary bs

* chore: modify ci script
2024-01-05 14:01:39 +08:00
Wenhao Chen 1810b9100f [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
* test: add more p2p tests

* fix: remove send_forward_recv_forward as p2p op list need to use the same group

* fix: make send and receive atomic

* feat: update P2PComm fn

* feat: add metadata cache in 1f1b

* feat: add metadata cache in interleaved pp

* feat: modify is_xx_stage fn

* revert: add _broadcast_object_list

* feat: add interleaved pp in llama policy

* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2024-01-05 13:58:53 +08:00
digger yu b0b53a171c
[nfc] fix typo colossalai/shardformer/ (#5133) 2024-01-04 16:21:55 +08:00
Xuanlei Zhao 6b69f3085b update 2024-01-03 15:37:59 +08:00
flybird11111 451e9142b8
fix flash attn (#5209) 2024-01-03 14:39:53 +08:00
flybird11111 365671be10
fix-test (#5210)
fix-test

fix-test
2024-01-03 14:26:13 +08:00
Xuanlei Zhao 8ca8cf8ec3 update optim 2024-01-03 11:57:23 +08:00
Hongxin Liu 7f3400b560
[devops] update torch version in ci (#5217) 2024-01-03 11:46:33 +08:00
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
* fix: add fallback order option and update 1f1b

* fix: fix deadlock comm in interleaved pp

* test: modify p2p test
2024-01-03 11:34:49 +08:00
Wenhao Chen 3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode (#5201)
* fix: remove drop last in val & test dataloader

* feat: add run_forward_only, support arbitrary bs

* chore: modify ci script
2024-01-02 23:41:12 +08:00
Xuanlei Zhao f037583bd2 update train 2024-01-02 14:01:58 +08:00
flybird11111 02d2328a04
support linear accumulation fusion (#5199)
support linear accumulation fusion

support linear accumulation fusion

fix
2023-12-29 18:22:42 +08:00
Xuanlei Zhao 0b8c33f474 update 2023-12-29 18:20:32 +08:00
Xuanlei Zhao c1c6af6368 update 2023-12-29 18:09:28 +08:00
Xuanlei Zhao 0bb317d9e6 update 2023-12-29 17:28:46 +08:00
Xuanlei Zhao ccad7014c6 update optim 2023-12-29 16:51:29 +08:00
Xuanlei Zhao 44014faa67 fix optim 2023-12-28 21:58:08 +08:00
Xuanlei Zhao 0a3aae509b update utils and fwd bwd 2023-12-28 18:54:56 +08:00
Xuanlei Zhao a5580e6289 update test 2023-12-28 18:52:37 +08:00
Xuanlei Zhao 73aa406b96 update 2023-12-28 15:48:04 +08:00
Zhongkai Zhao 64519eb830
[doc] Update required third-party library list for testing and torch compatibility checking (#5207)
* doc/update requirements-test.txt

* update torch-cuda compatibility check
2023-12-27 18:03:45 +08:00
Xuanlei Zhao 570f5cd693 update pytest 2023-12-27 16:05:00 +08:00
Xuanlei Zhao 54b197cc02 update readme 2023-12-26 17:39:38 +08:00
Xuanlei Zhao 4922641098 script 2023-12-26 17:33:32 +08:00
Xuanlei Zhao d660a41850 update 2023-12-26 17:32:59 +08:00
Xuanlei Zhao b8fadb68a7 add pad 2023-12-25 17:02:05 +08:00
Xuanlei Zhao 23341687ed update 2023-12-25 16:29:47 +08:00
Xuanlei Zhao aa2e091dc6 update 2023-12-25 16:05:42 +08:00
Yuanchen eae01b6740
Improve logic for selecting metrics (#5196)
Co-authored-by: Xu <yuanchen.xu00@gmail.com>
2023-12-22 14:52:50 +08:00
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
* test: add more p2p tests

* fix: remove send_forward_recv_forward as p2p op list need to use the same group

* fix: make send and receive atomic

* feat: update P2PComm fn

* feat: add metadata cache in 1f1b

* feat: add metadata cache in interleaved pp

* feat: modify is_xx_stage fn

* revert: add _broadcast_object_list

* feat: add interleaved pp in llama policy

* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2023-12-22 10:44:00 +08:00
BlueRum af952673f7
polish readme in application/chat (#5194) 2023-12-20 11:28:39 +08:00
Xuanlei Zhao 7c5b1a585f update 2023-12-18 10:37:07 +08:00
flybird11111 681d9b12ef
[doc] update pytorch version in documents. (#5177)
* fix

aaa

fix

fix

fix

* fix

* fix

* test ci

* fix ci

fix

* update pytorch version in documents
2023-12-15 18:16:48 +08:00
Xuanlei Zhao ebd8cc579a update script 2023-12-15 16:38:51 +08:00
Xuanlei Zhao f66469e209 update 2023-12-15 16:32:32 +08:00
Yuanchen 3ff60d13b0
Fix ColossalEval (#5186)
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-12-15 15:06:06 +08:00