Commit Graph

13 Commits (colossalchat_upgrade)

Author SHA1 Message Date
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
* [misc] remove config arg from initialize

* [misc] remove old tensor contrusctor

* [plugin] add npu support for ddp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [devops] fix doc test ci

* [test] fix test launch

* [doc] update launch doc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
Camille Zhong 89049b0d89
[doc] fix ColossalMoE readme (#5599)
* fix readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-15 18:06:18 +08:00
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-08 15:09:40 +08:00
digger yu a799ca343b
[fix] fix typo s/muiti-node /multi-node etc. (#5448) 2024-04-07 18:42:15 +08:00
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`

* feat: apply `GradientCheckpointConfig` to policy and llama_forward

* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager

* fix: add optional args for `distribute_layer` and `get_stage_index`

* fix: fix changed API calls

* test: update llama tests

* style: polish `GradientCheckpointConfig`

* fix: fix pipeline utils tests
2024-04-01 11:34:58 +08:00
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution

* Change static methods for t5 layer distribution to member functions

* Change static methods for whisper layer distribution to member functions

* Replace whisper policy usage with self one

* Fix test case to use non-static layer distribution methods

* fix: fix typo

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-03-27 13:57:00 +08:00
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404)
* fix: simplify merge_batch

* fix: use return_outputs=False to eliminate extra memory consumption

* feat: add return_outputs warning

* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
Hongxin Liu 65e5d6baa5 [moe] fix mixtral optim checkpoint (#5344) 2024-02-07 19:21:02 +08:00
Hongxin Liu 956b561b54 [moe] fix mixtral forward default value (#5329) 2024-02-07 19:21:02 +08:00
Hongxin Liu b60be18dcc [moe] fix mixtral checkpoint io (#5314) 2024-02-07 19:21:02 +08:00
Hongxin Liu da39d21b71 [moe] support mixtral (#5309)
* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [meo] support save sharded mixtral

* [meo] support load sharded mixtral

* [meo] support save sharded optim

* [meo] integrate moe manager into plug

* [meo] fix optimizer load

* [meo] fix mixtral layer
2024-02-07 19:21:02 +08:00
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253)
* [moe] top2 allow uneven input

* [moe] update capacity computing

* [moe] remove debug info

* [moe] update capacity computing

* [moe] update capacity computing
2024-02-07 19:21:02 +08:00
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 2024-02-07 19:21:02 +08:00