Mixtral

Usage

1. Installation

Please install the latest ColossalAI from source.

CUDA_EXT=1 pip install -U git+https://github.com/hpcaitech/ColossalAI

Then install dependencies.

cd ColossalAI/applications/ColossalMoE
pip install -e .

We recommend torch 1.13.1; this is the version our code has been tested against.
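
As an optional sanity check, you can verify that the package imports and print its version (the exact version string will vary with your build):

python -c "import colossalai; print(colossalai.__version__)"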

2. Inference

You can use colossalai run to launch inference:

bash infer.sh

If you have already downloaded the model weights, change the model name in infer.sh to the path of your local weights.
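
For reference, infer.sh is essentially a wrapper around the colossalai launcher. A minimal sketch of such a command, assuming a single node with 8 GPUs; the --model_name flag and the Hugging Face model id are illustrative, so check infer.sh for the actual arguments:

colossalai run --nproc_per_node 8 infer.py --model_name mistralai/Mixtral-8x7B-v0.1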

3. Train

First create ./hostfile, listing the IP addresses of all your nodes, one per line, for example:

111.111.111.110
111.111.111.111

Then you can use colossalai run to launch training:

bash train.sh

Training requires 16 H100 (80GB) GPUs, and the number of GPUs should be a multiple of 8. If you have already downloaded the model weights, change the model name in train.sh to the path of your local weights.
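
For reference, train.sh wraps the same launcher together with the hostfile. A minimal sketch of a two-node launch (8 GPUs per node, 16 in total); the --model_name flag is illustrative, so check train.sh for the actual arguments:

colossalai run --nproc_per_node 8 --hostfile ./hostfile train.py --model_name mistralai/Mixtral-8x7B-v0.1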