Mixtral
Usage
1. Installation
Please install the latest ColossalAI from source.
CUDA_EXT=1 pip install -U git+https://github.com/hpcaitech/ColossalAI
Then install dependencies.
cd ColossalAI/applications/ColossalMoE
pip install -e .
Additionally, we recommend using torch 1.13.1; our code has been tested and verified to work with this version.
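To verify the installation succeeded, a minimal check (assuming the colossalai package exposes __version__) is:
python -c "import colossalai; print(colossalai.__version__)"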
2. Inference
You can use colossalai run to launch inference:
bash infer.sh
If you have already downloaded the model weights, change the model name in infer.sh to the path of your weights.
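For reference, infer.sh wraps a colossalai run launch. A minimal sketch of such a command, assuming a single node with 8 GPUs and that infer.py accepts a --model_name argument (the exact argument name in the script may differ):
colossalai run --nproc_per_node 8 infer.py --model_name mistralai/Mixtral-8x7B-v0.1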
3. Train
You first need to create ./hostfile, listing the IP addresses of all your devices, for example:
111.111.111.110
111.111.111.111
Then you can use colossalai run to launch training:
bash train.sh
Training requires 16 H100 (80GB) GPUs, and the number of GPUs should be a multiple of 8. If you have already downloaded the model weights, change the model name in train.sh to the path of your weights.
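For reference, a minimal sketch of the multi-node launch performed by train.sh, assuming 8 GPUs per node, the nodes listed in ./hostfile, and a hypothetical --model_name argument for train.py (the actual arguments in the script may differ):
colossalai run --nproc_per_node 8 --hostfile ./hostfile train.py --model_name mistralai/Mixtral-8x7B-v0.1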