mirror of https://github.com/hpcaitech/ColossalAI
[example] GPT polish readme (#2274)
parent 9654df0e9a
commit 879df8b943
@@ -25,10 +25,10 @@ pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --e
pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org
```

### Install transformers
### Install requirements

```bash
pip install transformers
pip install -r requirements.txt
```

This is just an example where we install PyTorch 1.12.0 with CUDA 11.3 and colossalai==0.1.12+torch1.12cu11.3. You can install another version of PyTorch and its corresponding ColossalAI version. Just make sure that ColossalAI is at least 0.1.10, PyTorch is at least 1.8.1, and transformers is at least 4.23.1.
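A quick way to confirm which versions were actually picked up in the environment is a one-liner like the sketch below (assuming all three packages expose `__version__`, which they normally do):

```bash
# Sanity-check sketch: prints the installed torch / transformers / colossalai versions
# and the CUDA build that torch was compiled against.
python -c "import torch, transformers, colossalai; print(torch.__version__, torch.version.cuda, transformers.__version__, colossalai.__version__)"
```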
@@ -39,19 +39,16 @@ If you want to test ZeRO1 and ZeRO2 in Colossal-AI, you need to ensure Colossal-
For simplicity, the input data is randomly generated here.

## Training
We provide two solutions. One utilizes the hybrid parallel strategies of Gemini, DDP/ZeRO, and Tensor Parallelism.
The other uses Pipeline Parallelism only.
In the future, we are going to merge them so that they can be used orthogonally to each other.

### Gemini DDP/ZeRO + Tensor Parallelism
```bash
bash run.sh
bash run_gemini.sh
```
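The run script reads its settings from environment variables with `${VAR:-default}` fallbacks (`BATCH_SIZE` and `MODEL_TYPE` are visible in the `run_gemini.sh` hunk further down in this diff), so a run can likely be customised without editing the file. A minimal sketch, assuming `GPUNUM` and `TPDEGREE` are exported the same way:

```bash
# Sketch only: assumes GPUNUM and TPDEGREE have ${VAR:-default} fallbacks in
# run_gemini.sh, as BATCH_SIZE and MODEL_TYPE do in the hunk shown below.
GPUNUM=8 TPDEGREE=2 BATCH_SIZE=16 MODEL_TYPE=gpt2_medium bash run_gemini.sh
```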

Pipeline Parallel
```bash
bash run_pp.sh
```

### Training config

`train_gpt_demo.py` provides three distributed plans; choose the one you want in `run.sh`. The Colossal-AI plan leverages Tensor Parallelism and Gemini + ZeRO DDP.
`train_gpt_demo.py` provides three distributed plans; choose the one you want in `run_gemini.sh`. The Colossal-AI plan leverages Tensor Parallelism and Gemini + ZeRO DDP.

- Colossal-AI
- ZeRO1 (Colossal-AI)
@@ -60,6 +57,12 @@ The `train_gpt_demo.py` provides three distributed pla
- Pytorch ZeRO
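Switching between these plans goes through the `DISTPAN` variable that `run_gemini.sh` forwards to `--distplan` (see the `torchrun` line at the bottom of this diff). The plan string below is a placeholder; check the values accepted by `train_gpt_demo.py` before using it:

```bash
# "colossalai" is a placeholder plan name -- look up the exact --distplan strings
# accepted by train_gpt_demo.py and substitute one of them here.
DISTPAN=colossalai GPUNUM=8 bash run_gemini.sh
```
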
### Pipeline Parallel
```bash
bash run_pp.sh
```
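`run_pp.sh` follows the same pattern: `MODEL_TYPE`, `NUM_MICROBATCH`, and `BATCH_SIZE` have `${VAR:-default}` fallbacks in the hunk at the bottom of this diff, while `GPUNUM` (used for `--world_size`) is assumed here to be exported the same way:

```bash
# Sketch only: GPUNUM is assumed to have a ${GPUNUM:-...} fallback like the
# other variables exported in run_pp.sh.
GPUNUM=8 NUM_MICROBATCH=4 BATCH_SIZE=16 bash run_pp.sh
```
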
## Performance

Testbed: a cluster of 8x A100 (80 GB) GPUs and 1x AMD EPYC 7543 32-core processor (512 GB RAM). The GPUs are connected via PCIe.
@@ -12,7 +12,7 @@ then
fi
echo "****************** Begin ***************************"
echo "* benchmarking MODEL_TYPE ${MODEL_TYPE} BS ${BATCH_SIZE} BS ${BS} GPUNUM ${GPUNUM} TPDEGREE ${TPDEGREE}"
MODEL_TYPE=${MODEL_TYPE} BATCH_SIZE=${BATCH_SIZE} GPUNUM=${GPUNUM} TPDEGREE=${TPDEGREE} bash ./run.sh
MODEL_TYPE=${MODEL_TYPE} BATCH_SIZE=${BATCH_SIZE} GPUNUM=${GPUNUM} TPDEGREE=${TPDEGREE} bash ./run_gemini.sh
echo "****************** Finished ***************************"
echo ""
echo ""
@@ -9,5 +9,5 @@ export USE_SHARD_INIT=${USE_SHARD_INIT:-False}
export BATCH_SIZE=${BATCH_SIZE:-16}
export MODEL_TYPE=${MODEL_TYPE:-"gpt2_medium"}

mkdir -p logs
torchrun --standalone --nproc_per_node=${GPUNUM} train_gpt_demo.py --tp_degree=${TPDEGREE} --model_type=${MODEL_TYPE} --batch_size=${BATCH_SIZE} --placement ${PLACEMENT} --shardinit ${USE_SHARD_INIT} --distplan ${DISTPAN} 2>&1 | tee ./logs/${MODEL_TYPE}_${DISTPAN}_gpu_${GPUNUM}_bs_${BATCH_SIZE}_tp_${TPDEGREE}.log
mkdir -p gemini_logs
torchrun --standalone --nproc_per_node=${GPUNUM} train_gpt_demo.py --tp_degree=${TPDEGREE} --model_type=${MODEL_TYPE} --batch_size=${BATCH_SIZE} --placement ${PLACEMENT} --shardinit ${USE_SHARD_INIT} --distplan ${DISTPAN} 2>&1 | tee ./gemini_logs/${MODEL_TYPE}_${DISTPAN}_gpu_${GPUNUM}_bs_${BATCH_SIZE}_tp_${TPDEGREE}.log
@@ -3,5 +3,5 @@ export BATCH_SIZE=${BATCH_SIZE:-16}
export MODEL_TYPE=${MODEL_TYPE:-"gpt2_medium"}
export NUM_MICROBATCH=${NUM_MICROBATCH:-4}

mkdir -p logs
python train_gpt_pp_demo.py --device="cuda" --model_type=${MODEL_TYPE} --num_microbatches=${NUM_MICROBATCH} --world_size=${GPUNUM} --batch_size=${BATCH_SIZE} 2>&1 | tee ./logs/${MODEL_TYPE}_gpu_${GPUNUM}_bs_${BATCH_SIZE}_nm_${NUM_MICROBATCH}.log
mkdir -p pp_logs
python train_gpt_pp_demo.py --device="cuda" --model_type=${MODEL_TYPE} --num_microbatches=${NUM_MICROBATCH} --world_size=${GPUNUM} --batch_size=${BATCH_SIZE} 2>&1 | tee ./pp_logs/${MODEL_TYPE}_gpu_${GPUNUM}_bs_${BATCH_SIZE}_nm_${NUM_MICROBATCH}.log