From 54b197cc02f2b2a78e30897689ae56258a5271a7 Mon Sep 17 00:00:00 2001
From: Xuanlei Zhao
Date: Tue, 26 Dec 2023 17:39:38 +0800
Subject: [PATCH] update readme

---
 applications/ColossalMoE/README.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/applications/ColossalMoE/README.md b/applications/ColossalMoE/README.md
index 69db34db4..c3c214789 100644
--- a/applications/ColossalMoE/README.md
+++ b/applications/ColossalMoE/README.md
@@ -23,4 +23,17 @@ Additionally, we recommend you to use torch 1.13.1. We've tested our code on tor
 Yon can use colossalai run to launch inference:
 ```bash
 bash infer.sh
-```
\ No newline at end of file
+```
+If you have already downloaded the model weights, you can change the name to the location of your weights in `infer.sh`.
+
+### 3. Train
+You first need to create `./hostfile`, listing the IP addresses of all your devices, such as:
+```bash
+111.111.111.110
+111.111.111.111
+```
+Then you can use colossalai run to launch training:
+```bash
+bash train.sh
+```
+Training requires 16 H100 (80G) GPUs. The number of GPUs should be divisible by 8. If you have already downloaded the model weights, you can change the name to the location of your weights in `train.sh`.
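
The hostfile step added by this patch can be sketched as a small shell helper. This is a hypothetical illustration, not part of the commit: the variable names are invented, the addresses are the placeholders from the README, and 8 GPUs per node is an assumption chosen so that two nodes give the 16 GPUs the README mentions.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the patch: write ./hostfile and
# sanity-check the GPU count before launching train.sh.
set -euo pipefail

# Placeholder addresses from the README example; replace with your own nodes.
hosts=(111.111.111.110 111.111.111.111)

# Assumption: 8 GPUs per node, so 2 nodes give 16 GPUs in total.
gpus_per_node=8

# One address per line, matching the README example.
printf '%s\n' "${hosts[@]}" > ./hostfile

# The README asks for a GPU count that is divisible by 8.
total_gpus=$(( ${#hosts[@]} * gpus_per_node ))
if (( total_gpus % 8 != 0 )); then
    echo "GPU count ${total_gpus} is not divisible by 8" >&2
    exit 1
fi

echo "hostfile lists ${#hosts[@]} nodes, ${total_gpus} GPUs in total"
```

After running it, `bash train.sh` can be launched as described in the README. The divisibility check only covers the count computed here, not the GPUs actually present on each node.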