mirror of https://github.com/InternLM/InternLM
fix/fix_submodule_err (#61)
* fix/fix_submodule_err

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
parent c7287e2584
commit 6150e4daed
@@ -8,7 +8,8 @@ The required packages and corresponding version are shown as follows:
 - CUDA == 11.7
 - Pytorch == 1.13.1+cu117
 - Transformers >= 4.25.1
-- Flash-Attention == 23.05
+- Flash-Attention == v1.0.5
+- Apex == 23.05
 - GPU with Ampere or Hopper architecture (such as H100, A100)
 - Linux OS
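The pinned versions above can be sanity-checked from the shell after installation. This is a hedged convenience sketch rather than part of the diff; the `flash_attn` and `apex` import names are assumptions about how those packages expose themselves.

```bash
# Print locally installed versions to compare against the pins above.
python -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)"
python -c "import transformers; print('transformers', transformers.__version__)"
# flash-attn and apex are assumed to import as `flash_attn` and `apex`.
python -c "import flash_attn; print('flash-attn', flash_attn.__version__)"
python -c "import apex; print('apex imported OK')"
nvcc --version   # CUDA toolkit version
```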
@@ -192,7 +192,7 @@ $ srun -p internllm -N 2 -n 16 --ntasks-per-node=8 --gpus-per-task=1 python trai
 If you want to start distributed training on torch with 8 GPUs on a single node, use the following command:

 ```bash
-$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py
+$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py --launcher "torch"
 ```

 ### Training Results
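The new `--launcher "torch"` argument presumably selects the torchrun-style environment bootstrap over the Slurm one. As a hedged illustration only, the same entry point could in principle be scaled to two nodes with torchrun's stock rendezvous flags; the command below is an assumption built from standard torchrun options, not one documented in this diff.

```bash
# Hypothetical two-node launch (8 GPUs per node); run once on each node,
# changing --node_rank to 0 and 1 respectively.
$ torchrun --nnodes=2 --nproc_per_node=8 \
    --node_rank=0 \
    --rdzv_backend=c10d \
    --rdzv_endpoint=$MASTER_ADDR:29500 \
    train.py --config ./configs/7B_sft.py --launcher "torch"
```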
@@ -8,7 +8,8 @@
 - CUDA == 11.7
 - Pytorch == 1.13.1+cu117
 - Transformers >= 4.25.1
-- Flash-Attention == 23.05
+- Flash-Attention == v1.0.5
+- Apex == 23.05
 - GPU with Ampere or Hopper architecture (such as H100, A100)
 - Linux OS
@@ -175,7 +175,7 @@ $ srun -p internllm -N 2 -n 16 --ntasks-per-node=8 --gpus-per-task=1 python trai

 To launch distributed training on torch with 8 GPUs on a single node, use the following command:
 ```bash
-$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py
+$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py --launcher "torch"
 ```

 ### Training Results
@@ -1 +1 @@
-Subproject commit 8ffc901e50bbf740fdb6d5bccb17f66a6ec8604e
+Subproject commit 0da3ffb92ee6fbe5336602f0e3989db1cd16f880
@@ -1 +1 @@
-Subproject commit d2f4324f4c56e017fbf22dc421943793a8ca6c3b
+Subproject commit eff9fe6b8076df59d64d7a3f464696738a3c7c24
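Because this commit repoints two submodules, existing checkouts need their submodule pointers refreshed. The commands below are the standard git sequence for doing so, offered as a general sketch rather than instructions taken from the repository.

```bash
# Pull the new submodule pointers recorded by this commit and check them out.
git pull
git submodule sync --recursive
git submodule update --init --recursive
```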