mirror of https://github.com/InternLM/InternLM
feat(doc): add GPU memory info for 7B & 20B models (#507)
* unitest_only_forward
* memory_test
* doc fix
* doc fix

pull/512/head
parent f5aea7e08c
commit 7776693373
@@ -91,3 +91,67 @@ When `Activation Ckpt` is turned off, the test results are as shown in the table
<img src="../imgs/flops.png" width="580"/>
</div>

### GPU Memory Usage Test

Test configuration:

| Configuration | Description |
| :-------: | :-----: |
| branch | develop |
| tag | v0.2.1dev20231121 |
| GPU | A800 |
| Checkpoint | True |
| micro_bsz | 1 |
| micro_num | 4 |
| dtype | bfloat16 |

```python
# InternLM/configs/7B_sft.py
data = dict(
    # micro_num: number of micro-batches accumulated per gradient update
    micro_num=4,
    # packed_length = micro_bsz * SEQ_LEN
    micro_bsz=1,
    # ...
)

model = dict(
    checkpoint=True,
    dtype="torch.bfloat16",
    # ...
)

parallel = dict(
    zero1=dict(size=8, fsdp=False),
    tensor=1,
    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
)
```
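
As a rough illustration of how `micro_num` and `micro_bsz` interact, the sketch below computes the number of tokens one data-parallel rank processes per gradient update. `SEQ_LEN = 2048` is an assumed value chosen for illustration (the excerpt above does not show it), and `tokens_per_update` is a hypothetical helper, not part of InternLM.

```python
# Illustrative arithmetic only; SEQ_LEN is an assumed value, not taken from the excerpt above.
SEQ_LEN = 2048


def tokens_per_update(micro_num: int, micro_bsz: int, seq_len: int) -> int:
    """Tokens one data-parallel rank consumes per gradient update (hypothetical helper)."""
    packed_length = micro_bsz * seq_len  # as stated in the config comment
    return micro_num * packed_length


print(tokens_per_update(micro_num=4, micro_bsz=1, seq_len=SEQ_LEN))  # 8192
```

With the tested settings (`micro_num=4`, `micro_bsz=1`), raising either value increases the tokens per update proportionally, which is worth keeping in mind when comparing memory figures across configurations.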

Pre-training & Fine-tuning test:

| Model | Number of GPUs | zero1 | tp | pp | fsdp | GPU Memory (GB) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 7B | 3 | -1 | 1 | 3 | False | 75 |
| 7B | 3 | -1 | 1 | 1 | True | 72 |
| 7B | 4 | -1 | 4 | 1 | True | 52 |
| 7B | 4 | -1 | 4 | 1 | False | 61 |
| 7B | 4 | -1 | 1 | 4 | False | 69 |
| 7B | 4 | -1 | 1 | 1 | True | 56 |
| 7B | 5 | -1 | 1 | 1 | True | 49 |
| 7B | 5 | -1 | 1 | 5 | False | 62 |
| 7B | 6 | -1 | 1 | 1 | True | 39 |
| 7B | 6 | -1 | 2 | 1 | True | 38 |
| 7B | 6 | -1 | 1 | 6 | False | 56 |
| 20B | 8 | -1 | 1 | 1 | True | 78 |
| 20B | 8 | -1 | 8 | 1 | True | 71 |
| 20B | 16 | -1 | 1 | 1 | True | 40 |
| 20B | 16 | -1 | 8 | 1 | True | 39 |
| 20B | 16 | -1 | 1 | 16 | False | 52 |
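
One way to read the table: if the GPUs are assumed to factor as data-parallel × tensor-parallel × pipeline-parallel (the usual 3D decomposition; the source does not state this explicitly), the data-parallel size of each row follows from the other columns. `data_parallel_size` below is a hypothetical helper for this arithmetic, not an InternLM function.

```python
def data_parallel_size(num_gpus: int, tp: int, pp: int) -> int:
    """Data-parallel size under the assumed decomposition num_gpus = dp * tp * pp."""
    model_parallel = tp * pp
    if num_gpus % model_parallel != 0:
        raise ValueError(f"{num_gpus} GPUs cannot be split into tp={tp} x pp={pp} groups")
    return num_gpus // model_parallel


# A few rows from the table above: (model, num_gpus, tp, pp)
rows = [("7B", 4, 4, 1), ("7B", 6, 2, 1), ("7B", 6, 1, 6), ("20B", 16, 8, 1), ("20B", 16, 1, 1)]
for model, num_gpus, tp, pp in rows:
    print(model, num_gpus, tp, pp, "-> dp =", data_parallel_size(num_gpus, tp, pp))
```

Every row in the table divides evenly under this assumption, so the helper's error branch is only a guard for configurations not listed here.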

Web_demo test:

| Model | GPU | GPU Memory (GB) | System Memory (MB) |
|:-:|:-:|:-:|:-:|
| 7B | A800 | 14.5 | 2465 |
| 20B | A800 | 39 | 9547 |

@@ -88,3 +88,66 @@ In InternLM, the `zero1` configuration determines the allocation scope of the optimizer states.
<img src="../doc/imgs/flops.png" width="580"/>
</div>

### GPU Memory Usage Test

Test configuration:

| Configuration | Description |
| :-------: | :-----: |
| branch | develop |
| tag | v0.2.1dev20231121 |
| GPU | A800 |
| Recomputation | True |
| micro_bsz | 1 |
| micro_num | 4 |
| dtype | bfloat16 |

```python
# InternLM/configs/7B_sft.py
data = dict(
    # micro_num: number of micro-batches accumulated per gradient update
    micro_num=4,
    # packed_length = micro_bsz * SEQ_LEN
    micro_bsz=1,
    # ...
)

model = dict(
    checkpoint=True,
    dtype="torch.bfloat16",
    # ...
)

parallel = dict(
    zero1=dict(size=8, fsdp=False),
    tensor=1,
    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
)
```

Pre-training and fine-tuning test:

| Model | Number of GPUs | zero1 | tp | pp | fsdp | GPU Memory (GB) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 7B | 3 | -1 | 1 | 3 | False | 75 |
| 7B | 3 | -1 | 1 | 1 | True | 72 |
| 7B | 4 | -1 | 4 | 1 | True | 52 |
| 7B | 4 | -1 | 4 | 1 | False | 61 |
| 7B | 4 | -1 | 1 | 4 | False | 69 |
| 7B | 4 | -1 | 1 | 1 | True | 56 |
| 7B | 5 | -1 | 1 | 1 | True | 49 |
| 7B | 5 | -1 | 1 | 5 | False | 62 |
| 7B | 6 | -1 | 1 | 1 | True | 39 |
| 7B | 6 | -1 | 2 | 1 | True | 38 |
| 7B | 6 | -1 | 1 | 6 | False | 56 |
| 20B | 8 | -1 | 1 | 1 | True | 78 |
| 20B | 8 | -1 | 8 | 1 | True | 71 |
| 20B | 16 | -1 | 1 | 1 | True | 40 |
| 20B | 16 | -1 | 8 | 1 | True | 39 |
| 20B | 16 | -1 | 1 | 16 | False | 52 |

Web_demo test:

| Model | GPU | GPU Memory (GB) | System Memory (MB) |
|:-:|:-:|:-:|:-:|
| 7B | A800 | 14.5 | 2465 |
| 20B | A800 | 39 | 9547 |