TF32 Training
==================

InternLM supports training models with TF32. TensorFloat-32 (TF32) is a computation format that Nvidia introduced with its Ampere-architecture GPUs, designed specifically for Tensor Cores. The figure below compares TF32 with other commonly used data formats:

.. figure:: ../../imgs/tf32.png
   :scale: 50%
   :class: with-border

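
As a rough illustration of the formats compared above: TF32 keeps FP32's sign bit and 8-bit exponent (so it has the same dynamic range as FP32) but only 10 mantissa bits (the same precision as FP16). The sketch below emulates this in plain Python by zeroing the low 13 of FP32's 23 mantissa bits. The helper name ``to_tf32`` is hypothetical, and truncation is an assumption made for simplicity; Tensor Core hardware may round rather than truncate.

.. code-block:: python

    import struct


    def to_tf32(x: float) -> float:
        """Hypothetical helper: approximate TF32 by truncating an FP32 value.

        FP32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
        TF32 keeps the sign and exponent but only the top 10 mantissa bits,
        so the low 13 mantissa bits are cleared (truncation, not rounding).
        """
        bits = struct.unpack("<I", struct.pack("<f", x))[0]  # reinterpret FP32 bits as uint32
        bits &= 0xFFFFE000  # clear the 13 least-significant mantissa bits
        return struct.unpack("<f", struct.pack("<I", bits))[0]


    # Same dynamic range as FP32: 1e30 is far above FP16's maximum (65504).
    print(to_tf32(1e30) > 65504.0)  # True
    # Reduced precision: 1 + 2**-10 survives, 1 + 2**-11 is truncated to 1.0.
    print(to_tf32(1 + 2**-10), to_tf32(1 + 2**-11))  # 1.0009765625 1.0
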
Prerequisites for using TF32:

1. The input data must be of type FP32 (single-precision floating point), and the computation must be a matrix multiplication, a convolution, or a similar operation. Only then can TF32 be used as the intermediate computation type on Tensor Cores.
2. The GPU must be based on the Ampere architecture.

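
The two prerequisites above can be expressed as a small check. This is a minimal sketch, not part of InternLM: the helper ``can_use_tf32`` and the operation labels ``"matmul"`` and ``"conv"`` are hypothetical, and it assumes that Ampere GPUs report CUDA compute capability 8.x and that later architectures (major version 8 or higher) also support TF32.

.. code-block:: python

    from typing import Tuple


    def can_use_tf32(input_dtype: str, op: str, capability: Tuple[int, int]) -> bool:
        """Hypothetical check mirroring the two TF32 prerequisites.

        input_dtype: e.g. "float32"; op: e.g. "matmul" or "conv";
        capability: (major, minor) CUDA compute capability of the GPU.
        """
        tensor_core_ops = {"matmul", "conv"}  # assumed labels for TF32-eligible operations
        is_fp32_input = input_dtype == "float32"
        is_ampere_or_newer = capability[0] >= 8  # Ampere is compute capability 8.x
        return is_fp32_input and op in tensor_core_ops and is_ampere_or_newer


    print(can_use_tf32("float32", "matmul", (8, 0)))  # Ampere (e.g. A100): True
    print(can_use_tf32("float32", "matmul", (7, 0)))  # Volta (e.g. V100): False
    print(can_use_tf32("float16", "matmul", (8, 0)))  # FP16 input: False

On a real machine, the ``capability`` tuple can be obtained with ``torch.cuda.get_device_capability()``.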
Note that TF32 is only an intermediate computation format used when Tensor Cores are employed; it is not a complete data type. To distinguish between the different precisions and computation formats (``BF16``, ``FP16``, ``FP32``, ``TF32``), InternLM therefore lets users set ``dtype="torch.tf32"`` in the ``model config`` to request TF32 acceleration, while the underlying data type remains ``FP32`` (``torch.float32``).

.. code-block:: python

    # Uppercase names (e.g. VOCAB_SIZE) are placeholders for values defined elsewhere in the config.
    model = dict(
        checkpoint=False,  # Proportion of layers using activation checkpointing; valid values: True, False, or a float in [0, 1]
        num_attention_heads=NUM_ATTENTION_HEAD,
        embed_split_hidden=True,
        vocab_size=VOCAB_SIZE,
        embed_grad_scale=1,
        parallel_output=True,
        hidden_size=HIDDEN_SIZE,
        num_layers=NUM_LAYER,
        mlp_ratio=MLP_RATIO,
        apply_post_layer_norm=False,
        dtype="torch.tf32",  # Supported: "torch.float16", "torch.half", "torch.bfloat16", "torch.float32", "torch.tf32"
        norm_type="rmsnorm",
        layer_norm_epsilon=1e-5,
        use_flash_attn=True,
        num_chunks=1,  # If num_chunks > 1, the interleaved pipeline scheduler is used.
    )

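
Because PyTorch has no ``torch.tf32`` dtype, a ``dtype`` string such as ``"torch.tf32"`` has to be split into an actual storage type plus a flag that turns on TF32 computation. A minimal sketch of such a mapping, using the strings listed in the config comment above; the table and ``resolve_dtype`` are hypothetical illustrations, not InternLM's own code.

.. code-block:: python

    from typing import Tuple

    # Hypothetical table: config dtype string -> (real storage dtype string, enable TF32?)
    _DTYPE_TABLE = {
        "torch.float16": ("torch.float16", False),
        "torch.half": ("torch.float16", False),
        "torch.bfloat16": ("torch.bfloat16", False),
        "torch.float32": ("torch.float32", False),
        "torch.tf32": ("torch.float32", True),  # TF32 is a compute mode; the data stays FP32
    }


    def resolve_dtype(dtype_str: str) -> Tuple[str, bool]:
        """Hypothetical helper: split a config dtype string into (real dtype, use_tf32)."""
        try:
            return _DTYPE_TABLE[dtype_str]
        except KeyError:
            raise ValueError(f"unsupported dtype string: {dtype_str!r}") from None


    print(resolve_dtype("torch.tf32"))  # ('torch.float32', True)

The resolved name can then be turned into a real dtype object, for example with ``getattr(torch, "float32")``.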
InternLM determines the actual data type from the ``dtype`` string in the ``model config``, and enables TF32 training by setting the following flags:

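.. code-block:: python

    import torch

    # Allow cuDNN convolutions to use TF32 on Tensor Cores
    torch.backends.cudnn.allow_tf32 = True
    # Allow matrix multiplications (cuBLAS) to use TF32 on Tensor Cores
    torch.backends.cuda.matmul.allow_tf32 = True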
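
The two assignments above change process-wide defaults. If only part of a program should run with a different TF32 setting (for example, to compare results against full FP32), the same two flags can be scoped with a context manager. This is a minimal sketch of a hypothetical helper, ``tf32_enabled``, not an InternLM API; it only touches the two PyTorch flags shown above and restores their previous values on exit.

.. code-block:: python

    import contextlib

    import torch


    @contextlib.contextmanager
    def tf32_enabled(enabled: bool = True):
        """Hypothetical helper: temporarily set both TF32 flags, then restore them."""
        prev_matmul = torch.backends.cuda.matmul.allow_tf32
        prev_cudnn = torch.backends.cudnn.allow_tf32
        torch.backends.cuda.matmul.allow_tf32 = enabled
        torch.backends.cudnn.allow_tf32 = enabled
        try:
            yield
        finally:
            # Restore the previous settings even if the body raises
            torch.backends.cuda.matmul.allow_tf32 = prev_matmul
            torch.backends.cudnn.allow_tf32 = prev_cudnn


    # Example: compute a reference result with TF32 disallowed, then restore the old setting.
    a = torch.randn(64, 64)
    with tf32_enabled(False):
        full_fp32 = a @ a  # TF32 is disallowed inside this block
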