mirror of https://github.com/hpcaitech/ColossalAI
[doc] update and revise some typos and errs in docs (#4107)
* fix some typos and problems in doc
* add doc test

parent 769cddcb2c
commit 711e2b4c00

@ -1,31 +1,36 @@

# Booster API

Author: [Mingyan Jiang](https://github.com/jiangmingyan) [Jianghai Chen](https://github.com/CjhHa1)

**Prerequisite:**

- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

**Example Code**

- [Train with Booster](https://github.com/hpcaitech/ColossalAI/blob/main/examples/tutorial/new_api/cifar_resnet/README.md)

## Introduction

In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to inject features into your training components (e.g. model, optimizer, dataloader) seamlessly. With these new APIs, you can integrate your model with our parallelism features more conveniently. Calling `colossalai.booster` is also the standard procedure before you run your training loops. The sections below cover how `colossalai.booster` works and what you should take note of.

### Plugin

A plugin is an important component that manages a parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

**_GeminiPlugin:_** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.

**_TorchDDPPlugin:_** This plugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

**_LowLevelZeroPlugin:_** This plugin wraps stage 1/2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
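
The snippet below is a minimal sketch (not part of the original doc) of how a plugin is chosen and handed to the `Booster`; it assumes the plugin classes are importable from `colossalai.booster.plugin`, so check the API reference of your installed version.

```python
# A minimal sketch: selecting a plugin and passing it to the Booster.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin, TorchDDPPlugin

plugin = TorchDDPPlugin()                # plain DDP data parallelism
# plugin = GeminiPlugin()                # ZeRO with chunk-based memory management
# plugin = LowLevelZeroPlugin(stage=2)   # ZeRO stage 2: shard optimizer states + gradients

booster = Booster(plugin=plugin)         # the plugin decides how boost() wraps your objects
```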

### API of booster

{{ autodoc:colossalai.booster.Booster }}

## Usage

In a typical workflow, you should launch the distributed environment at the beginning of the training script and first create the objects you need (such as models, optimizers, loss functions, data loaders, etc.). Then call `colossalai.booster` to inject features into these objects. After that, you can use our booster APIs and the returned objects to run the rest of your training process.

A pseudo-code example is shown below:
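
The full pseudo-code is kept in the example file; as a condensed sketch of the same flow (the toy model, random data, and the `TorchDDPPlugin` choice here are stand-ins rather than part of the original), it looks roughly like this:

```python
# Condensed sketch of the typical Booster workflow (API names assume colossalai.booster).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin


def train():
    colossalai.launch_from_torch(config={})  # torchrun supplies rank/world-size env vars

    # toy stand-ins for your real model, optimizer, criterion and dataloader
    model = nn.Linear(16, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    dataloader = DataLoader(
        TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8)

    # inject parallelism/AMP features into the raw objects
    booster = Booster(plugin=TorchDDPPlugin())
    model, optimizer, criterion, dataloader, _ = booster.boost(
        model, optimizer, criterion=criterion, dataloader=dataloader)

    model.train()
    for inputs, labels in dataloader:
        inputs, labels = inputs.cuda(), labels.cuda()
        loss = criterion(model(inputs), labels)
        booster.backward(loss, optimizer)  # use the booster instead of loss.backward()
        optimizer.step()
        optimizer.zero_grad()


if __name__ == '__main__':
    train()
```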

@ -67,5 +72,4 @@ def train():

[more design details](https://github.com/hpcaitech/ColossalAI/discussions/3046)

<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 booster_api.py -->

@ -3,12 +3,13 @@

Author: [Mingyan Jiang](https://github.com/jiangmingyan)

**Prerequisite**

- [Define Your Configuration](../basics/define_your_config.md)
- [Training Booster](../basics/booster_api.md)

**Related Paper**

- [Accelerating Scientific Computations with Mixed Precision Algorithms](https://arxiv.org/abs/0808.2794)

## Introduction

@ -19,9 +20,8 @@ In Colossal-AI, we have incorporated different implementations of mixed precisio
2. apex.amp
3. naive amp

| Colossal-AI | support tensor parallel | support pipeline parallel | fp16 extent |
| -------------- | ----------------------- | ------------------------- | ---------------------------------------------------------------------------------------------------- |
| AMP_TYPE.TORCH | ✅ | ❌ | Model parameters, activations and gradients are downcast to fp16 during forward and backward propagation |
| AMP_TYPE.APEX | ❌ | ❌ | More fine-grained; we can choose opt_level O0, O1, O2, O3 |
| AMP_TYPE.NAIVE | ✅ | ✅ | Model parameters, forward and backward operations are all downcast to fp16 |
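
For the legacy (pre-booster) entry point, the mode in the table above is picked in the configuration file. The snippet below is a minimal sketch assuming the `AMP_TYPE` enum from `colossalai.amp` and the config-file style used by `colossalai.initialize`; see [the legacy mixed precision doc](./mixed_precision_training.md) for the authoritative form.

```python
# config.py -- minimal sketch of the legacy way to select an AMP implementation
from colossalai.amp import AMP_TYPE

fp16 = dict(mode=AMP_TYPE.TORCH)  # or AMP_TYPE.APEX / AMP_TYPE.NAIVE
```
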
@ -64,8 +64,11 @@ However, there are other operations, like reductions, which require the dynamic

We support three AMP training methods and allow users to train with AMP without any code changes. If you want to train with AMP, just assign `fp16` to `mixed_precision` when you instantiate the `Booster`. Currently the booster supports torch amp; the other two (apex amp, naive amp) are still started by `colossalai.initialize`. If you need them, please refer to [this document](./mixed_precision_training.md). Support for `bf16` and `fp8` will come next.

### Start with Booster

Instantiate `Booster` with `mixed_precision="fp16"`, then you can train with torch amp.

<!--- doc-test-ignore-start -->

```python
"""
Mapping:

@ -78,9 +81,13 @@ instantiate `Booster` with `mixed_precision="fp16"`, then you can train with tor
from colossalai import Booster
booster = Booster(mixed_precision='fp16',...)
```

<!--- doc-test-ignore-end -->

Alternatively, you can create a `FP16TorchMixedPrecision` object, such as:

<!--- doc-test-ignore-start -->

```python
from colossalai.mixed_precision import FP16TorchMixedPrecision
mixed_precision = FP16TorchMixedPrecision(

@ -90,9 +97,10 @@ mixed_precision = FP16TorchMixedPrecision(
    growth_interval=2000)
booster = Booster(mixed_precision=mixed_precision,...)
```

<!--- doc-test-ignore-end -->

The same goes for other types of AMP.

### Torch AMP Configuration
@ -121,7 +129,6 @@ The output model is converted to AMP model of smaller memory consumption.

If your input model is already too large to fit in a GPU, please instantiate your model weights in `dtype=torch.float16`.
Otherwise, try smaller models or check out more parallel training techniques!
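
For example, here is a minimal sketch of building the weights directly in fp16 (the small `nn.Sequential` stack is just a stand-in for your own model, not part of the original tutorial):

```python
import torch
import torch.nn as nn

# Passing dtype=torch.float16 at construction time means the weights are never
# materialized in fp32, roughly halving the memory needed just to build the model.
model = nn.Sequential(
    nn.Linear(4096, 4096, dtype=torch.float16),
    nn.GELU(),
    nn.Linear(4096, 4096, dtype=torch.float16),
)
print(next(model.parameters()).dtype)  # torch.float16
```
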
## Hands-on Practice
Now we will introduce the use of AMP with Colossal-AI. In this practice, we will use Torch AMP as an example.
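
Before diving in, it may help to see what `mixed_precision='fp16'` roughly automates in plain PyTorch terms. The sketch below (a toy model on random data, not part of the tutorial) uses the standard `torch.cuda.amp` autocast/GradScaler pattern:

```python
# Roughly what torch amp does under the hood; the model and data are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(256, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(init_scale=2.**16, growth_interval=2000)

for _ in range(10):
    inputs = torch.randn(32, 256, device='cuda')
    labels = torch.randint(0, 10, (32,), device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run eligible ops in fp16
        loss = criterion(model(inputs), labels)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                 # unscales grads and skips the step on inf/nan
    scaler.update()                        # adjust the loss scale for the next iteration
```
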

@ -248,4 +255,5 @@ Use the following command to start the training scripts. You can change `--nproc

```shell
colossalai run --nproc_per_node 1 train.py
```

<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 mixed_precision_training_with_booster.py -->

@ -7,19 +7,18 @@ can also run on systems with only one GPU. Quick demos showing how to use Coloss

## Single GPU

Colossal-AI can be used to train deep learning models on systems with only one GPU and achieve baseline
performance. We provide an example to [train ResNet on the CIFAR10 dataset](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/resnet)
with only one GPU. You can find the example in [ColossalAI-Examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples).
Detailed instructions can be found in its `README.md`.

## Multiple GPUs

Colossal-AI can be used to train deep learning models on distributed systems with multiple GPUs and drastically accelerate the
training process by applying efficient parallelization techniques. We have several parallelism methods for you to try out.

#### 1. data parallel

You can use the same [ResNet example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/resnet) as the
single-GPU demo above. By setting `--nproc_per_node` to the number of GPUs you have on your machine, the example
is turned into a data parallel example.
@ -27,17 +26,19 @@ is turned into a data parallel example.

Hybrid parallel includes data, tensor, and pipeline parallelism. In Colossal-AI, we support different types of tensor
parallelism (i.e. 1D, 2D, 2.5D and 3D). You can switch between different tensor parallelism modes by simply changing the configuration
in `config.py`. You can follow the [GPT example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/gpt).
Detailed instructions can be found in its `README.md`.

#### 3. MoE parallel

We provide [an example of ViT-MoE](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/moe) to demonstrate
MoE parallelism. WideNet uses mixture of experts (MoE) to achieve better performance. More details can be found in
[Tutorial: Integrate Mixture-of-Experts Into Your Model](../advanced_tutorials/integrate_mixture_of_experts_into_your_model.md).

#### 4. sequence parallel

Sequence parallelism is designed to tackle memory efficiency and sequence length limit problems in NLP tasks. We provide
[an example of BERT](https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial/sequence_parallel) in
[ColossalAI-Examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples). You can follow the `README.md` to execute the code.

<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 run_demo.py -->

@ -1,28 +1,37 @@

# Booster Usage

Author: [Mingyan Jiang](https://github.com/jiangmingyan) [Jianghai Chen](https://github.com/CjhHa1)

**Prerequisite:**

- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)

**Example Code**

<!-- update this url-->

- [Train with Booster](https://github.com/hpcaitech/ColossalAI/blob/main/examples/tutorial/new_api/cifar_resnet/README.md)

## Introduction

In our new design, `colossalai.booster` replaces `colossalai.initialize` to seamlessly inject features into your training components (e.g. model, optimizer, dataloader). With the booster API, you can integrate our parallel strategies into the model to be trained more conveniently. Calling `colossalai.booster` is the basic step before you enter the training loop.
In the sections below, we will introduce how `colossalai.booster` works and the details you should pay attention to when using it.

### Booster Plugins

Booster plugins are important components that manage parallel configurations (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). The currently supported plugins are as follows:

**_GeminiPlugin:_** The GeminiPlugin wraps the Gemini acceleration solution, i.e. the ZeRO optimization scheme based on chunk-based memory management.

**_TorchDDPPlugin:_** The TorchDDPPlugin wraps the DDP acceleration solution. It implements data parallelism at the module level and can run across multiple machines.

**_LowLevelZeroPlugin:_** The LowLevelZeroPlugin wraps stage 1/2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data-parallel processes/GPUs. Stage 2: shards optimizer states and gradients across data-parallel processes/GPUs.

### Booster API

<!--TODO: update autodoc -->

{{ autodoc:colossalai.booster.Booster }}

## Usage and Examples

@ -3,12 +3,13 @@

Author: [Mingyan Jiang](https://github.com/jiangmingyan)

**Prerequisite**

- [Define Your Configuration](../basics/define_your_config.md)
- [Booster Usage](../basics/booster_api.md)

**Related Paper**

- [Accelerating Scientific Computations with Mixed Precision Algorithms](https://arxiv.org/abs/0808.2794)

## Introduction

@ -19,9 +20,8 @@ AMP 代表自动混合精度训练。
2. apex.amp
3. naive amp

| Colossal-AI | support tensor parallel | support pipeline parallel | fp16 extent |
| -------------- | ----------------------- | ------------------------- | ---------------------------------------------------------- |
| AMP_TYPE.TORCH | ✅ | ❌ | Model parameters, activations and gradients are downcast to fp16 during forward and backward propagation |
| AMP_TYPE.APEX | ❌ | ❌ | More fine-grained; we can choose opt_level O0, O1, O2, O3 |
| AMP_TYPE.NAIVE | ✅ | ✅ | Model parameters, forward and backward operations are all downcast to fp16 |

@ -57,11 +57,14 @@ AMP 代表自动混合精度训练。

## AMP in Colossal-AI

We support three AMP training methods and allow users to train with AMP without changing their code. The booster supports AMP feature injection: to use mixed precision training, specify the `mixed_precision` argument when creating the booster instance. We currently support torch amp, apex amp and naive amp (torch amp has been ported to the booster; apex amp and naive amp are still started via `colossalai.initialize`, see [this document](./mixed_precision_training.md) if you need them). Mixed precision training with `bf16` and `fp8` will be added later.

#### Start with Booster

When creating the booster instance, specify `mixed_precision="fp16"` to use torch amp.

<!--- doc-test-ignore-start -->

```python
"""
The initialization mapping is as follows:

@ -74,9 +77,13 @@ AMP 代表自动混合精度训练。
from colossalai import Booster
booster = Booster(mixed_precision='fp16',...)
```

<!--- doc-test-ignore-end -->

Or you can create your own `FP16TorchMixedPrecision` object, for example:

<!--- doc-test-ignore-start -->

```python
from colossalai.mixed_precision import FP16TorchMixedPrecision
mixed_precision = FP16TorchMixedPrecision(

@ -86,7 +93,9 @@ mixed_precision = FP16TorchMixedPrecision(
    growth_interval=2000)
booster = Booster(mixed_precision=mixed_precision,...)
```

<!--- doc-test-ignore-end -->

The same goes for other types of AMP.

### Torch AMP Configuration

@ -186,6 +195,7 @@ lr_scheduler = LinearWarmupLR(optimizer, warmup_steps=50, total_steps=NUM_EPOCHS
```

### Step 4. Plug in AMP

Create a MixedPrecision object (if needed) and a TorchDDPPlugin object, then call `booster.boost` to convert all the training components to FP16 mode.

```python
@ -232,4 +242,5 @@ for epoch in range(NUM_EPOCHS):

```shell
colossalai run --nproc_per_node 1 train.py
```

<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 mixed_precision_training_with_booster.py -->

@ -4,8 +4,8 @@ Colossal-AI 是一个集成的大规模深度学习系统,具有高效的并

## Single GPU

Colossal-AI can be used to train deep learning models on systems with only one GPU and achieve baseline performance. We provide an example of [training ResNet on the CIFAR10 dataset](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/resnet), which requires only one GPU.
You can find this example in [ColossalAI-Examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples). Detailed instructions can be found in its `README.md`.

## Multiple GPUs

@ -13,16 +13,20 @@ Colossal-AI 可用于在具有多个 GPU 的分布式系统上训练深度学习

#### 1. data parallel

You can use the same [ResNet example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/resnet) as in the single-GPU demo above. By setting `--nproc_per_node` to the number of GPUs on your machine, you can apply data parallelism to the example.

#### 2. hybrid parallel

Hybrid parallel includes data, tensor, and pipeline parallelism. In Colossal-AI, we support different types of tensor parallelism (i.e. 1D, 2D, 2.5D and 3D). You can switch between different tensor parallelism modes by simply changing the configuration in `config.py`. You can refer to the [GPT example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/gpt); more details can be found in its `README.md`.

#### 3. MoE parallel

We provide a [ViT-MoE example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/moe) to demonstrate MoE parallelism. WideNet uses Mixture of Experts (MoE) to achieve better performance. More details can be found in our tutorial: [Integrate Mixture-of-Experts Into Your Model](../advanced_tutorials/integrate_mixture_of_experts_into_your_model.md).

#### 4. sequence parallel

Sequence parallelism is designed to tackle memory efficiency and sequence length limit problems in NLP tasks. We provide a [Sequence Parallelism example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial/sequence_parallel) in [ColossalAI-Examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples). You can follow the `README.md` to execute the code.

<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 run_demo.py -->