[doc] add tutorial for booster plugins (#3758)

* [doc] add en booster plugins doc

* [doc] add booster plugins doc in sidebar

* [doc] add zh booster plugins doc

* [doc] fix zh booster plugin translation

* [doc] reorganize tutorials order of basic section

* [devops] force sync to test ci
Hongxin Liu 2023-05-19 12:12:42 +08:00 committed by GitHub
parent 5ce6c9d86f
commit 21e29e2212
3 changed files with 132 additions and 3 deletions

@@ -26,14 +26,15 @@
"collapsed": true,
"items": [
"basics/command_line_tool",
-"basics/define_your_config",
"basics/launch_colossalai",
+"basics/booster_api",
+"basics/booster_plugins",
+"basics/define_your_config",
"basics/initialize_features",
"basics/engine_trainer",
"basics/configure_parallelization",
"basics/model_checkpoint",
-"basics/colotensor_concept",
-"basics/booster_api"
+"basics/colotensor_concept"
]
},
{

@@ -0,0 +1,64 @@
# Booster Plugins
Author: [Hongxin Liu](https://github.com/ver217)
**Prerequisite:**
- [Booster API](./booster_api.md)
## Introduction
As mentioned in [Booster API](./booster_api.md), we can use booster plugins to customize parallel training. In this tutorial, we will introduce how to use booster plugins; a minimal usage sketch follows the plugin list below.
We currently provide the following plugins:
- [Low Level Zero Plugin](#low-level-zero-plugin): It wraps `colossalai.zero.low_level.LowLevelZeroOptimizer` and can be used to train models with zero-dp. It only supports zero stage-1 and stage-2.
- [Gemini Plugin](#gemini-plugin): It wraps [Gemini](../features/zero_with_chunk.md), which implements Zero-3 with chunk-based and heterogeneous memory management.
- [Torch DDP Plugin](#torch-ddp-plugin): It is a wrapper of `torch.nn.parallel.DistributedDataParallel` and can be used to train models with data parallelism.
- [Torch FSDP Plugin](#torch-fsdp-plugin): It is a wrapper of `torch.distributed.fsdp.FullyShardedDataParallel` and can be used to train models with zero-dp.
More plugins are coming soon.
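Before going through each plugin, the snippet below gives a minimal sketch of the overall workflow: create a plugin, hand it to the booster, boost your objects, and run the training step through the booster. The toy model, random data and hyper-parameters are placeholders for illustration only, and the script is assumed to be launched with `torchrun`; any plugin listed above can be swapped in for `TorchDDPPlugin`.
```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# initialize the distributed environment (e.g. launched via `torchrun --nproc_per_node=2 ...`)
colossalai.launch_from_torch(config={})

# toy model, optimizer and loss used purely for illustration
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

plugin = TorchDDPPlugin()              # any plugin listed above can be used here
booster = Booster(plugin=plugin)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

inputs = torch.randn(8, 1024).cuda()
targets = torch.randn(8, 1024).cuda()
loss = criterion(model(inputs), targets)
booster.backward(loss, optimizer)      # always run the backward pass through the booster
optimizer.step()
optimizer.zero_grad()
```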
## Plugins
### Low Level Zero Plugin
This plugin implements Zero-1 and Zero-2 (with or without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
Zero-1 can be regarded as a better substitute for Torch DDP: it is more memory-efficient and faster, and it can easily be used in hybrid parallelism.
Zero-2 does not support local gradient accumulation. You can still accumulate gradients if you insist, but it does not reduce communication cost, so combining Zero-2 with pipeline parallelism is not a good idea.
{{ autodoc:colossalai.booster.plugin.LowLevelZeroPlugin }}
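For example, a minimal sketch of selecting Zero-2 is shown below; the `stage` argument follows the autodoc above, but please double-check the argument names against your installed version.
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

# stage=1 selects Zero-1, stage=2 selects Zero-2
plugin = LowLevelZeroPlugin(stage=2)
booster = Booster(plugin=plugin)
# then boost your model and optimizer as in the sketch in the introduction
```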
We've tested compatibility with some well-known models; the following models may not be supported:
- `timm.models.convit_base`
- dlrm and deepfm models in `torchrec`
- `diffusers.VQModel`
- `transformers.AlbertModel`
- `transformers.AlbertForPreTraining`
- `transformers.BertModel`
- `transformers.BertForPreTraining`
- `transformers.GPT2DoubleHeadsModel`
Compatibility problems will be fixed in the future.
### Gemini Plugin
This plugin implements Zero-3 with chunk-based and heterogeneous memory management. It can train large models without much loss in speed. Like Zero-2, it does not support local gradient accumulation. More details can be found in the [Gemini Doc](../features/zero_with_chunk.md).
{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
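A minimal sketch of creating the plugin; `placement_policy` is assumed here to accept values such as `'cpu'` and `'cuda'` to control where parameter chunks are kept, so refer to the autodoc above for the authoritative argument list.
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# keep parameter chunks in host memory and move them to the GPU on demand
plugin = GeminiPlugin(placement_policy='cpu')
booster = Booster(plugin=plugin)
```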
### Torch DDP Plugin
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
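A minimal sketch; the keyword argument shown is assumed to be forwarded to `torch.nn.parallel.DistributedDataParallel`, so refer to the autodoc above for the exact set of supported options.
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# mirrors the DDP argument of the same name
plugin = TorchDDPPlugin(find_unused_parameters=False)
booster = Booster(plugin=plugin)
```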
### Torch FSDP Plugin
> ⚠ This plugin is not available when torch version is lower than 1.12.0.
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
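A minimal sketch with default settings; any extra keyword arguments are assumed to map onto `torch.distributed.fsdp.FullyShardedDataParallel` options, so refer to the autodoc above before relying on them.
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

# requires torch >= 1.12.0
plugin = TorchFSDPPlugin()
booster = Booster(plugin=plugin)
```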

@@ -0,0 +1,64 @@
# Booster 插件
作者: [Hongxin Liu](https://github.com/ver217)
**前置教程:**
- [Booster API](./booster_api.md)
## 引言
正如 [Booster API](./booster_api.md) 中提到的,我们可以使用 booster 插件来自定义并行训练。在本教程中,我们将介绍如何使用 booster 插件;插件列表之后给出了一个整体使用流程的最小示例。
我们现在提供以下插件:
- [Low Level Zero 插件](#low-level-zero-plugin): 它包装了 `colossalai.zero.low_level.LowLevelZeroOptimizer`,可用于使用 Zero-dp 训练模型。它仅支持 Zero 阶段1和阶段2。
- [Gemini 插件](#gemini-plugin): 它包装了 [Gemini](../features/zero_with_chunk.md),Gemini 实现了基于 Chunk 内存管理和异构内存管理的 Zero-3。
- [Torch DDP 插件](#torch-ddp-plugin): 它包装了 `torch.nn.parallel.DistributedDataParallel` 并且可用于使用数据并行训练模型。
- [Torch FSDP 插件](#torch-fsdp-plugin): 它包装了 `torch.distributed.fsdp.FullyShardedDataParallel` 并且可用于使用 Zero-dp 训练模型。
更多插件即将推出。
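在逐个介绍插件之前,下面给出一个整体使用流程的最小示例:创建插件,传给 booster,boost 相关对象,并通过 booster 执行训练步骤。示例中的玩具模型、随机数据和超参数仅用于演示,假定脚本通过 `torchrun` 启动;`TorchDDPPlugin` 可以替换为上面列出的任意插件。
```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# 初始化分布式环境(例如通过 `torchrun --nproc_per_node=2 ...` 启动)
colossalai.launch_from_torch(config={})

# 仅用于演示的玩具模型、优化器和损失函数
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

plugin = TorchDDPPlugin()              # 可替换为上面列出的任意插件
booster = Booster(plugin=plugin)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

inputs = torch.randn(8, 1024).cuda()
targets = torch.randn(8, 1024).cuda()
loss = criterion(model(inputs), targets)
booster.backward(loss, optimizer)      # 反向传播必须通过 booster 调用
optimizer.step()
optimizer.zero_grad()
```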
## 插件
### Low Level Zero 插件
该插件实现了 Zero-1 和 Zero-2(使用/不使用 CPU 卸载),使用 `reduce` 和 `gather` 来同步梯度和权重。
Zero-1 可以看作是 Torch DDP 更好的替代品,内存效率更高,速度更快。它可以很容易地用于混合并行。
Zero-2 不支持局部梯度累积。如果您坚持使用,虽然可以积累梯度,但不能降低通信成本。也就是说,同时使用流水线并行和 Zero-2 并不是一个好主意。
{{ autodoc:colossalai.booster.plugin.LowLevelZeroPlugin }}
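例如,下面是一个选择 Zero-2 的最小示例;`stage` 参数以上方 autodoc 为准,请对照你安装的版本确认参数名。
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

# stage=1 对应 Zero-1,stage=2 对应 Zero-2
plugin = LowLevelZeroPlugin(stage=2)
booster = Booster(plugin=plugin)
# 随后按上面的示例 boost 模型和优化器
```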
我们已经测试了一些主流模型的兼容性,可能不支持以下模型:
- `timm.models.convit_base`
- dlrm and deepfm models in `torchrec`
- `diffusers.VQModel`
- `transformers.AlbertModel`
- `transformers.AlbertForPreTraining`
- `transformers.BertModel`
- `transformers.BertForPreTraining`
- `transformers.GPT2DoubleHeadsModel`
兼容性问题将在未来修复。
### Gemini 插件
这个插件实现了基于 Chunk 内存管理和异构内存管理的 Zero-3。它可以训练大型模型而不会损失太多速度。它也不支持局部梯度累积。更多详细信息请参阅 [Gemini 文档](../features/zero_with_chunk.md)。
{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
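下面是创建该插件的最小示例;这里假设 `placement_policy` 接受 `'cpu'`、`'cuda'` 等取值,用于控制参数 Chunk 的存放位置,准确的参数列表请以上方 autodoc 为准。
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# 将参数 Chunk 放在主机内存中,需要时再搬运到 GPU
plugin = GeminiPlugin(placement_policy='cpu')
booster = Booster(plugin=plugin)
```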
### Torch DDP 插件
更多详细信息,请参阅 [PyTorch 文档](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel)。
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
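下面是一个最小示例;这里假设所示关键字参数会原样传给 `torch.nn.parallel.DistributedDataParallel`,支持的完整选项请以上方 autodoc 为准。
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# 与 DDP 的同名参数含义一致
plugin = TorchDDPPlugin(find_unused_parameters=False)
booster = Booster(plugin=plugin)
```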
### Torch FSDP 插件
> ⚠ 如果 torch 版本低于 1.12.0,此插件将不可用。
更多详细信息,请参阅 [PyTorch 文档](https://pytorch.org/docs/main/fsdp.html)。
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
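下面是使用默认设置的最小示例;这里假设额外的关键字参数会映射到 `torch.distributed.fsdp.FullyShardedDataParallel` 的选项,使用前请以上方 autodoc 为准。
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

# 需要 torch >= 1.12.0
plugin = TorchFSDPPlugin()
booster = Booster(plugin=plugin)
```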