mirror of https://github.com/hpcaitech/ColossalAI
[doc] explain suitable use case for each plugin
parent 079bf3cb26
commit 10513f203c
@ -1,6 +1,6 @@
# Booster Plugins
Author: [Hongxin Liu](https://github.com/ver217), [Baizhou Zhang](https://github.com/Fridge003), [Pengtai Xu](https://github.com/ppt0011)
**Prerequisite:**
- [Booster API](./booster_api.md)
@ -11,16 +11,43 @@ As mentioned in [Booster API](./booster_api.md), we can use booster plugins to c
We currently provide the following plugins:
- [Torch DDP Plugin](#torch-ddp-plugin): It is a wrapper of `torch.nn.parallel.DistributedDataParallel` and can be used to train models with data parallelism.
- [Torch FSDP Plugin](#torch-fsdp-plugin): It is a wrapper of `torch.distributed.fsdp.FullyShardedDataParallel` and can be used to train models with zero-dp.
- [Low Level Zero Plugin](#low-level-zero-plugin): It wraps the `colossalai.zero.low_level.LowLevelZeroOptimizer` and can be used to train models with zero-dp. It only supports zero stage-1 and stage-2.
- [Gemini Plugin](#gemini-plugin): It wraps [Gemini](../features/zero_with_chunk.md), which implements Zero-3 with chunk-based and heterogeneous memory management.
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It provides a tidy interface that integrates the power of Shardformer, the pipeline manager, mixed precision training, Torch DDP, and Zero stage 1/2. With this plugin, transformer models can be trained efficiently with any combination of tensor parallelism, pipeline parallelism, and data parallelism (DDP/Zero), along with various optimization tools for acceleration and memory saving. Detailed information about the supported parallel strategies and optimization tools is explained in the section below.
More plugins are coming soon.
## Choosing Your Plugin
Generally, only one plugin is used to train a model. Our recommended use cases for each plugin are as follows; a minimal usage sketch is shown after the list.
- [Torch DDP Plugin](#torch-ddp-plugin): It is suitable for models with less than 2 billion parameters.
- [Torch FSDP Plugin](#torch-fsdp-plugin) / [Low Level Zero Plugin](#low-level-zero-plugin): They are suitable for models with less than 10 billion parameters.
- [Gemini Plugin](#gemini-plugin): It is suitable for models with more than 10 billion parameters and is ideal for scenarios with high cross-node bandwidth and medium- to small-scale clusters (below a thousand cards).
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It is suitable for models with more than 60 billion parameters, exceptionally long sequences, or very large vocabularies, and is best suited for scenarios with low cross-node bandwidth and large-scale clusters (a thousand cards or more).
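Whichever plugin you choose, the surrounding training code stays the same; only the plugin passed to `Booster` changes. Below is a minimal usage sketch of that pattern with a toy model. The exact `colossalai.launch_from_torch` signature and plugin constructor arguments may vary slightly between ColossalAI versions, so treat this as illustrative rather than definitive.

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# Initialize the distributed environment (launched via `colossalai run` or `torchrun`).
# Newer releases may not require the `config` argument.
colossalai.launch_from_torch(config={})

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

# Swapping the plugin is the only change needed to switch strategies,
# e.g. GeminiPlugin() or LowLevelZeroPlugin() instead of TorchDDPPlugin().
plugin = TorchDDPPlugin()
booster = Booster(plugin=plugin)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion=criterion)

# One training step: gradients go through booster.backward instead of loss.backward.
data = torch.randn(8, 1024, device=torch.cuda.current_device())
target = torch.randn(8, 1024, device=torch.cuda.current_device())
loss = criterion(model(data), target)
booster.backward(loss, optimizer)
optimizer.step()
optimizer.zero_grad()
```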
## Plugins
### Torch DDP Plugin
More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
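For illustration, a sketch of constructing this plugin is shown below. The keyword arguments mirror common `DistributedDataParallel` options and are assumed to be forwarded by the plugin; refer to the API reference above for the authoritative signature.

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# Common DDP options, assumed to be forwarded to DistributedDataParallel.
plugin = TorchDDPPlugin(
    broadcast_buffers=True,
    bucket_cap_mb=25,
    find_unused_parameters=False,
)
booster = Booster(plugin=plugin)
```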
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not currently support saving/loading sharded model checkpoints.
> ⚠ This plugin does not support optimizers that use multiple parameter groups.
More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
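Below is a hedged sketch of configuring the plugin with standard FSDP options from `torch.distributed.fsdp`. Whether and how the plugin forwards these keywords is an assumption here, so check the API reference above.

```python
import torch
from torch.distributed.fsdp import CPUOffload, MixedPrecision

from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

# FSDP-style options, assumed to be passed through to FullyShardedDataParallel.
plugin = TorchFSDPPlugin(
    cpu_offload=CPUOffload(offload_params=False),               # keep parameters on GPU
    mixed_precision=MixedPrecision(param_dtype=torch.float16),  # fp16 parameters
)
booster = Booster(plugin=plugin)
```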
### Low Level Zero Plugin
This plugin implements Zero-1 and Zero-2 (with/without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
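As a quick sketch, selecting the ZeRO stage might look like the following; the `stage` and `precision` keyword names are assumptions for illustration, so verify them against the API reference.

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

# stage=1 shards optimizer states across ranks; stage=2 additionally shards gradients.
plugin = LowLevelZeroPlugin(stage=2, precision="fp16")
booster = Booster(plugin=plugin)
```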
@ -50,24 +77,6 @@ This plugin implements Zero-3 with chunk-based and heterogeneous memory manageme
{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
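A sketch of enabling Gemini's chunk-based, heterogeneous memory management is shown below; `placement_policy` and `precision` are assumed keyword names, so consult the API reference above for the exact options.

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# "auto" lets Gemini move parameter chunks between GPU and CPU memory based on usage.
plugin = GeminiPlugin(placement_policy="auto", precision="fp16")
booster = Booster(plugin=plugin)
```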
### Hybrid Parallel Plugin
@ -87,5 +96,4 @@ This plugin implements the combination of various parallel training strategies a
{{ autodoc:colossalai.booster.plugin.HybridParallelPlugin }}
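To illustrate, a sketch of combining tensor parallelism, pipeline parallelism, and ZeRO data parallelism is shown below; `tp_size`, `pp_size`, `zero_stage`, and `precision` are assumed keyword names chosen for illustration, so confirm them against the API reference above.

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Example layout: 2-way tensor parallelism x 2-way pipeline parallelism,
# with ZeRO stage 1 applied across the remaining data-parallel ranks.
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    zero_stage=1,
    precision="fp16",
)
booster = Booster(plugin=plugin)
```

Note that when pipeline parallelism is enabled, the training step is typically driven through the booster's pipeline execution interface rather than a plain forward/backward call.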
<!-- doc-test-command: echo -->
@ -11,16 +11,41 @@
We currently provide the following plugins:
- [Torch DDP Plugin](#torch-ddp-plugin): It wraps `torch.nn.parallel.DistributedDataParallel` and can be used to train models with data parallelism.
- [Torch FSDP Plugin](#torch-fsdp-plugin): It wraps `torch.distributed.fsdp.FullyShardedDataParallel` and can be used to train models with zero-dp.
- [Low Level Zero Plugin](#low-level-zero-plugin): It wraps `colossalai.zero.low_level.LowLevelZeroOptimizer` and can be used to train models with zero-dp. It only supports Zero stage 1 and stage 2.
- [Gemini Plugin](#gemini-plugin): It wraps [Gemini](../features/zero_with_chunk.md), which implements Zero-3 with chunk-based and heterogeneous memory management.
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It provides a unified and tidy interface that integrates Shardformer, the pipeline manager, mixed precision training, Torch DDP, and Zero-1/Zero-2. With this plugin, transformer models can be trained simply and efficiently with any combination of tensor parallelism, pipeline parallelism, and data parallelism (DDP/Zero), along with various optimization tools for training speed and memory usage. Detailed information about these parallel strategies and optimization tools is explained in the next section.
More plugins are coming soon.
## Choosing Your Plugin
- [Torch DDP Plugin](#torch-ddp-plugin): Suitable for models with fewer than 2 billion parameters.
- [Torch FSDP Plugin](#torch-fsdp-plugin) / [Low Level Zero Plugin](#low-level-zero-plugin): Suitable for models with fewer than 10 billion parameters.
- [Gemini Plugin](#gemini-plugin): Suitable for models with more than 10 billion parameters, and for scenarios with high cross-node bandwidth and small- to medium-scale clusters (under a thousand cards).
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): Suitable for models with more than 60 billion parameters, as well as special cases such as exceptionally long sequences and very large vocabularies, and for scenarios with low cross-node bandwidth and large-scale clusters (a thousand cards or more).
## Plugins
### Torch DDP Plugin
More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not currently support saving/loading sharded model checkpoints.
> ⚠ This plugin does not support optimizers that use multiple parameter groups.
More details can be found in the [PyTorch Docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
### Low Level Zero Plugin
This plugin implements Zero-1 and Zero-2 (with/without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
@ -50,26 +75,6 @@ Zero-2 does not support local gradient accumulation. If you insist on using it, although you can accumulate
{{ autodoc:colossalai.booster.plugin.GeminiPlugin }}
### Hybrid Parallel Plugin
This plugin implements a combination of various parallel training strategies and optimization tools. The features supported by the Hybrid Parallel plugin can be roughly divided into the following four parts: