[doc] put native colossalai plugins first in description section

pull/4757/head
Pengtai Xu 2023-09-20 09:24:10 +08:00
parent e10d9f087e
commit 4d7537ba25
2 changed files with 49 additions and 49 deletions


@@ -19,26 +19,17 @@ We currently provide the following plugins:
More plugins are coming soon.
## Choosing Your Plugin
Generally, only one plugin is used to train a model. Our recommended use case for each plugin is as follows (see the sketch after this list).
- [Torch DDP Plugin](#torch-ddp-plugin): It is suitable for models with fewer than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-plugin) / [Low Level Zero Plugin](#low-level-zero-plugin): They are suitable for models with fewer than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-plugin): It is suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b) and is ideal for scenarios with **high cross-node bandwidth and small- to medium-scale clusters (below a thousand GPUs)** (e.g. Llama2-70b).
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It is suitable for models with more than 60 billion parameters, or for special models such as those with exceptionally long sequences or very large vocabularies, and is best suited for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand GPUs or more)** (e.g. GPT3-175b, Bloom-176b).
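As a purely illustrative sketch of how this guidance maps to code, the snippet below returns a plugin class and example constructor kwargs for a given parameter count. The thresholds simply restate the list above, and the specific arguments (`stage`, `tp_size`, `pp_size`, `zero_stage`) are placeholder values rather than tuned recommendations.

```python
# Illustrative only: the thresholds restate the recommendations in the list above.
from colossalai.booster.plugin import (
    GeminiPlugin,
    HybridParallelPlugin,
    LowLevelZeroPlugin,
    TorchDDPPlugin,
)

def recommend_plugin(num_params: int):
    """Return a plugin class and example constructor kwargs for a model size."""
    if num_params < 2e9:
        return TorchDDPPlugin, {}
    if num_params < 1e10:
        return LowLevelZeroPlugin, {"stage": 2}  # TorchFSDPPlugin is an alternative
    if num_params < 6e10:
        return GeminiPlugin, {}
    # >60B parameters or special models: combine TP, PP and ZeRO
    return HybridParallelPlugin, {"tp_size": 4, "pp_size": 2, "zero_stage": 1}

plugin_cls, kwargs = recommend_plugin(7_000_000_000)
print(plugin_cls.__name__, kwargs)  # LowLevelZeroPlugin {'stage': 2}
```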
## Plugins
### Torch DDP Plugin
More details can be found in [PyTorch Docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
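A minimal end-to-end sketch of using this plugin through the Booster API is shown below. It assumes the script is launched with `torchrun` (e.g. `torchrun --nproc_per_node 2 train.py`) on GPU nodes and uses a toy model purely for illustration; check the launch call against the API of your installed ColossalAI version.

```python
# Minimal sketch; launch with: torchrun --nproc_per_node 2 train.py
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})  # reads rank/world size from torchrun env vars

model = nn.Linear(32, 2).cuda()          # toy model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(8, 32, device="cuda")
y = torch.randint(0, 2, (8,), device="cuda")
loss = criterion(model(x), y)
booster.backward(loss, optimizer)        # gradient sync handled by the plugin
optimizer.step()
```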
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not support saving/loading sharded model checkpoints yet.
> ⚠ This plugin does not support optimizers that use multiple parameter groups.
More details can be found in [PyTorch Docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
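The caveats above mainly affect how the optimizer is built and how checkpoints are saved. A minimal sketch, assuming torch >= 1.12 and a `torchrun` launch (the checkpoint call is shown as a comment and should be checked against the Booster checkpoint API of your version):

```python
# Sketch of working within the FSDP caveats above (torch >= 1.12, torchrun launch).
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

colossalai.launch_from_torch(config={})

model = nn.Linear(32, 2).cuda()
# Keep a single parameter group: optimizers with multiple param groups are unsupported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

# Sharded checkpoints are not supported yet, so save an unsharded one, e.g.:
# booster.save_model(model, "model.pt", shard=False)
```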
### Low Level Zero Plugin
This plugin implements Zero-1 and Zero-2 (with or without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
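For example, the ZeRO stage is selected when constructing the plugin. A minimal sketch, assuming a `torchrun` launch; the `stage` argument name should be checked against the plugin's autodoc:

```python
# Minimal sketch; launch with torchrun so that distributed is initialized.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})

plugin = LowLevelZeroPlugin(stage=2)  # stage=1 -> Zero-1, stage=2 -> Zero-2
booster = Booster(plugin=plugin)
# ...then boost the model and optimizer as usual:
# model, optimizer, *_ = booster.boost(model, optimizer)
```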
@@ -87,13 +78,22 @@ This plugin implements the combination of various parallel training strategies a
{{ autodoc:colossalai.booster.plugin.HybridParallelPlugin }}
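A hedged configuration sketch follows: the values below are placeholders (the product of `tp_size`, `pp_size`, and the resulting data-parallel size must match the world size), and the exact parameter names should be verified against the autodoc above for your installed version.

```python
# Illustrative hybrid-parallel configuration; the values are placeholders.
# Launch with torchrun across all nodes, e.g.:
#   torchrun --nnodes 2 --nproc_per_node 8 train.py
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,            # tensor parallelism
    pp_size=2,            # pipeline parallelism
    num_microbatches=4,   # needed when pipeline parallelism is enabled
    zero_stage=1,         # ZeRO on the data-parallel dimension
    precision="fp16",
)
booster = Booster(plugin=plugin)
```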
## Choosing Your Plugin
### Torch DDP Plugin
Generally, only one plugin is used to train a model. Our recommended use case for each plugin is as follows.
More details can be found in [PyTorch Docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
- [Torch DDP Plugin](#torch-ddp-plugin): It is suitable for models with fewer than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-plugin) / [Low Level Zero Plugin](#low-level-zero-plugin): They are suitable for models with fewer than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-plugin): It is suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b) and is ideal for scenarios with **high cross-node bandwidth and small- to medium-scale clusters (below a thousand GPUs)** (e.g. Llama2-70b).
- [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It is suitable for models with more than 60 billion parameters, or for special models such as those with exceptionally long sequences or very large vocabularies, and is best suited for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand GPUs or more)** (e.g. GPT3-175b, Bloom-176b).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not support saving/loading sharded model checkpoints yet.
> ⚠ This plugin does not support optimizers that use multiple parameter groups.
More details can be found in [PyTorch Docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
<!-- doc-test-command: echo -->


@@ -1,6 +1,7 @@
# Booster Plugins
Author: [Hongxin Liu](https://github.com/ver217), [Baizhou Zhang](https://github.com/Fridge003)
Author: [Hongxin Liu](https://github.com/ver217), [Baizhou Zhang](https://github.com/Fridge003), [Pengtai Xu](https://github.com/ppt0011)
**Prerequisite:**
- [Booster API](./booster_api.md)
@@ -19,27 +20,14 @@
More plugins are coming soon.
## Choosing Your Plugin
- [Torch DDP Plugin](#torch-ddp-插件): Suitable for models with fewer than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-插件) / [Low Level Zero Plugin](#low-level-zero-插件): Suitable for models with fewer than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-插件): Suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b) and for scenarios with **high cross-node bandwidth and small- to medium-scale clusters (below a thousand GPUs)** (e.g. Llama2-70b).
- [Hybrid Parallel Plugin](#hybrid-parallel-插件): Suitable for models with more than 60 billion parameters, or for special models such as those with exceptionally long sequences or very large vocabularies, and for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand GPUs or more)** (e.g. GPT3-175b, Bloom-176b).
## Plugins
### Torch DDP Plugin
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not support saving/loading sharded model checkpoints yet.
> ⚠ This plugin does not support optimizers that use multiple parameter groups yet.
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
### Low Level Zero Plugin
This plugin implements Zero-1 and Zero-2 (with or without CPU offload), using `reduce` and `gather` to synchronize gradients and weights.
@@ -87,10 +75,22 @@ Zero-2 does not support local gradient accumulation. If you insist on using it, although gradients can be accumulated
{{ autodoc:colossalai.booster.plugin.HybridParallelPlugin }}
## Choosing Your Plugin
- [Torch DDP Plugin](#torch-ddp-插件): Suitable for models with fewer than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-插件) / [Low Level Zero Plugin](#low-level-zero-插件): Suitable for models with fewer than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-插件): Suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b) and for scenarios with **high cross-node bandwidth and small- to medium-scale clusters (below a thousand GPUs)** (e.g. Llama2-70b).
- [Hybrid Parallel Plugin](#hybrid-parallel-插件): Suitable for models with more than 60 billion parameters, or for special models such as those with exceptionally long sequences or very large vocabularies, and for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand GPUs or more)** (e.g. GPT3-175b, Bloom-176b).
### Torch DDP Plugin
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
{{ autodoc:colossalai.booster.plugin.TorchDDPPlugin }}
### Torch FSDP Plugin
> ⚠ This plugin is not available when the torch version is lower than 1.12.0.
> ⚠ This plugin does not support saving/loading sharded model checkpoints yet.
> ⚠ This plugin does not support optimizers that use multiple parameter groups yet.
More details can be found in the [PyTorch docs](https://pytorch.org/docs/main/fsdp.html).
{{ autodoc:colossalai.booster.plugin.TorchFSDPPlugin }}
<!-- doc-test-command: echo -->