mirror of https://github.com/hpcaitech/ColossalAI
[nfc] fix typo in docs/ (#4972)
parent dce05da535
commit fd3567e089
@@ -26,7 +26,7 @@ Generally only one plugin is used to train a model. Our recommended use case for
- [Torch DDP Plugin](#torch-ddp-plugin): It is suitable for models with less than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-plugin) / [Low Level Zero Plugin](#low-level-zero-plugin): It is suitable for models with less than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-plugin): It is suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b) and is ideal for scenarios with **high cross-node bandwidth and medium to small-scale clusters (below a thousand cards)** (e.g. Llama2-70b).
- - [Hybrid Pararllel Plugin](#hybrid-parallel-plugin): It is suitable for models with more than 60 billion parameters, or special models such as those with exceptionally long sequences, very large vocabularies, and is best suited for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand cards or more)** (e.g. GPT3-175b, Bloom-176b).
+ - [Hybrid Parallel Plugin](#hybrid-parallel-plugin): It is suitable for models with more than 60 billion parameters, or special models such as those with exceptionally long sequences, very large vocabularies, and is best suited for scenarios with **low cross-node bandwidth and large-scale clusters (a thousand cards or more)** (e.g. GPT3-175b, Bloom-176b).
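For reference, a minimal sketch (not part of this commit) of how the recommendation above might be expressed with the `colossalai.booster` API these docs describe; the size thresholds and the `pick_plugin` helper are purely illustrative:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin, LowLevelZeroPlugin, GeminiPlugin


def pick_plugin(num_params: int):
    """Illustrative helper: map model size to the plugin recommended above."""
    if num_params < 2_000_000_000:        # e.g. Bert-3m, GPT2-1.5b
        return TorchDDPPlugin()
    if num_params < 10_000_000_000:       # e.g. GPTJ-6b, MegatronLM-8b
        return LowLevelZeroPlugin(stage=2)
    return GeminiPlugin()                 # e.g. TuringNLG-17b, Llama2-70b


# Assumes the distributed environment has already been initialized
# (e.g. via colossalai.launch_from_torch) before the plugin is constructed.
booster = Booster(plugin=pick_plugin(num_params=6_000_000_000))
```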
## Plugins
@@ -16,7 +16,7 @@
- [Torch FSDP Plugin](#torch-fsdp-插件): It wraps `torch.distributed.fsdp.FullyShardedDataParallel` and can be used to train models with Zero-dp.
- [Low Level Zero Plugin](#low-level-zero-插件): It wraps `colossalai.zero.low_level.LowLevelZeroOptimizer` and can be used to train models with Zero-dp. It only supports Zero stage 1 and stage 2.
- [Gemini Plugin](#gemini-插件): It wraps [Gemini](../features/zero_with_chunk.md), which implements Zero-3 based on chunk-based memory management and heterogeneous memory management.
- - [Hybrid Pararllel Plugin](#hybrid-parallel-插件): It provides a unified and concise interface for Shardformer, the pipeline manager, mixed-precision training, TorchDDP, and Zero-1/Zero-2. With this plugin, transformer models can be trained simply and efficiently under any combination of tensor parallelism, pipeline parallelism, and data parallelism (DDP, Zero), and a variety of tools for optimizing training speed and memory are also supported. Details of these training strategies and optimization tools will be described in the next chapter.
+ - [Hybrid Parallel Plugin](#hybrid-parallel-插件): It provides a unified and concise interface for Shardformer, the pipeline manager, mixed-precision training, TorchDDP, and Zero-1/Zero-2. With this plugin, transformer models can be trained simply and efficiently under any combination of tensor parallelism, pipeline parallelism, and data parallelism (DDP, Zero), and a variety of tools for optimizing training speed and memory are also supported. Details of these training strategies and optimization tools will be described in the next chapter.
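For reference, a minimal usage sketch (not part of this commit) for one of the plugins listed above; it assumes the `Booster`/`booster.boost` API and the `HybridAdam` optimizer from ColossalAI, with a toy model and placeholder hyperparameters:

```python
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin
from colossalai.nn.optimizer import HybridAdam

# Assumes the distributed environment has already been launched
# (e.g. via colossalai.launch_from_torch).
model = nn.Linear(1024, 1024)
optimizer = HybridAdam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Low Level Zero supports only Zero stage 1 and stage 2, as noted above.
plugin = LowLevelZeroPlugin(stage=2)
booster = Booster(plugin=plugin)

# booster.boost wraps the objects so that optimizer states (and, for stage 2,
# gradients) are sharded across data-parallel ranks.
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```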
More plugins are coming soon.
@@ -24,7 +24,7 @@
- [Torch DDP Plugin](#torch-ddp-插件): Suitable for models with fewer than 2 billion parameters (e.g. Bert-3m, GPT2-1.5b).
- [Torch FSDP Plugin](#torch-fsdp-插件) / [Low Level Zero Plugin](#low-level-zero-插件): Suitable for models with fewer than 10 billion parameters (e.g. GPTJ-6b, MegatronLM-8b).
- [Gemini Plugin](#gemini-插件): Suitable for models with more than 10 billion parameters (e.g. TuringNLG-17b), in scenarios with **high cross-node bandwidth and medium to small-scale clusters (below a thousand cards)** (e.g. Llama2-70b).
- - [Hybrid Pararllel Plugin](#hybrid-parallel-插件): Suitable for models with more than 60 billion parameters, or special models such as those with exceptionally long sequences or very large vocabularies, in scenarios with **low cross-node bandwidth and large-scale clusters (a thousand cards or more)** (e.g. GPT3-175b, Bloom-176b).
+ - [Hybrid Parallel Plugin](#hybrid-parallel-插件): Suitable for models with more than 60 billion parameters, or special models such as those with exceptionally long sequences or very large vocabularies, in scenarios with **low cross-node bandwidth and large-scale clusters (a thousand cards or more)** (e.g. GPT3-175b, Bloom-176b).
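As a companion sketch (again, not part of this commit), a Hybrid Parallel setup of the kind recommended for the largest models might be configured as follows; the parallel sizes, precision, and Zero stage are illustrative assumptions, not tuned settings:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Illustrative layout only (assumes a launched distributed environment):
# 2-way tensor parallelism x 2-way pipeline parallelism, fp16 mixed precision,
# and Zero-1 over the remaining data-parallel ranks.
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, precision="fp16", zero_stage=1)
booster = Booster(plugin=plugin)

# As with the other plugins, the model/optimizer/dataloader are then wrapped
# with booster.boost before the training loop.
```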
## Plugins