mirror of https://github.com/hpcaitech/ColossalAI
[nfc] fix typo and author name (#5089)
parent fd3567e089
commit 0d482302a1
@@ -1,6 +1,6 @@
 # Lazy initialization

-Author: [Hongxiu Liu](https://github.com/ver217)
+Author: [Hongxin Liu](https://github.com/ver217)

 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
@@ -20,7 +20,7 @@ Author: [Baizhou Zhang](https://github.com/Fridge003), [Bin Jia](https://github.

 ## Introduction

-When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallism, are essential so as to meet the limitation of GPU memory.
+When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallelism, are essential so as to meet the limitation of GPU memory.
 However, manually cutting model and rewriting its forward/backword logic could be difficult for users who are not familiar with distributed training.
 Meanwhile, the Huggingface transformers library has gradually become users' first choice of model source, and most mainstream large models have been open-sourced in Huggingface transformers model library.
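The introduction in this hunk motivates splitting a model across devices. As a rough single-process sketch of the tensor-parallel half of that idea (an illustration only, not Shardformer's implementation): a Linear layer's weight is split column-wise so each rank would hold just one shard, and the final concatenation stands in for the all-gather a real framework performs over NCCL.

```python
import torch

hidden, out_features, world_size = 8, 16, 2
x = torch.randn(4, hidden)                       # a batch of activations
full_weight = torch.randn(out_features, hidden)  # the unsharded Linear weight

# Tensor-parallel split: each of the `world_size` ranks keeps
# out_features // world_size rows of the weight.
shards = full_weight.chunk(world_size, dim=0)

# Each rank computes only its slice of the output...
partial_outputs = [x @ w.t() for w in shards]

# ...and an all-gather (emulated here by concatenation) restores the
# full output so the next layer sees the same tensor as the unsharded model.
tp_output = torch.cat(partial_outputs, dim=-1)

assert torch.allclose(tp_output, x @ full_weight.t(), atol=1e-6)
```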
@@ -321,7 +321,7 @@ For example, when training LlaMa-2 with tensor parallel size as 2, the attribute

 3. Replacing the `forward` methods implemented by original Huggingface
 Transformers libraries with our customized `forward` methods.
-This replacement is essential for pipeline paralellism, where a customiozed function is needed to pass intermediate hidden states between different pipeline stages.
+This replacement is essential for pipeline parallelism, where a customized function is needed to pass intermediate hidden states between different pipeline stages.
 Also, optimization methods such as flash attention or sequence parallel can be injected into the `forward` process through our customized `forward` method.

 4. Replacing the whole copy of model parameters and optimizer states with incomplete ones controlled by current device (this is why it's called Shardformer).
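Step 3 in this hunk is the heart of the pipeline-parallel case. A minimal sketch of the idea, assuming nothing about ColossalAI's internals: each stage's customized `forward` runs only its own slice of layers and accepts/returns intermediate hidden states, which on real hardware travel between stages via point-to-point communication.

```python
import torch
import torch.nn as nn

class PipelineStage(nn.Module):
    """One pipeline stage: owns a contiguous slice of the model's layers."""
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Instead of the whole model's forward, run only this stage's layers
        # and hand the resulting hidden states to the next stage (p2p
        # send/recv in a real setup; plain function chaining here).
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

# Hypothetical 4-block "model" split into two stages on one process.
blocks = nn.ModuleList(nn.Linear(32, 32) for _ in range(4))
stage0 = PipelineStage(blocks[:2])   # first half of the layers
stage1 = PipelineStage(blocks[2:])   # second half

x = torch.randn(4, 32)
out = stage1(stage0(x))  # hidden states flow from stage 0 to stage 1
```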
@@ -1,6 +1,6 @@
 # Zero Redundancy Optimizer with chunk-based memory management

-Author: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+Author: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)

 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
@@ -1,6 +1,6 @@
 # Lazy initialization

-Author: [Hongxiu Liu](https://github.com/ver217)
+Author: [Hongxin Liu](https://github.com/ver217)

 **Prerequisite tutorials:**
 - [Train with booster](../basics/booster_api.md)
@@ -1,6 +1,6 @@
 # Zero Redundancy Optimizer (ZeRO) with chunk-based memory management

-Author: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+Author: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)

 **Prerequisite tutorials:**
 - [Train with booster](../basics/booster_api.md)