From 0d482302a15310a3c7e667b42f6c70fd707763d4 Mon Sep 17 00:00:00 2001
From: digger yu
Date: Wed, 22 Nov 2023 10:39:01 +0800
Subject: [PATCH] [nfc] fix typo and author name (#5089)

---
 docs/source/en/features/lazy_init.md            | 2 +-
 docs/source/en/features/shardformer.md          | 4 ++--
 docs/source/en/features/zero_with_chunk.md      | 2 +-
 docs/source/zh-Hans/features/lazy_init.md       | 2 +-
 docs/source/zh-Hans/features/zero_with_chunk.md | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/source/en/features/lazy_init.md b/docs/source/en/features/lazy_init.md
index 133fd7992..a78af4b30 100644
--- a/docs/source/en/features/lazy_init.md
+++ b/docs/source/en/features/lazy_init.md
@@ -1,6 +1,6 @@
 # Lazy initialization
 
-Author: [Hongxiu Liu](https://github.com/ver217)
+Author: [Hongxin Liu](https://github.com/ver217)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
diff --git a/docs/source/en/features/shardformer.md b/docs/source/en/features/shardformer.md
index a6e32d2c0..bf7b2b3e4 100644
--- a/docs/source/en/features/shardformer.md
+++ b/docs/source/en/features/shardformer.md
@@ -20,7 +20,7 @@ Author: [Baizhou Zhang](https://github.com/Fridge003), [Bin Jia](https://github.
 
 ## Introduction
 
-When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallism, are essential so as to meet the limitation of GPU memory.
+When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallelism, are essential so as to meet the limitation of GPU memory.
 However, manually cutting model and rewriting its forward/backword logic could be difficult for users who are not familiar with distributed training.
 Meanwhile, the Huggingface transformers library has gradually become users' first choice of model source, and most mainstream large models have been open-sourced in Huggingface transformers model library.
 
@@ -321,7 +321,7 @@ For example, when training LlaMa-2 with tensor parallel size as 2, the attribute
 
 3. Replacing the `forward` methods implemented by original Huggingface Transformers libraries with our customized `forward` methods.
 
-This replacement is essential for pipeline paralellism, where a customiozed function is needed to pass intermediate hidden states between different pipeline stages.
+This replacement is essential for pipeline parallelism, where a customized function is needed to pass intermediate hidden states between different pipeline stages.
 Also, optimization methods such as flash attention or sequence parallel can be injected into the `forward` process through our customized `forward` method.
 
 4. Replacing the whole copy of model parameters and optimizer states with incomplete ones controlled by current device (this is why it's called Shardformer).
diff --git a/docs/source/en/features/zero_with_chunk.md b/docs/source/en/features/zero_with_chunk.md
index 42305182b..62be86488 100644
--- a/docs/source/en/features/zero_with_chunk.md
+++ b/docs/source/en/features/zero_with_chunk.md
@@ -1,6 +1,6 @@
 # Zero Redundancy Optimizer with chunk-based memory management
 
-Author: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+Author: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
diff --git a/docs/source/zh-Hans/features/lazy_init.md b/docs/source/zh-Hans/features/lazy_init.md
index 80742a56d..cdca51d6f 100644
--- a/docs/source/zh-Hans/features/lazy_init.md
+++ b/docs/source/zh-Hans/features/lazy_init.md
@@ -1,6 +1,6 @@
 # 懒惰初始化
 
-作者: [Hongxiu Liu](https://github.com/ver217)
+作者: [Hongxin Liu](https://github.com/ver217)
 
 **前置教程:**
 - [Train with booster](../basics/booster_api.md)
diff --git a/docs/source/zh-Hans/features/zero_with_chunk.md b/docs/source/zh-Hans/features/zero_with_chunk.md
index 612906285..c4f21c73c 100644
--- a/docs/source/zh-Hans/features/zero_with_chunk.md
+++ b/docs/source/zh-Hans/features/zero_with_chunk.md
@@ -1,6 +1,6 @@
 # 基于Chunk内存管理的零冗余优化器 (ZeRO)
 
-作者: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+作者: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
 
 **前置教程:**