mirror of https://github.com/hpcaitech/ColossalAI
[chat] set default zero2 strategy (#3667)
* [chat] set default gemini strategy
* [chat] set default zero2 strategy
* [chat] set default zero2 strategy

parent a22407cc02
commit 268b3cd80d
````diff
@@ -69,7 +69,7 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
     --grad_checkpoint
 ```

 ### Arg List
-- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='naive'
+- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
 - --model: model type, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
 - --pretrain: pretrain model, type=str, default=None
 - --max_datasets_size: the max size of dataset, type=int, default=None
````
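
To make the effect of this change concrete, here is a minimal, self-contained sketch using only standard-library argparse. It mirrors the changed argument rather than quoting any file verbatim: parsing an empty command line now resolves the strategy to `colossalai_zero2`.

```python
import argparse

# Minimal reproduction of the changed argument, mirroring the diff above.
parser = argparse.ArgumentParser()
parser.add_argument('--strategy',
                    choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'],
                    default='colossalai_zero2',
                    help='strategy to use')

# No --strategy flag on the command line: the new default kicks in.
args = parser.parse_args([])
assert args.strategy == 'colossalai_zero2'
```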

```diff
@@ -117,7 +117,7 @@ Model performance in [Anthropics paper](https://arxiv.org/abs/2204.05862):
 <div align=left>We also train the reward model based on LLaMA-7B, which reaches the ACC of 72.06% after 1 epoch, performing almost the same as Anthropic's best RM.

 ### Arg List
-- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='naive'
+- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
 - --model: model type, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
 - --pretrain: pretrain model, type=str, default=None
 - --model_path: the path of rm model(if continue to train), type=str, default=None
```

```diff
@@ -158,7 +158,7 @@ Prompt dataset: the instruction dataset mentioned in the above figure which incl
 Pretrain dataset: the pretrain dataset including the instruction and corresponding response, e.g. you can use the [InstructWild Data](https://github.com/XueFuzhao/InstructionWild/tree/main/data) in stage 1 supervised instructs tuning.

 ### Arg List
-- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='naive'
+- --strategy: the strategy using for training, choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
 - --model: model type of actor, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
 - --pretrain: pretrain model, type=str, default=None
 - --rm_model: reward model type, type=str, choices=['gpt2', 'bloom', 'opt', 'llama'], default=None
```

```diff
@@ -208,7 +208,7 @@ if __name__ == '__main__':
     parser.add_argument('--pretrain_dataset', type=str, default=None, help='path to the pretrained dataset')
     parser.add_argument('--strategy',
                         choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'],
-                        default='naive',
+                        default='colossalai_zero2',
                         help='strategy to use')
     parser.add_argument('--model', default='gpt2', choices=['gpt2', 'bloom', 'opt', 'llama', 'roberta'])
     parser.add_argument('--pretrain', type=str, default=None)
```
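
For context on what the parsed value controls: in the coati example scripts, this string is typically dispatched to a strategy object. The sketch below assumes the `NaiveStrategy`, `DDPStrategy`, and `ColossalAIStrategy` classes and the `stage`/`placement_policy` arguments from `coati.trainer.strategies`; treat the exact names as assumptions rather than a quotation of this commit.

```python
# Hedged sketch of the usual flag-to-strategy dispatch; class names and
# constructor arguments are assumptions about coati.trainer.strategies.
from coati.trainer.strategies import (ColossalAIStrategy, DDPStrategy,
                                      NaiveStrategy)

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()
    if name == 'ddp':
        return DDPStrategy()
    if name == 'colossalai_gemini':
        # ZeRO-3 with Gemini's heterogeneous memory management
        return ColossalAIStrategy(stage=3, placement_policy='cuda')
    if name == 'colossalai_zero2':
        # ZeRO-2: shard optimizer states and gradients, keep them on GPU
        return ColossalAIStrategy(stage=2, placement_policy='cuda')
    raise ValueError(f'Unsupported strategy "{name}"')
```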

```diff
@@ -189,7 +189,7 @@ if __name__ == '__main__':
     parser = argparse.ArgumentParser()
     parser.add_argument('--strategy',
                         choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2'],
-                        default='naive')
+                        default='colossalai_zero2')
     parser.add_argument('--model', choices=['gpt2', 'bloom', 'opt', 'deberta', 'llama', 'roberta'], default='bloom')
     parser.add_argument('--pretrain', type=str, default=None)
     parser.add_argument('--model_path', type=str, default=None)
```

```diff
@@ -176,7 +176,7 @@ if __name__ == '__main__':
     parser = argparse.ArgumentParser()
     parser.add_argument('--strategy',
                         choices=['naive', 'ddp', 'colossalai_gemini', 'colossalai_zero2', 'colossalai_zero2_cpu'],
-                        default='naive')
+                        default='colossalai_zero2')
     parser.add_argument('--model', choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom')
     parser.add_argument('--pretrain', type=str, default=None)
     parser.add_argument('--dataset', type=str, default=None)
```
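
The last hunk also accepts a `colossalai_zero2_cpu` choice. A plausible reading, again with the class and `placement_policy` argument assumed rather than quoted, is ZeRO-2 with optimizer state offloaded to CPU, trading throughput for GPU memory:

```python
# Hypothetical handling of the extra 'colossalai_zero2_cpu' choice;
# ColossalAIStrategy and placement_policy are assumed, not quoted.
from coati.trainer.strategies import ColossalAIStrategy

def build_zero2_strategy(offload_to_cpu: bool) -> ColossalAIStrategy:
    # ZeRO-2 shards optimizer states and gradients across ranks;
    # placement_policy decides where the sharded states live.
    placement = 'cpu' if offload_to_cpu else 'cuda'
    return ColossalAIStrategy(stage=2, placement_policy=placement)
```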