# Benchmarks

## Benchmark OPT with LoRA on dummy prompt data

We provide OPT models in various sizes (the string in parentheses is the model name to pass to the script):

- OPT-125M (125m)
- OPT-350M (350m)
- OPT-700M (700m)
- OPT-1.3B (1.3b)
- OPT-2.7B (2.7b)
- OPT-3.5B (3.5b)
- OPT-5.5B (5.5b)
- OPT-6.7B (6.7b)
- OPT-10B (10b)
- OPT-13B (13b)

We also provide various training strategies:

- ddp: PyTorch DDP (`DistributedDataParallel`)
- colossalai_gemini: ColossalAI GeminiDDP with `placement_policy="cuda"`, similar to ZeRO-3
- colossalai_gemini_cpu: ColossalAI GeminiDDP with `placement_policy="cpu"`, similar to ZeRO-3 with offload
- colossalai_zero2: ColossalAI ZeRO-2
- colossalai_zero2_cpu: ColossalAI ZeRO-2 with offload
- colossalai_zero1: ColossalAI ZeRO-1
- colossalai_zero1_cpu: ColossalAI ZeRO-1 with offload

Currently, `torchrun` is the only supported launcher. For example:

```shell
# run OPT-125M without LoRA (lora_rank=0) on a single node with a single GPU and the minimum batch size
torchrun --standalone --nproc_per_node 1 benchmark_opt_lora_dummy.py --model 125m --critic_model 125m --strategy ddp --experience_batch_size 1 --train_batch_size 1 --lora_rank 0
# run Actor (OPT-1.3B) and Critic (OPT-350M) with lora_rank=4 on a single node with 4 GPUs
torchrun --standalone --nproc_per_node 4 benchmark_opt_lora_dummy.py --model 1.3b --critic_model 350m --strategy colossalai_zero2 --lora_rank 4
```
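
To compare strategies under the same model configuration, the commands above can be wrapped in a small loop. The following is a minimal sketch and is not part of the repository: the `run_benchmark` helper and the `MODEL`, `CRITIC_MODEL`, `NPROC`, `LORA_RANK`, and `DRY_RUN` environment variables are hypothetical names introduced for illustration. Only the `torchrun` and script flags are taken from the examples above. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: sweep the benchmark over every strategy listed in this README.
# All variable names below are illustrative assumptions, not script options.
set -eu

MODEL="${MODEL:-125m}"                # actor model name, e.g. 125m or 1.3b
CRITIC_MODEL="${CRITIC_MODEL:-125m}"  # critic model name
NPROC="${NPROC:-1}"                   # number of GPUs on this node
LORA_RANK="${LORA_RANK:-0}"           # 0 disables LoRA
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands, 0 = execute them

STRATEGIES="ddp colossalai_gemini colossalai_gemini_cpu colossalai_zero2 colossalai_zero2_cpu colossalai_zero1 colossalai_zero1_cpu"

# Build the torchrun command for one strategy, then print or execute it.
run_benchmark() {
    strategy="$1"
    set -- torchrun --standalone --nproc_per_node "$NPROC" \
        benchmark_opt_lora_dummy.py \
        --model "$MODEL" --critic_model "$CRITIC_MODEL" \
        --strategy "$strategy" --lora_rank "$LORA_RANK"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for s in $STRATEGIES; do
    run_benchmark "$s"
done
```

Setting `DRY_RUN=0` executes the commands instead of printing them. For example, `DRY_RUN=0 NPROC=4 MODEL=1.3b CRITIC_MODEL=350m LORA_RANK=4 sh sweep.sh` would run each strategy on 4 GPUs, assuming the sketch is saved as `sweep.sh` next to `benchmark_opt_lora_dummy.py`.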