ColossalAI/applications/Chat/benchmarks
Wenhao Chen 153b957a1b
[chat] refactor strategy class with booster api (#3987)
* refactor: adapt boost API in base and naive strategies

* fix: initialize plugin after setup_distributed

* fix: fix save_pretrained fn

* refactor: adapt boost API in DDPStrategy

* to: add _post_init check

* to: fix ddp backward, modify ddp dataloader and unwrap

* feat: adapt boost API in ColossalAIStrategy

* fix: call setup_distributed before use get_current_device

* fix: fix save_model and save_optimizer

* test: remove save_sharded_optimizer test

* style: apply formatter

* fix: fix stage check and add comments

* feat: allow dict type arg in strategy.prepare

* to: temporarily remove lr_scheduler for testing

* style: simplify init of ColossalAIStrategy

* fix: fix lr_scheduler in sft and rm

* style: modify comments

* test: add train_prompts tests

* fix: fix inference only case and use in train_prompts

* test: skip failed tests in ci

* style: fix CodeFactor check

* fix: do not use model.to('cpu') with GeminiPlugin

* test: enable colossalai_gemini tests

* test: set CUDA_VISIBLE_DEVICES in ci

* docs: add note
2023-06-25 17:36:21 +08:00
ray [chat] add distributed PPO trainer (#3740) 2023-06-07 10:41:16 +08:00
README.md [chat] refactor trainer (#3648) 2023-04-26 18:11:49 +08:00
benchmark_opt_lora_dummy.py [chat] refactor strategy class with booster api (#3987) 2023-06-25 17:36:21 +08:00

README.md

Benchmarks

Benchmark OPT with LoRA on dummy prompt data

The script supports the following OPT model sizes (the string in parentheses is the name to pass to the script's --model and --critic_model options):

  • OPT-125M (125m)
  • OPT-350M (350m)
  • OPT-700M (700m)
  • OPT-1.3B (1.3b)
  • OPT-2.7B (2.7b)
  • OPT-3.5B (3.5b)
  • OPT-5.5B (5.5b)
  • OPT-6.7B (6.7b)
  • OPT-10B (10b)
  • OPT-13B (13b)
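
The model names above follow a simple convention: a number followed by m (millions) or b (billions) of parameters. As a minimal sketch of that convention, the helper below converts a name into its nominal parameter count. The function nominal_params and the tables MODEL_NAMES and _SUFFIX are hypothetical names introduced here for illustration; they are not taken from benchmark_opt_lora_dummy.py, and the real parameter counts of the models may differ from the nominal figures.

```python
from decimal import Decimal

# Model names listed in this README, as passed to --model / --critic_model.
MODEL_NAMES = ["125m", "350m", "700m", "1.3b", "2.7b", "3.5b", "5.5b", "6.7b", "10b", "13b"]

# Multiplier for each size suffix (hypothetical helper table).
_SUFFIX = {"m": 10**6, "b": 10**9}


def nominal_params(name: str) -> int:
    """Return the nominal parameter count encoded in a name such as '1.3b'."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {name!r}")
    number, suffix = name[:-1], name[-1]
    # Decimal avoids float rounding for values such as 1.3 or 6.7.
    return int(Decimal(number) * _SUFFIX[suffix])


if __name__ == "__main__":
    for model_name in MODEL_NAMES:
        print(f"{model_name:>5} -> {nominal_params(model_name):,}")
```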

We also provide various training strategies:

  • ddp: PyTorch DistributedDataParallel (DDP)
  • colossalai_gemini: ColossalAI GeminiDDP with placement_policy="cuda", similar to ZeRO-3
  • colossalai_gemini_cpu: ColossalAI GeminiDDP with placement_policy="cpu", similar to ZeRO-3 with CPU offload
  • colossalai_zero2: ColossalAI ZeRO-2
  • colossalai_zero2_cpu: ColossalAI ZeRO-2 with CPU offload
  • colossalai_zero1: ColossalAI ZeRO-1
  • colossalai_zero1_cpu: ColossalAI ZeRO-1 with CPU offload
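
The strategy names above encode three things: the backend, the ZeRO stage (or the stage Gemini is compared to), and whether CPU offload is used. The sketch below writes that table out as plain Python data so the options can be compared at a glance. StrategySpec, STRATEGIES, and describe_strategy are hypothetical names for illustration only; this is not how the benchmark script constructs its strategies, and no ColossalAI API is called.

```python
from typing import NamedTuple, Optional


class StrategySpec(NamedTuple):
    backend: str               # "ddp", "gemini", or "zero"
    zero_stage: Optional[int]  # ZeRO stage (for Gemini: the stage it resembles); None for DDP
    cpu_offload: bool          # True for the *_cpu variants


# One entry per strategy name listed in this README.
STRATEGIES = {
    "ddp": StrategySpec("ddp", None, False),
    "colossalai_gemini": StrategySpec("gemini", 3, False),
    "colossalai_gemini_cpu": StrategySpec("gemini", 3, True),
    "colossalai_zero2": StrategySpec("zero", 2, False),
    "colossalai_zero2_cpu": StrategySpec("zero", 2, True),
    "colossalai_zero1": StrategySpec("zero", 1, False),
    "colossalai_zero1_cpu": StrategySpec("zero", 1, True),
}


def describe_strategy(name: str) -> StrategySpec:
    """Look up a strategy name and fail loudly on anything not listed above."""
    try:
        return STRATEGIES[name]
    except KeyError:
        raise ValueError(f"unsupported strategy: {name!r}") from None


if __name__ == "__main__":
    for strategy_name, spec in STRATEGIES.items():
        print(strategy_name, spec)
```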

Currently, the benchmark can only be launched with torchrun. For example:

# Run OPT-125M without LoRA (lora_rank=0) on a single node with a single GPU, using the minimum batch size
torchrun --standalone --nproc_per_node 1 benchmark_opt_lora_dummy.py --model 125m --critic_model 125m --strategy ddp --experience_batch_size 1 --train_batch_size 1 --lora_rank 0
# Run the actor (OPT-1.3B) and the critic (OPT-350M) with lora_rank=4 on a single node with 4 GPUs
torchrun --standalone --nproc_per_node 4 benchmark_opt_lora_dummy.py --model 1.3b --critic_model 350m --strategy colossalai_zero2 --lora_rank 4
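
The flags used in the commands above could be parsed roughly as follows. This is a minimal sketch using the standard argparse module, not the parser from benchmark_opt_lora_dummy.py: the flag names come from the commands, but the defaults, the help text, and the use of choices to validate values are assumptions made for illustration.

```python
import argparse

MODELS = ["125m", "350m", "700m", "1.3b", "2.7b", "3.5b", "5.5b", "6.7b", "10b", "13b"]
STRATEGY_NAMES = [
    "ddp",
    "colossalai_gemini", "colossalai_gemini_cpu",
    "colossalai_zero2", "colossalai_zero2_cpu",
    "colossalai_zero1", "colossalai_zero1_cpu",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the torchrun examples."""
    parser = argparse.ArgumentParser(description="OPT + LoRA benchmark (illustrative parser)")
    # Defaults below are placeholders, not the script's actual defaults.
    parser.add_argument("--model", choices=MODELS, default="125m", help="actor model size")
    parser.add_argument("--critic_model", choices=MODELS, default="125m", help="critic model size")
    parser.add_argument("--strategy", choices=STRATEGY_NAMES, default="ddp")
    parser.add_argument("--experience_batch_size", type=int, default=1)
    parser.add_argument("--train_batch_size", type=int, default=1)
    parser.add_argument("--lora_rank", type=int, default=0, help="0 disables LoRA")
    return parser


if __name__ == "__main__":
    # Parse the arguments of the second example command above.
    example = ["--model", "1.3b", "--critic_model", "350m",
               "--strategy", "colossalai_zero2", "--lora_rank", "4"]
    print(build_parser().parse_args(example))
```

Restricting --model and --strategy with choices makes a mistyped name fail at startup rather than after the distributed processes have been launched.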