
RLHF - Colossal-AI

Implementation of RLHF (Reinforcement Learning from Human Feedback) powered by Colossal-AI. It supports distributed training and offloading, so it can fit extremely large models. More details can be found in the blog.

Training process (step 3)

Install

pip install .

Usage

The main entry point is Trainer. We only support the PPO trainer for now. We support several training strategies (see the snippet after this list for switching between them):

  • NaiveStrategy: the simplest strategy. Trains on a single GPU.
  • DDPStrategy: uses torch.nn.parallel.DistributedDataParallel. Trains on multiple GPUs.
  • ColossalAIStrategy: uses Gemini and ZeRO of Colossal-AI. It eliminates model duplication across GPUs and supports offloading. It's very useful when training large models on multiple GPUs.
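
All three strategies expose the same training interface, so switching between them is a one-line change. A minimal sketch, assuming NaiveStrategy and DDPStrategy live in chatgpt.trainer.strategies alongside ColossalAIStrategy (constructor arguments are omitted, since they may vary):

from chatgpt.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy

# Pick exactly one strategy; the rest of the training code stays the same.
strategy = NaiveStrategy()          # single GPU
# strategy = DDPStrategy()          # multiple GPUs via torch DistributedDataParallel
# strategy = ColossalAIStrategy()   # Gemini + ZeRO, supports offloading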

Simplest usage:

from chatgpt.trainer import PPOTrainer
from chatgpt.trainer.strategies import ColossalAIStrategy

strategy = ColossalAIStrategy()

# Models must be created inside the strategy's init context so that
# Colossal-AI can shard and offload their parameters.
with strategy.model_init_context():
    # init your actor and critic models here (placeholders; see examples/ for concrete model classes)
    actor = Actor()
    critic = Critic()

# Pass the strategy together with the models; see examples/ for the full argument list.
trainer = PPOTrainer(strategy, actor, critic, ...)

trainer.fit(dataset, ...)

For more details, see examples/.

We also support training a reward model with real-world data. See examples/train_reward_model.py.
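
As a rough, hypothetical sketch of how reward model training fits the same trainer/strategy pattern (the RewardModelTrainer and RewardModel names and the argument order below are assumptions for illustration; examples/train_reward_model.py is the authoritative reference):

from chatgpt.trainer import RewardModelTrainer        # assumed import path
from chatgpt.trainer.strategies import NaiveStrategy

strategy = NaiveStrategy()

with strategy.model_init_context():
    # A reward model maps a (prompt, response) pair to a scalar preference score.
    reward_model = RewardModel()  # placeholder; see examples/ for a concrete model

# train_dataset is expected to contain human preference comparisons
# (a "chosen" and a "rejected" response per prompt).
trainer = RewardModelTrainer(reward_model, strategy, ...)  # hypothetical argument order
trainer.fit(train_dataset, ...)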

Todo

  • implement PPO training
  • implement training reward model
  • support LoRA
  • implement PPO-ptx fine-tuning
  • integrate with Ray
  • support more RL paradigms, like Implicit Language Q-Learning (ILQL)

Quick Preview

  • Up to 7.73x faster single-server training and 1.42x faster single-GPU inference

  • Up to 10.3x growth in model capacity on one GPU
  • A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)

  • Increase the capacity of the fine-tuned model by up to 3.7 times on a single GPU
  • While maintaining a sufficiently high running speed

Citations

@article{Hu2021LoRALA,
  title   = {LoRA: Low-Rank Adaptation of Large Language Models},
  author  = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2106.09685}
}

@article{ouyang2022training,
  title   = {Training language models to follow instructions with human feedback},
  author  = {Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others},
  journal = {arXiv preprint arXiv:2203.02155},
  year    = {2022}
}