Making large AI models cheaper, faster and more accessible


RLHF - Colossal-AI

An implementation of RLHF (Reinforcement Learning from Human Feedback) powered by Colossal-AI. It supports distributed training and offloading, so it can fit extremely large models. More details can be found in the blog.

Training process (step 3)

Install

pip install .

Usage

The main entry point is Trainer. Currently only the PPO trainer (PPOTrainer) is supported. Several training strategies are available:

NaiveStrategy: the simplest strategy. Trains on a single GPU.
DDPStrategy: uses torch.nn.parallel.DistributedDataParallel. Trains on multiple GPUs.
ColossalAIStrategy: uses Gemini and ZeRO from Colossal-AI. It eliminates model duplication across GPUs and supports offloading, which is very useful when training large models on multiple GPUs.
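To give a feel for why removing model duplication matters, here is a back-of-the-envelope sketch. It is not part of this package. It assumes the commonly cited accounting for mixed-precision Adam training (about 16 bytes of model state per parameter) and ignores activations, buffers, and fragmentation. The function names and the 1B-parameter model are hypothetical, and real Gemini/ZeRO memory usage will differ.

```python
# Back-of-the-envelope model-state memory per GPU (illustrative only).
# Assumption: mixed-precision Adam keeps 2 bytes of fp16 weights, 2 bytes of
# fp16 gradients, and 12 bytes of fp32 optimizer state (master weights,
# momentum, variance) for every parameter.
BYTES_PER_PARAM = 2 + 2 + 12


def replicated_gib(num_params: int) -> float:
    """Every GPU holds a full copy of the model states (DDP-style)."""
    return num_params * BYTES_PER_PARAM / 2**30


def sharded_gib(num_params: int, num_gpus: int) -> float:
    """Model states are partitioned evenly across GPUs (ZeRO-style)."""
    return replicated_gib(num_params) / num_gpus


if __name__ == "__main__":
    params = 1_000_000_000  # hypothetical 1B-parameter model
    for gpus in (1, 4, 8):
        print(
            f"{gpus} GPU(s): replicated {replicated_gib(params):.1f} GiB, "
            f"sharded {sharded_gib(params, gpus):.1f} GiB per GPU"
        )
```

Under these assumptions the replicated footprint stays constant as GPUs are added, while the sharded footprint shrinks in proportion to the number of GPUs. Offloading to CPU memory can reduce the GPU share further.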

Simplest usage:

from chatgpt.trainer import PPOTrainer
from chatgpt.trainer.strategies import ColossalAIStrategy

strategy = ColossalAIStrategy()

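with strategy.model_init_context():
    # Initialize your models here.
    # Actor and Critic are placeholders for your own model classes.
    actor = Actor()
    critic = Critic()

trainer = PPOTrainer(
    strategy=strategy,
    actor=actor,
    critic=critic,
    # ... remaining trainer arguments; see examples/
)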

trainer.fit(dataset, ...)

For more details, see examples/.
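For readers new to PPO, the following is a minimal sketch of the clipped surrogate loss at the core of the algorithm, written in plain Python so it runs without any dependencies. It is an illustration of the general technique, not this package's implementation; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate loss over a batch of sampled actions."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)

    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)
```

Clipping limits how much credit a single update can take for moving the policy away from the one that generated the samples, which is what keeps PPO updates stable.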

We also support training the reward model with real-world data. See examples/train_reward_model.py.
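Reward models for RLHF are commonly trained on human comparisons with a pairwise ranking loss, as described in the InstructGPT paper cited below. The sketch here shows that loss in plain Python. It is an assumption that this matches what examples/train_reward_model.py does; the function name and the scalar-reward inputs are hypothetical.

```python
import math
from typing import Sequence


def pairwise_ranking_loss(
    chosen_rewards: Sequence[float],
    rejected_rewards: Sequence[float],
) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over comparison pairs."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("inputs must have the same length")
    if len(chosen_rewards) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        diff = r_chosen - r_rejected
        # -log(sigmoid(diff)) equals softplus(-diff); this form avoids overflow.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(chosen_rewards)
```

The loss shrinks as the reward model scores the human-preferred response higher than the rejected one, and equals log(2) when it cannot tell them apart.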

Todo

implement PPO training
implement training reward model
support LoRA
implement PPO-ptx fine-tuning
integrate with Ray
support more RL paradigms, like Implicit Language Q-Learning (ILQL)

Invitation to open-source contribution

Following the successful examples of BLOOM and Stable Diffusion, we welcome all developers and partners with computing power, datasets, or models to join us in building an ecosystem around Colossal-AI and working toward the era of big AI models, starting with replicating ChatGPT!

You may contact us or participate in the following ways:

Post an issue or submit a PR on GitHub
Join the Colossal-AI community on Slack and WeChat to share your ideas
Check out and fill in the cooperation proposal
Send your proposal by email to contact@hpcaitech.com

Thanks so much to all of our amazing contributors!

Quick Preview

Up to 7.73 times faster for single-server training and 1.42 times faster for single-GPU inference

Up to 10.3x growth in model capacity on one GPU
A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)

Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
Maintain a sufficiently high running speed

Citations

@article{Hu2021LoRALA,
    title   = {LoRA: Low-Rank Adaptation of Large Language Models},
    author  = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2106.09685}
}

@article{ouyang2022training,
  title={Training language models to follow instructions with human feedback},
  author={Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others},
  journal={arXiv preprint arXiv:2203.02155},
  year={2022}
}