# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/colossal-ai_logo_vertical.png)](https://www.colossalai.org/)

   Colossal-AI: Making large AI models cheaper, faster, and more accessible

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
   <a href="https://www.colossalai.org/"> Documentation </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/tree/main/examples"> Examples </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![GitHub Repo stars](https://img.shields.io/github/stars/hpcaitech/ColossalAI?style=social)](https://github.com/hpcaitech/ColossalAI/stargazers)
   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)


   | [English](README.md) | [中文](docs/README-zh-Hans.md) |

</div>

## Latest News
* [2024/01] [Inference Performance Improved by 46%, Open Source Solution Breaks the Length Limit of LLM for Multi-Round Conversations](https://hpc-ai.com/blog/Colossal-AI-SwiftInfer)
* [2024/01] [Construct Refined 13B Private Model With Just $5000 USD, Upgraded Colossal-AI Llama-2 Open Source](https://hpc-ai.com/blog/colossal-llama-2-13b)
* [2023/11] [Enhanced MoE Parallelism, Open-source MoE Model Training Can Be 9 Times More Efficient](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)
* [2023/09] [One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific LLM Solution](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
* [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
* [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
* [2023/03] [Intel and Colossal-AI Partner to Deliver Cost-Efficient Open-Source Solution for Protein Folding Structure Prediction](https://www.hpc-ai.tech/blog/intel-habana)
* [2023/03] [AWS and Google Fund Colossal-AI with Startup Cloud Programs](https://www.hpc-ai.tech/blog/aws-and-google-fund-colossal-ai-with-startup-cloud-programs)

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI for Real World Applications</a>
   <ul>
     <li><a href="#Colossal-LLaMA-2">Colossal-LLaMA-2: One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific LLM Solution</a></li>
     <li><a href="#ColossalChat">ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline</a></li>
     <li><a href="#AIGC">AIGC: Acceleration of Stable Diffusion</a></li>
     <li><a href="#Biomedicine">Biomedicine: Acceleration of AlphaFold Protein Structure</a></li>
   </ul>
 </li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a>
   <ul>
     <li><a href="#LLaMA2">LLaMA 1/2</a></li>
     <li><a href="#MoE">MoE</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#Recommendation-System-Models">Recommendation System Models</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a>
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference">Inference</a>
   <ul>
     <li><a href="#SwiftInfer">SwiftInfer: Breaks the Length Limit of LLM for Multi-Round Conversations with 46% Acceleration</a></li>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
     <li><a href="#OPT-Serving">OPT-175B Online Serving for Text Generation</a></li>
     <li><a href="#BLOOM-Inference">176B BLOOM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#Contributing">Contributing</a></li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components. We aim to let you write
distributed deep learning models the same way you write a model on your laptop, and we provide user-friendly tools to kickstart
distributed training and inference in a few lines.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
  - [Auto-Parallelism](https://arxiv.org/abs/2302.02599)

- Heterogeneous Memory Management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on the configuration file

<p align="right">(<a href="#top">back to top</a>)</p>

## Colossal-AI in the Real World

### Colossal-LLaMA-2

- 7B: Half a day of training for a few hundred dollars yields results comparable to mainstream large models. An open-source, commercial-free, domain-specific LLM solution.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-7b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-7b-base/summary)

- 13B: Build a refined 13B private model for just $5,000 USD.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://hpc-ai.com/blog/colossal-llama-2-13b)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-13b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-13b-base/summary)

|              Model             |  Backbone  | Tokens Consumed |     MMLU (5-shot)    | CMMLU (5-shot)| AGIEval (5-shot) | GAOKAO (0-shot) | CEval (5-shot)  |
| :----------------------------: | :--------: | :-------------: | :------------------: | :-----------: | :--------------: | :-------------: | :-------------: |
|          Baichuan-7B           |     -      |      1.2T       |    42.32 (42.30)     | 44.53 (44.02) |        38.72     |       36.74     |       42.80     |
|       Baichuan-13B-Base        |     -      |      1.4T       |    50.51 (51.60)     | 55.73 (55.30) |        47.20     |       51.41     |       53.60     |
|       Baichuan2-7B-Base        |     -      |      2.6T       |    46.97 (54.16)     | 57.67 (57.07) |        45.76     |       52.60     |       54.00     |
|       Baichuan2-13B-Base       |     -      |      2.6T       |    54.84 (59.17)     | 62.62 (61.97) |        52.08     |       58.25     |       58.10     |
|           ChatGLM-6B           |     -      |      1.0T       |    39.67 (40.63)     |   41.17 (-)   |        40.10     |       36.53     |       38.90     |
|          ChatGLM2-6B           |     -      |      1.4T       |    44.74 (45.46)     |   49.40 (-)   |        46.36     |       45.49     |       51.70     |
|          InternLM-7B           |     -      |      1.6T       |    46.70 (51.00)     |   52.00 (-)   |        44.77     |       61.64     |       52.80     |
|            Qwen-7B             |     -      |      2.2T       |    54.29 (56.70)     | 56.03 (58.80) |        52.47     |       56.42     |       59.60     |
|           Llama-2-7B           |     -      |      2.0T       |    44.47 (45.30)     |   32.97 (-)   |        32.60     |       25.46     |         -       |
| Linly-AI/Chinese-LLaMA-2-7B-hf | Llama-2-7B |      1.0T       |        37.43         |     29.92     |        32.00     |       27.57     |         -       |
| wenge-research/yayi-7b-llama2  | Llama-2-7B |        -        |        38.56         |     31.52     |        30.99     |       25.95     |         -       |
| ziqingyang/chinese-llama-2-7b  | Llama-2-7B |        -        |        33.86         |     34.69     |        34.52     |       25.18     |        34.2     |
| TigerResearch/tigerbot-7b-base | Llama-2-7B |      0.3T       |        43.73         |     42.04     |        37.64     |       30.61     |         -       |
|  LinkSoul/Chinese-Llama-2-7b   | Llama-2-7B |        -        |        48.41         |     38.31     |        38.45     |       27.72     |         -       |
|       FlagAlpha/Atom-7B        | Llama-2-7B |      0.1T       |        49.96         |     41.10     |        39.83     |       33.00     |         -       |
| IDEA-CCNL/Ziya-LLaMA-13B-v1.1  | Llama-13B  |      0.11T      |        50.25         |     40.99     |        40.04     |       30.54     |         -       |
|  **Colossal-LLaMA-2-7b-base**  | Llama-2-7B |   **0.0085T**   |        53.06         |     49.89     |        51.48     |       58.82     |        50.2     |


### ColossalChat

<div align="center">
   <a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
   </a>
</div>

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

- Up to 10 times faster for RLHF PPO Stage3 Training

<p id="ColossalChat_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>

- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference

<p id="ColossalChat-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>

- Up to 10.3x growth in model capacity on one GPU
- A mini demo training process requires only 1.62 GB of GPU memory (any consumer-grade GPU)

<p id="ColossalChat-LoRA" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>

- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Maintain a sufficiently high running speed

<p align="right">(<a href="#top">back to top</a>)</p>


### AIGC
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).
<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>

- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).

<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>

- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.

<p id="inference-sd" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>

- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.


<p align="right">(<a href="#top">back to top</a>)</p>

### Biomedicine
Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates training and inference on GPU clusters, speeds up data processing, and supports inference on sequences containing more than 10,000 residues.

<p id="FastFold-Intel" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/data%20preprocessing%20with%20Intel.jpg" width=600/>
</p>

- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduction.

<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): Accelerates structure prediction of protein monomers and multimers by 11x.


<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>

- 70 billion parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)

### LLaMA1
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
</p>

- 65-billion-parameter large model pretraining accelerated by 38%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)

### MoE
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/MOE_training.png" width=800/>
</p>

- Enhanced MoE parallelism: open-source MoE model training can be 9 times more efficient
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe)
[[blog]](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- Saves 50% of GPU resources, with 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- Over 3x acceleration

### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq) is a 175-billion-parameter AI language model released by Meta. Its publicly available pre-trained weights encourage AI programmers to build downstream tasks and application deployments.
- 45% speedup when fine-tuning OPT at low cost, in just a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/opt) [[Online Serving]](https://colossalai.org/docs/advanced_tutorials/opt_service)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) for more details.

### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64

### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding): uses a software cache to train larger embedding tables with a smaller GPU memory budget.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- 120x larger model size on the same hardware (RTX 3080)

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>


## Inference
<p id="SwiftInfer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/SwiftInfer.jpg" width=800/>
</p>

- [SwiftInfer](https://github.com/hpcaitech/SwiftInfer): Improves inference performance by 46%. An open-source solution that breaks the length limit of LLMs for multi-round conversations.

<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

<p id="OPT-Serving" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
</p>

- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): Try 175-billion-parameter OPT online services

<p id="BLOOM-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
</p>

- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce hardware deployment costs of 176-billion-parameter BLOOM by more than 10 times.

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

Requirements:
- PyTorch >= 1.11 and PyTorch <= 2.1
- Python >= 3.7
- CUDA >= 11.0
- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
- Linux OS
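
As a rough aid, the requirements above can be encoded in a small checker. This is a minimal sketch rather than an official tool: `parse_version` and `check_requirements` are hypothetical helper names, only the major.minor part of each version is compared, and the version strings are passed in by hand so that the script runs even without PyTorch installed.

```python
import re


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a (major, minor) tuple."""
    parts = []
    for piece in text.split("+")[0].split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def check_requirements(python, torch, cuda, compute_capability, os_name):
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []
    if parse_version(python) < (3, 7):
        problems.append("Python >= 3.7 is required")
    if not (1, 11) <= parse_version(torch) <= (2, 1):
        problems.append("PyTorch must be >= 1.11 and <= 2.1")
    if parse_version(cuda) < (11, 0):
        problems.append("CUDA >= 11.0 is required")
    if parse_version(compute_capability) < (7, 0):
        problems.append("GPU compute capability >= 7.0 is required")
    if os_name != "Linux":
        problems.append("Only Linux is supported")
    return problems


# Example values for illustration only; substitute the values from your machine.
print(check_requirements("3.10.4", "2.1.0+cu118", "11.8", "8.6", "Linux"))
```

On a machine with PyTorch installed, the inputs can be read from `platform.python_version()`, `torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_capability()`, and `platform.system()`.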

If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.

### Install from PyPI

You can easily install Colossal-AI with the following command. **By default, we do not build PyTorch extensions during installation.**

```bash
pip install colossalai
```

**Note: only Linux is supported for now.**

However, if you want to build the PyTorch extensions during installation, you can set `CUDA_EXT=1`.

```bash
CUDA_EXT=1 pip install colossalai
```

**Otherwise, CUDA kernels will be built during runtime when you actually need them.**

We also release a nightly version to PyPI every week, which gives you access to unreleased features and bug fixes from the main branch.
Install it with:

```bash
pip install colossalai-nightly
```

### Install From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install colossalai
pip install .
```

By default, we do not compile CUDA/C++ kernels; Colossal-AI builds them at runtime.
To install and enable CUDA kernel fusion (required when using the fused optimizer):

```shell
CUDA_EXT=1 pip install .
```

For users with CUDA 10.2, you can still build Colossal-AI from source. However, you need to manually download the cub library and copy it to the corresponding directory.

```bash
# clone the repository
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# download the cub library
wget https://github.com/NVIDIA/cub/archive/refs/tags/1.8.0.zip
unzip 1.8.0.zip
cp -r cub-1.8.0/cub/ colossalai/kernel/cuda_native/csrc/kernels/include/

# install
CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.


### Build On Your Own

Run the following command to build a Docker image from the provided Dockerfile.

> Building Colossal-AI from scratch requires GPU support, so you need to set the NVIDIA Docker runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.


```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing
Following the successful examples of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, working towards the era of big AI models!

You may contact us or participate in the following ways:
1. [Leave a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Post an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) or submit a PR on GitHub, following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Send your official proposal by email to contact@hpcaitech.com.

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=hpcaitech/ColossalAI"  width="800px"/>
</a>


<p align="right">(<a href="#top">back to top</a>)</p>


## CI/CD

We leverage the power of [GitHub Actions](https://github.com/features/actions) to automate our development, release and deployment workflows. Please check out this [documentation](.github/workflows/README.md) on how the automated workflows are operated.


## Cite Us

This project is inspired by some related projects (some by our team and some by other organizations). We would like to credit these amazing projects as listed in the [Reference List](./docs/REFERENCE.md).

To cite this project, you can use the following BibTeX citation.

```
@inproceedings{10.1145/3605573.3605613,
author = {Li, Shenggui and Liu, Hongxin and Bian, Zhengda and Fang, Jiarui and Huang, Haichen and Liu, Yuliang and Wang, Boxiang and You, Yang},
title = {Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
year = {2023},
isbn = {9798400708435},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3605573.3605613},
doi = {10.1145/3605573.3605613},
abstract = {The success of Transformer models has pushed the deep learning model scale to billions of parameters, but the memory limitation of a single GPU has led to an urgent need for training on multi-GPU clusters. However, the best practice for choosing the optimal parallel strategy is still lacking, as it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addressed the above challenge by introducing a unified interface to scale your sequential code of model training to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism and is integrated with heterogeneous training and zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.},
booktitle = {Proceedings of the 52nd International Conference on Parallel Processing},
pages = {766–775},
numpages = {10},
keywords = {datasets, gaze detection, text tagging, neural networks},
location = {Salt Lake City, UT, USA},
series = {ICPP '23}
}
```

Colossal-AI has been accepted as an official tutorial by top conferences, including [NeurIPS](https://nips.cc/), [SC](https://sc22.supercomputing.org/), [AAAI](https://aaai.org/Conferences/AAAI-23/),
[PPoPP](https://ppopp23.sigplan.org/), [CVPR](https://cvpr2023.thecvf.com/), [ISC](https://www.isc-hpc.com/), and [NVIDIA GTC](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-S51482/).

<p align="right">(<a href="#top">back to top</a>)</p>
-												fixed some typos in the documents, added blog link and paper author information in README

											
										
										
											2021-11-03 08:07:28 +00:00
+								# Colossal-AI
-												update README and images path (#384)


											
										
										
											2022-03-11 05:53:38 +00:00
+								<div id="top" align="center">
-												removed tutorial markdown and refreshed rst files for consistency

											
										
										
											2022-01-19 08:06:53 +00:00
-												update ColossalAI logo (#2316)

Co-authored-by: siqi <siqi@siqis-MacBook-Pro.local>
											
										
										
											2023-01-04 07:41:53 +00:00
+								   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/colossal-ai_logo_vertical.png)](https://www.colossalai.org/)
-												update README and images path (#384)


											
										
										
											2022-03-11 05:53:38 +00:00
-												Improve grammar and punctuation (#3398)

Minor changes to improve grammar and punctuation.
											
										
										
											2023-04-02 14:00:57 +00:00
+								   Colossal-AI: Making large AI models cheaper, faster, and more accessible
-												removed tutorial markdown and refreshed rst files for consistency

											
										
										
											2022-01-19 08:06:53 +00:00
-												[builder] correct readme (#2375)

* [example] add google doc for benchmark results of GPT

* add tencet doc

* [example] gpt, shard init on all processes

* polish comments

* polish code

* [builder] update readme
											
										
										
											2023-01-06 08:32:26 +00:00
+								   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
 								   <a href="https://www.colossalai.org/"> Documentation </a> |
-												[doc] update example link (#2520)

* [doc] update example link

* [doc] update example link
											
										
										
											2023-01-29 02:53:57 +00:00
+								   <a href="https://github.com/hpcaitech/ColossalAI/tree/main/examples"> Examples </a> |
-												[builder] correct readme (#2375)

* [example] add google doc for benchmark results of GPT

* add tencet doc

* [example] gpt, shard init on all processes

* polish comments

* polish code

* [builder] update readme
											
										
										
											2023-01-06 08:32:26 +00:00
+								   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
-												update README and images path (#384)


											
										
										
											2022-03-11 05:53:38 +00:00
+								   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>
-												updated readme and change log (#224)


											
										
										
											2022-02-14 09:22:48 +00:00
-												[doc] add community contribution guide (#3153)

* [doc] update contribution guide

* [doc] update contribution guide

* [doc] add community contribution guide
											
										
										
											2023-03-17 03:07:24 +00:00
+								   [![GitHub Repo stars](https://img.shields.io/github/stars/hpcaitech/ColossalAI?style=social)](https://github.com/hpcaitech/ColossalAI/stargazers)
   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)


   | [English](README.md) | [中文](docs/README-zh-Hans.md) |

</div>

## Latest News
* [2024/01] [Inference Performance Improved by 46%, Open Source Solution Breaks the Length Limit of LLM for Multi-Round Conversations](https://hpc-ai.com/blog/Colossal-AI-SwiftInfer)
* [2024/01] [Construct Refined 13B Private Model With Just $5000 USD, Upgraded Colossal-AI Llama-2 Open Source](https://hpc-ai.com/blog/colossal-llama-2-13b)
* [2023/11] [Enhanced MoE Parallelism, Open-source MoE Model Training Can Be 9 Times More Efficient](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)
* [2023/09] [One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific LLM Solution](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
* [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
* [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
* [2023/03] [Intel and Colossal-AI Partner to Deliver Cost-Efficient Open-Source Solution for Protein Folding Structure Prediction](https://www.hpc-ai.tech/blog/intel-habana)
* [2023/03] [AWS and Google Fund Colossal-AI with Startup Cloud Programs](https://www.hpc-ai.tech/blog/aws-and-google-fund-colossal-ai-with-startup-cloud-programs)

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI for Real World Applications</a>
   <ul>
     <li><a href="#Colossal-LLaMA-2">Colossal-LLaMA-2: One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific LLM Solution</a></li>
     <li><a href="#ColossalChat">ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline</a></li>
     <li><a href="#AIGC">AIGC: Acceleration of Stable Diffusion</a></li>
     <li><a href="#Biomedicine">Biomedicine: Acceleration of AlphaFold Protein Structure</a></li>
   </ul>
 </li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a>
   <ul>
     <li><a href="#LLaMA2">LLaMA 1/2</a></li>
     <li><a href="#MoE">MoE</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#Recommendation-System-Models">Recommendation System Models</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a>
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference">Inference</a>
   <ul>
     <li><a href="#SwiftInfer">SwiftInfer: Breaks the Length Limit of LLM for Multi-Round Conversations with 46% Acceleration</a></li>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
     <li><a href="#OPT-Serving">OPT-175B Online Serving for Text Generation</a></li>
     <li><a href="#BLOOM-Inference">176B BLOOM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#Contributing">Contributing</a></li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>
   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>
<p align="right">(<a href="#top">back to top</a>)</p>

## Features
Colossal-AI provides a collection of parallel components. We aim to let you write distributed deep learning models the same way you write a model on your laptop, and we provide user-friendly tools to kickstart distributed training and inference in a few lines.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
  - [Auto-Parallelism](https://arxiv.org/abs/2302.02599)

- Heterogeneous Memory Management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on the configuration file
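
The parallelism strategies listed above are usually combined, and their degrees multiply: the number of devices equals the data-parallel degree times the tensor-parallel degree times the pipeline-parallel degree. The sketch below is a minimal illustration of that arithmetic, not Colossal-AI code. `ParallelLayout` and `tensor_size_is_valid` are hypothetical helpers, and the device-count rules are assumptions taken from the mesh shapes in the linked papers (a `q x q` mesh for 2D, `depth x q x q` for 2.5D, `q x q x q` for 3D).

```python
from dataclasses import dataclass
from math import isqrt


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper (not a Colossal-AI class) describing a device layout."""

    world_size: int          # total number of devices
    tensor_size: int = 1     # devices that share one layer's tensors
    pipeline_size: int = 1   # number of pipeline stages

    @property
    def data_size(self) -> int:
        """Data-parallel degree left over after tensor and pipeline parallelism."""
        model_parallel = self.tensor_size * self.pipeline_size
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={self.world_size} is not divisible by "
                f"tensor_size * pipeline_size = {model_parallel}"
            )
        return self.world_size // model_parallel


def tensor_size_is_valid(size: int, mode: str, depth: int = 1) -> bool:
    """Check whether `size` devices can form the mesh assumed for `mode`."""
    if size < 1 or depth < 1:
        return False
    if mode == "1d":
        return True                      # any positive device count
    if mode == "2d":
        q = isqrt(size)
        return q * q == size             # q x q mesh
    if mode == "2.5d":
        if size % depth != 0:
            return False
        q = isqrt(size // depth)
        return depth * q * q == size     # depth x q x q mesh
    if mode == "3d":
        q = round(size ** (1 / 3))
        return q ** 3 == size            # q x q x q mesh
    raise ValueError(f"unknown tensor parallel mode: {mode!r}")


# 16 devices, 2D tensor parallelism on 4 devices, 2 pipeline stages.
layout = ParallelLayout(world_size=16, tensor_size=4, pipeline_size=2)
print(layout.data_size)                  # 2 data-parallel replicas
print(tensor_size_is_valid(4, "2d"))     # True: 4 = 2 x 2
print(tensor_size_is_valid(8, "2d"))     # False: 8 is not a perfect square
```

Checking a layout this way before launching a job catches device counts that cannot form the mesh required by the chosen tensor-parallel mode.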
<p align="right">(<a href="#top">back to top</a>)</p>

## Colossal-AI in the Real World

### Colossal-LLaMA-2
- 7B: Half a day of training for a few hundred dollars yields results similar to mainstream large models; an open-source, commercial-free, domain-specific LLM solution.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-7b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-7b-base/summary)

- 13B: Build a refined 13B private model for just $5,000 USD.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://hpc-ai.com/blog/colossal-llama-2-13b)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-13b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-13b-base/summary)

|             Model              |  Backbone  | Tokens Consumed |    MMLU (5-shot)     | CMMLU (5-shot) | AGIEval (5-shot) | GAOKAO (0-shot) | CEval (5-shot)  |
| :----------------------------: | :--------: | :-------------: | :------------------: | :------------: | :--------------: | :-------------: | :-------------: |
|          Baichuan-7B           |     -      |      1.2T       |    42.32 (42.30)     | 44.53 (44.02)  |        38.72     |       36.74     |       42.80     |
|       Baichuan-13B-Base        |     -      |      1.4T       |    50.51 (51.60)     | 55.73 (55.30)  |        47.20     |       51.41     |       53.60     |
|       Baichuan2-7B-Base        |     -      |      2.6T       |    46.97 (54.16)     | 57.67 (57.07)  |        45.76     |       52.60     |       54.00     |
|       Baichuan2-13B-Base       |     -      |      2.6T       |    54.84 (59.17)     | 62.62 (61.97)  |        52.08     |       58.25     |       58.10     |
|           ChatGLM-6B           |     -      |      1.0T       |    39.67 (40.63)     |   41.17 (-)    |        40.10     |       36.53     |       38.90     |
|          ChatGLM2-6B           |     -      |      1.4T       |    44.74 (45.46)     |   49.40 (-)    |        46.36     |       45.49     |       51.70     |
|          InternLM-7B           |     -      |      1.6T       |    46.70 (51.00)     |   52.00 (-)    |        44.77     |       61.64     |       52.80     |
|            Qwen-7B             |     -      |      2.2T       |    54.29 (56.70)     | 56.03 (58.80)  |        52.47     |       56.42     |       59.60     |
|           Llama-2-7B           |     -      |      2.0T       |    44.47 (45.30)     |   32.97 (-)    |        32.60     |       25.46     |         -       |
| Linly-AI/Chinese-LLaMA-2-7B-hf | Llama-2-7B |      1.0T       |        37.43         |     29.92      |        32.00     |       27.57     |         -       |
| wenge-research/yayi-7b-llama2  | Llama-2-7B |        -        |        38.56         |     31.52      |        30.99     |       25.95     |         -       |
| ziqingyang/chinese-llama-2-7b  | Llama-2-7B |        -        |        33.86         |     34.69      |        34.52     |       25.18     |        34.2     |
| TigerResearch/tigerbot-7b-base | Llama-2-7B |      0.3T       |        43.73         |     42.04      |        37.64     |       30.61     |         -       |
|  LinkSoul/Chinese-Llama-2-7b   | Llama-2-7B |        -        |        48.41         |     38.31      |        38.45     |       27.72     |         -       |
|       FlagAlpha/Atom-7B        | Llama-2-7B |      0.1T       |        49.96         |     41.10      |        39.83     |       33.00     |         -       |
| IDEA-CCNL/Ziya-LLaMA-13B-v1.1  | Llama-13B  |      0.11T      |        50.25         |     40.99      |        40.04     |       30.54     |         -       |
|  **Colossal-LLaMA-2-7b-base**  | Llama-2-7B |   **0.0085T**   |        53.06         |     49.89      |        51.48     |       58.82     |        50.2     |
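
To put the "Tokens Consumed" column in perspective, the short sketch below converts the table's values to billions of tokens and compares token budgets. The numbers are copied from the table; the helper names are hypothetical. One assumption to keep in mind: because the Backbone column lists Llama-2-7B, the 0.0085T figure presumably counts only the additional training on top of that backbone.

```python
# Tokens consumed, in trillions, copied from the table above.
TOKENS_CONSUMED_T = {
    "Colossal-LLaMA-2-7b-base": 0.0085,
    "FlagAlpha/Atom-7B": 0.1,
    "Llama-2-7B": 2.0,
    "Baichuan2-7B-Base": 2.6,
}


def tokens_in_billions(model: str) -> float:
    """Convert a model's token budget from trillions to billions."""
    return TOKENS_CONSUMED_T[model] * 1000


def budget_ratio(model: str, reference: str) -> float:
    """Fraction of the reference model's token budget used by `model`."""
    return TOKENS_CONSUMED_T[model] / TOKENS_CONSUMED_T[reference]


name = "Colossal-LLaMA-2-7b-base"
print(f"{name}: {tokens_in_billions(name):.1f}B tokens")
for reference in ("FlagAlpha/Atom-7B", "Llama-2-7B", "Baichuan2-7B-Base"):
    print(f"  vs {reference}: {budget_ratio(name, reference):.2%} of its budget")
```

The comparison covers token counts only; it says nothing about hardware, data quality, or wall-clock time.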
### ColossalChat

<div align="center">
   <a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
   </a>
</div>

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

- Up to 10 times faster for RLHF PPO Stage3 Training

<p id="ColossalChat_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>

- Up to 7.73 times faster for single-server training and 1.42 times faster for single-GPU inference

<p id="ColossalChat-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>

- Up to 10.3x growth in model capacity on one GPU
- A mini demo training process requires only 1.62 GB of GPU memory (any consumer-grade GPU)

<p id="ColossalChat-LoRA" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>

- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Maintain a sufficiently high running speed
<p align="right">(<a href="#top">back to top</a>)</p>

### AIGC

Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).

<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>

- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).

<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>

- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.
<p id="inference-sd" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>

- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.
<p align="right">(<a href="#top">back to top</a>)</p>

### Biomedicine

Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates training and inference on GPU clusters, speeds up data processing, and supports inference on sequences containing more than 10,000 residues.

<p id="FastFold-Intel" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/data%20preprocessing%20with%20Intel.jpg" width=600/>
</p>

- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduction.

<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): Accelerates structure prediction of protein monomers and multimers by 11x.
<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo

### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>

- 70 billion parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)
### LLaMA1
<p align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
 								</p>
 								- 65-billion-parameter large model pretraining accelerated by 38%
 								[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
 								[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)
### MoE
 								<p align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/MOE_training.png" width=800/>
 								</p>
- Enhanced MoE parallelism: open-source MoE model training can be 9 times more efficient
 								[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe)
 								[[blog]](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)
### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>
- Saves 50% of GPU resources and delivers 10.7% acceleration
 								### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>
- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>
- 24x larger model size on the same hardware
 								- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>
- 2x faster training, or 50% longer sequence length
### PaLM
 								- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).
### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>
- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq) is a 175-billion-parameter AI language model released by Meta. Its publicly available pre-trained weights encourage AI programmers to pursue various downstream tasks and application deployments.
- 45% speedup when fine-tuning OPT at low cost, in a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/opt) [[Online Serving]](https://colossalai.org/docs/advanced_tutorials/opt_service)
Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) for more details.
### ViT
 								<p align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
 								</p>
 								- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64
### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding): uses a software cache to train larger embedding tables with a smaller GPU memory budget.
<p align="right">(<a href="#top">back to top</a>)</p>
## Single GPU Training Demo
 								### GPT-2
 								<p id="GPT-2-Single" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
 								</p>
 								- 20x larger model size on the same hardware
<p id="GPT-2-NVME" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
 								</p>
 								- 120x larger model size on the same hardware (RTX 3080)
### PaLM
 								<p id="PaLM-Single" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
 								</p>
 								- 34x larger model size on the same hardware
 								<p align="right">(<a href="#top">back to top</a>)</p>
## Inference
 								<p id="SwiftInfer" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/SwiftInfer.jpg" width=800/>
 								</p>
 								- [SwiftInfer](https://github.com/hpcaitech/SwiftInfer): Inference performance improved by 46%, open source solution breaks the length limit of LLM for multi-round conversations
 								<p id="GPT-3-Inference" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
 								</p>
 								- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware
<p id="OPT-Serving" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
 								</p>
- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): Try 175-billion-parameter OPT online services
<p id="BLOOM-Inference" align="center">
 								<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
 								</p>
- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce hardware deployment costs of 176-billion-parameter BLOOM by more than 10 times.
<p align="right">(<a href="#top">back to top</a>)</p>
## Installation
 								Requirements:
- PyTorch >= 1.11 and PyTorch <= 2.1
- Python >= 3.7
 								- CUDA >= 11.0
- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
 								- Linux OS
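
The version bounds above can be checked before installing. The following is a minimal sketch, assuming GNU `sort -V` is available (as on typical Linux distributions). The helper names `major_minor`, `version_ge`, and `version_in_range` are hypothetical and not part of Colossal-AI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Colossal-AI) for checking the version
# bounds listed above. Assumes GNU coreutils `sort -V` (version sort).

# major_minor V: reduce a version such as "2.1.0+cu118" to "2.1".
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# version_ge A B: succeeds when version A >= version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_in_range V MIN MAX: succeeds when MIN <= V <= MAX.
version_in_range() {
    version_ge "$1" "$2" && version_ge "$3" "$1"
}

# Example: check the local interpreter against "Python >= 3.7".
if command -v python3 >/dev/null 2>&1; then
    py="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_ge "$(major_minor "$py")" 3.7; then
        echo "Python $py satisfies >= 3.7"
    else
        echo "Python $py is older than 3.7"
    fi
fi
```

The same helpers could be applied to an installed PyTorch, for example `version_in_range "$(major_minor "$torch_version")" 1.11 2.1`, where `torch_version` holds the output of `python3 -c 'import torch; print(torch.__version__)'`. Comparing only the major and minor components is a deliberate simplification here, so that a patch release such as 2.1.0 still counts as within the `<= 2.1` bound.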
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
### Install from PyPI
You can easily install Colossal-AI with the following command. **By default, we do not build PyTorch extensions during installation.**
 								```bash
 								pip install colossalai
 								```
**Note: only Linux is supported for now.**
However, if you want to build the PyTorch extensions during installation, you can set `CUDA_EXT=1`.
 								```bash
 								CUDA_EXT=1 pip install colossalai
 								```
**Otherwise, CUDA kernels will be built at runtime when you actually need them.**
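
As a rough illustration of choosing between the two install commands above, the sketch below prints `CUDA_EXT=1 pip install colossalai` when a CUDA compiler is found on `PATH` and the plain command otherwise. The function name `colossalai_install_cmd` is hypothetical, and using the presence of `nvcc` as the deciding signal is an assumption made for this example, not a rule from the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Colossal-AI): print one of the two pip
# commands shown above, depending on whether a CUDA compiler is on PATH.
# The optional argument (default: nvcc) is the compiler name to look for.
colossalai_install_cmd() {
    local compiler="${1:-nvcc}"
    if command -v "$compiler" >/dev/null 2>&1; then
        # Compiler found: build the PyTorch extensions at install time.
        echo "CUDA_EXT=1 pip install colossalai"
    else
        # No compiler found: plain install, extension build deferred to runtime.
        echo "pip install colossalai"
    fi
}

# Print the suggested command for this machine without running it.
colossalai_install_cmd
```

Printing the command instead of executing it keeps the decision visible, so it can be reviewed before anything is installed.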
We also release a nightly version to PyPI every week. This allows you to access the unreleased features and bug fixes in the main branch.
You can install it with:
 								```bash
 								pip install colossalai-nightly
 								```
### Download From Source
> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)
```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install colossalai
pip install .
```
By default, we do not compile CUDA/C++ kernels. ColossalAI will build them at runtime.
If you want to install and enable CUDA kernel fusion (this is required when using the fused optimizer):
```shell
CUDA_EXT=1 pip install .
```
Users with CUDA 10.2 can still build ColossalAI from source. However, you need to manually download the cub library and copy it to the corresponding directory.
 								```bash
 								# clone the repository
 								git clone https://github.com/hpcaitech/ColossalAI.git
 								cd ColossalAI
 								# download the cub library
 								wget https://github.com/NVIDIA/cub/archive/refs/tags/1.8.0.zip
 								unzip 1.8.0.zip
 								cp -r cub-1.8.0/cub/ colossalai/kernel/cuda_native/csrc/kernels/include/
 								# install
 								CUDA_EXT=1 pip install .
 								```
<p align="right">(<a href="#top">back to top</a>)</p>
## Use Docker
### Pull from DockerHub
 								You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.
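
A pull might look like the sketch below. The repository name `hpcaitech/colossalai` is taken from the DockerHub URL above. The helper `colossalai_image` is hypothetical, and the default tag `latest` is an assumption, so check the DockerHub page for the tags that actually exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the image reference for a given tag.
# The default tag "latest" is an assumption; verify it on the DockerHub page.
colossalai_image() {
    printf 'hpcaitech/colossalai:%s\n' "${1:-latest}"
}

# Print the pull command; drop the leading `echo` to actually run it.
echo docker pull "$(colossalai_image)"
```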
 								### Build On Your Own
Run the following command to build a Docker image from the provided Dockerfile.
> Building Colossal-AI from scratch requires GPU support, so you need to use the NVIDIA Docker Runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
 								> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.
```bash
 								cd ColossalAI
 								docker build -t colossalai ./docker
 								```
 								Run the following command to start the docker container in interactive mode.
 								```bash
 								docker run -ti --gpus all --rm --ipc=host colossalai bash
 								```
<p align="right">(<a href="#top">back to top</a>)</p>
 								## Community
 								Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
 								[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.
## Contributing
Following the successful examples of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, working towards the era of big AI models!
You may contact us or participate in the following ways:
1. [Leaving a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Posting an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submitting a PR on GitHub following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md)
3. Sending your official proposal by email to contact@hpcaitech.com
 								Thanks so much to all of our amazing contributors!
<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors">
 								  <img src="https://contrib.rocks/image?repo=hpcaitech/ColossalAI"  width="800px"/>
 								</a>
<p align="right">(<a href="#top">back to top</a>)</p>
## CI/CD
 								We leverage the power of [GitHub Actions](https://github.com/features/actions) to automate our development, release and deployment workflows. Please check out this [documentation](.github/workflows/README.md) on how the automated workflows are operated.
## Cite Us
This project is inspired by some related projects (some by our team and some by other organizations). We would like to credit these amazing projects as listed in the [Reference List](./docs/REFERENCE.md).
 								To cite this project, you can use the following BibTeX citation.
```
@inproceedings{10.1145/3605573.3605613,
 								author = {Li, Shenggui and Liu, Hongxin and Bian, Zhengda and Fang, Jiarui and Huang, Haichen and Liu, Yuliang and Wang, Boxiang and You, Yang},
 								title = {Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
 								year = {2023},
 								isbn = {9798400708435},
 								publisher = {Association for Computing Machinery},
 								address = {New York, NY, USA},
 								url = {https://doi.org/10.1145/3605573.3605613},
 								doi = {10.1145/3605573.3605613},
 								abstract = {The success of Transformer models has pushed the deep learning model scale to billions of parameters, but the memory limitation of a single GPU has led to an urgent need for training on multi-GPU clusters. However, the best practice for choosing the optimal parallel strategy is still lacking, as it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addressed the above challenge by introducing a unified interface to scale your sequential code of model training to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism and is integrated with heterogeneous training and zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.},
 								booktitle = {Proceedings of the 52nd International Conference on Parallel Processing},
 								pages = {766–775},
 								numpages = {10},
 								keywords = {datasets, gaze detection, text tagging, neural networks},
 								location = {Salt Lake City, UT, USA},
 								series = {ICPP '23}
}
 								```
Colossal-AI has been accepted as an official tutorial by top conferences, including [NeurIPS](https://nips.cc/), [SC](https://sc22.supercomputing.org/), [AAAI](https://aaai.org/Conferences/AAAI-23/),
[PPoPP](https://ppopp23.sigplan.org/), [CVPR](https://cvpr2023.thecvf.com/), [ISC](https://www.isc-hpc.com/), and [NVIDIA GTC](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-S51482/).
<p align="right">(<a href="#top">back to top</a>)</p>