Colossal-AI Examples

Overview

This folder provides several examples accelerated by Colossal-AI. Folders such as images and language cover a wide range of deep learning tasks and applications. The community folder aims to provide a collaborative platform where developers can contribute exotic features built on top of Colossal-AI. The tutorial folder lets everyone quickly try out the different features of Colossal-AI.

You can find applications such as Chatbot, AIGC and Biomedicine in the Applications directory.

Folder Structure

└─ examples
  └─ images
      └─ vit
        └─ test_ci.sh
        └─ train.py
        └─ README.md
      └─ ...
  └─ ...

Invitation to open-source contribution

Following the successful examples of BLOOM and Stable Diffusion, we welcome all developers and partners with computing power, datasets, or models to join and build the Colossal-AI community and work towards the era of big AI models!

You may contact us or participate in the following ways:

  1. Leave a star to show your support. Thanks!
  2. Post an issue or submit a PR on GitHub, following the guidelines in Contributing.
  3. Join the Colossal-AI community on Slack and WeChat (微信) to share your ideas.
  4. Send your official proposal by email to contact@hpcaitech.com

Thanks so much to all of our amazing contributors!

Integrate Your Example With Testing

Regular checks are important to ensure that all examples run without apparent bugs and stay compatible with the latest API. Colossal-AI runs workflows that check the examples on every pull request and on a weekly basis. When an example is added or changed, the workflow runs it to test whether it still works. In addition, all examples are tested every week.

Example contributors therefore need to know how to integrate their examples with the testing workflow. You can simply follow the steps below.

  1. Create a script called test_ci.sh in your example folder.
  2. Configure your testing parameters, such as the number of steps and the batch size, in test_ci.sh. Keep these parameters small so that each example takes only a few minutes.
  3. Export your dataset path with the prefix /data and make sure a copy of the dataset exists in the /data/scratch/examples-data directory on the CI machine. Community contributors can contact us via Slack to request that the dataset be downloaded to the CI machine.
  4. Implement the logic, such as dependency setup and example execution, in test_ci.sh.
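
The four steps above could look roughly like the sketch below. It is only an illustration, not the script used by any existing example: the dataset folder name my_dataset, the train.py flags, the requirements.txt file and the DRY_RUN switch are all assumptions. The sketch defaults to a dry run that only prints the commands, so it can be tried safely outside the CI machine.

```shell
#!/usr/bin/env bash
# Sketch of a test_ci.sh; names marked "hypothetical" are not from this README.
set -euo pipefail

# Step 2: keep parameters small so the run takes only a few minutes.
NUM_STEPS=5
BATCH_SIZE=4

# Step 3: dataset path under /data ("my_dataset" is a hypothetical folder name).
export DATA="/data/scratch/examples-data/my_dataset"

# DRY_RUN=1 (the default in this sketch) only prints commands.
# Set DRY_RUN=0 to actually execute them, as a real CI run would need.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 4: dependency setup and example execution.
# requirements.txt and the train.py flags are hypothetical; use what your example provides.
run pip install -r requirements.txt
run python train.py --data_path "$DATA" --num_steps "$NUM_STEPS" --batch_size "$BATCH_SIZE"
```

Because of set -e, the script stops at the first failing command when DRY_RUN=0, which lets a failing example surface as a non-zero exit code. Before relying on it in the workflow, change the DRY_RUN default to 0 or drop the switch entirely.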

Community Dependency

We are happy to introduce the following nice community dependency repos that are powered by Colossal-AI: