![]() * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * [shardformer] fix opt test hanging * fix * test * test * [shardformer] zero1+pp and the corresponding tests (#4517) * pause * finish pp+zero1 * Update test_shard_vit.py * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) * fix overlap bug and support bert, add overlap as an option in shardconfig * support overlap for chatglm and bloom * [shardformer] fix emerged bugs after updating transformers (#4526) * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] Add overlap support for gpt2 (#4535) * add overlap support for gpt2 * remove unused code * remove unused code * [shardformer] support pp+tp+zero1 tests (#4531) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] fix submodule replacement bug when enabling pp (#4544) * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) * implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * rebase feature/shardformer * update pipeline * [shardformer] fix * [shardformer] fix * [shardformer] bert finetune fix * [shardformer] add all_reduce operation to loss add all_reduce operation to loss * [shardformer] make compatible with pytree. make compatible with pytree. * [shardformer] disable tp disable tp * [shardformer] add 3d plugin to ci test * [shardformer] update num_microbatches to None * [shardformer] update microbatchsize * [shardformer] update assert * update scheduler * update scheduler --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn> |
||
---|---|---|
.. | ||
community | ||
images | ||
language | ||
tutorial | ||
README.md |
README.md
Colossal-AI Examples
Table of Contents
Overview
This folder provides several examples accelerated by Colossal-AI.
Folders such as images
and language
include a wide range of deep learning tasks and applications.
The community
folder aim to create a collaborative platform for developers to contribute exotic features built on top of Colossal-AI.
The tutorial
folder is for everyone to quickly try out the different features in Colossal-AI.
You can find applications such as Chatbot, AIGC and Biomedicine in the Applications directory.
Folder Structure
└─ examples
└─ images
└─ vit
└─ test_ci.sh
└─ train.py
└─ README.md
└─ ...
└─ ...
Invitation to open-source contribution
Referring to the successful attempts of BLOOM and Stable Diffusion, any and all developers and partners with computing powers, datasets, models are welcome to join and build the Colossal-AI community, making efforts towards the era of big AI models!
You may contact us or participate in the following ways:
- Leaving a Star ⭐ to show your like and support. Thanks!
- Posting an issue, or submitting a PR on GitHub follow the guideline in Contributing.
- Join the Colossal-AI community on Slack, and WeChat(微信) to share your ideas.
- Send your official proposal to email contact@hpcaitech.com
Thanks so much to all of our amazing contributors!
Integrate Your Example With Testing
Regular checks are important to ensure that all examples run without apparent bugs and stay compatible with the latest API. Colossal-AI runs workflows to check for examples on a on-pull-request and weekly basis. When a new example is added or changed, the workflow will run the example to test whether it can run. Moreover, Colossal-AI will run testing for examples every week.
Therefore, it is essential for the example contributors to know how to integrate your example with the testing workflow. Simply, you can follow the steps below.
- Create a script called
test_ci.sh
in your example folder - Configure your testing parameters such as number steps, batch size in
test_ci.sh
, e.t.c. Keep these parameters small such that each example only takes several minutes. - Export your dataset path with the prefix
/data
and make sure you have a copy of the dataset in the/data/scratch/examples-data
directory on the CI machine. Community contributors can contact us via slack to request for downloading the dataset on the CI machine. - Implement the logic such as dependency setup and example execution
Community Dependency
We are happy to introduce the following nice community dependency repos that are powered by Colossal-AI: