
Colossal-AI Tutorial Hands-on

This directory contains an abbreviated tutorial prepared for specific events and may not be kept up to date. For general use of Colossal-AI, please refer to the other examples and the documentation.

Introduction

Welcome to the Colossal-AI tutorial, which has been accepted as an official tutorial by top conferences such as NeurIPS, SC, AAAI, PPoPP, CVPR, ISC, and NVIDIA GTC.

Colossal-AI, a unified deep learning system for the big model era, integrates many advanced technologies such as multi-dimensional tensor parallelism, sequence parallelism, heterogeneous memory management, large-scale optimization, and adaptive task scheduling. Colossal-AI helps users deploy large AI model training and inference efficiently and quickly, reducing training budgets and the labor cost of learning and deployment.

🚀 Quick Links

Colossal-AI | Paper | Documentation | Issue | Slack

Table of Contents

Multi-dimensional Parallelism [code] [video]
Sequence Parallelism [code] [video]
Large Batch Training Optimization [code] [video]
Automatic Parallelism [code] [video]
Fine-tuning and Inference for OPT [code] [video]
Optimized AlphaFold [code] [video]
Optimized Stable Diffusion [code] [video]
ColossalChat: Cloning ChatGPT with a Complete RLHF Pipeline [code] [blog] [demo] [video]

Discussion

Discussion about the Colossal-AI project is always welcome! We would love to exchange ideas with the community to help this project grow. If there is anything you would like to discuss, feel free to join our Slack.

If you encounter any problem while running these tutorials, you may want to raise an issue in this repository.

🛠️ Setup environment

[video] Use conda to create a virtual environment. We recommend Python 3.8, e.g. conda create -n colossal python=3.8. These installation commands are for CUDA 11.3; if you have a different version of CUDA, please install the matching builds of PyTorch and Colossal-AI. You can refer to the Installation guide to set up your environment.

You can run colossalai check -i to verify that your environment is set up correctly 🕹️.

If you encounter messages like "please install with cuda_ext", do let me know, as it could be a problem with the distribution wheel. 😥
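
Before running colossalai check -i, it can help to confirm that the active conda environment has the expected interpreter and packages. The following is a minimal sketch, not part of Colossal-AI: the function name environment_report is hypothetical, and it assumes the packages are importable under the names torch and colossalai. It only checks whether the packages can be found; it does not check CUDA or the compiled extensions.

```python
import importlib.util
import sys


def environment_report(packages=("torch", "colossalai")):
    """Return the Python version and whether each package can be located.

    Hypothetical helper for illustration; the package names are assumptions.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If a package is reported as False, the environment created above is probably not the active one, or the package was installed into a different interpreter.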

Then clone the Colossal-AI repository from GitHub.

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI/examples/tutorial