diff --git a/.compatibility b/.compatibility
new file mode 100644
index 000000000..c8ac4083d
--- /dev/null
+++ b/.compatibility
@@ -0,0 +1,3 @@
+1.12.0-11.3.0
+1.11.0-11.3.0
+1.10.1-11.3.0
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
index 65017a397..bc1f8504d 100644
--- a/.github/workflows/README.md
+++ b/.github/workflows/README.md
@@ -14,6 +14,7 @@
   - [Dispatch Example Test](#dispatch-example-test)
   - [Compatibility Test](#compatibility-test)
   - [User Friendliness](#user-friendliness)
+  - [Configuration](#configuration)
   - [Progress Log](#progress-log)
 
 ## Overview
@@ -37,30 +38,32 @@ In the section below, we will dive into the details of different workflows avail
 
 ### Regular Checks
 
-| Workflow Name | File name | Description |
-| ----------------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------- |
-| `Test example` | `auto_example_check.yml` | This workflow will test all examples every Sunday |
-| `Build on 8 GPUs` | `build_gpu_8.yml` | This workflow will run the unit tests everyday with 8 GPUs. |
-| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
-| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues which are stale for 14 days. |
+| Workflow Name | File name | Description |
+| ----------------------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Test example` | `auto_example_check.yml` | This workflow will test all examples every Sunday. |
+| `Compatibility Test` | `auto_compatibility_test.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch and CUDA every Sunday. The PyTorch and CUDA versions are specified in `.compatibility`. |
+| `Build on 8 GPUs` | `build_gpu_8.yml` | This workflow will run the unit tests every day with 8 GPUs. |
+| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
+| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues which are stale for 14 days. |
 
 ### Release
 
-| Workflow Name | File name | Description |
-| --------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
-| `Draft GitHub Release Post` | `draft_github_release_post.yml` | Compose a GitHub release post draft based on the commit history. Triggered when `version.txt` is updated. |
-| `Release to PyPI` | `release_pypi.yml` | Build and release the wheel to PyPI. Triggered when `version.txt` is updated. |
-| `Release Nightly to PyPI` | `release_nightly.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly`. Automatically executed every Sunday. |
-| `Release Docker` | `release_docker.yml` | Build and release the Docker image to DockerHub. Triggered when `version.txt` is updated. |
-| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Manually dispatched. See more details in the next section. |
+| Workflow Name | File name | Description |
+| --------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Draft GitHub Release Post` | `draft_github_release_post.yml` | Compose a GitHub release post draft based on the commit history. Triggered when a change to `version.txt` is merged. |
+| `Release to PyPI` | `release_pypi.yml` | Build and release the wheel to PyPI. Triggered when a change to `version.txt` is merged. |
+| `Release Nightly to PyPI` | `release_nightly.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly`. Automatically executed every Sunday. |
+| `Release Docker` | `release_docker.yml` | Build and release the Docker image to DockerHub. Triggered when a change to `version.txt` is merged. |
+| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Manually dispatched. See more details in the next section. |
+| `Auto Compatibility Test` | `auto_compatibility_test.yml` | Check Colossal-AI's compatibility against the PyTorch and CUDA versions specified in `.compatibility`. Triggered when `version.txt` is changed in a PR. |
 
 ### Manual Dispatch
 
-| Workflow Name | File name | Description |
-| ----------------------- | ---------------------------- | ------------------------------------------------------ |
-| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. |
-| `Dispatch Example Test` | `dispatch_example_check.yml` | Manually test a specified example. |
-| `Compatiblity Test` | `compatiblity_test.yml` | Test PyTorch and Python Compatibility. |
+| Workflow Name | File name | Description |
+| ---------------------------- | -------------------------------- | ------------------------------------------------------ |
+| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. |
+| `Dispatch Example Test` | `dispatch_example_check.yml` | Manually test a specified example. |
+| `Dispatch Compatibility Test` | `dispatch_compatibility_test.yml` | Test PyTorch and Python Compatibility. |
 
 Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow. I will provide the details of each workflow below.
 
@@ -93,6 +96,15 @@ Parameters:
 | ----------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
 | `issue-translate` | `translate_comment.yml` | This workflow is triggered when a new issue comment is created. The comment will be translated into English if not written in English. |
 
+
+## Configuration
+
+This section lists the files used to configure the workflows.
+
+1. `.compatibility`
+
+The `.compatibility` file tells GitHub Actions which PyTorch and CUDA versions to test against. Each line has the format `${torch-version}-${cuda-version}`, which is used as a Docker image tag. The tag must therefore exist in the [Docker registry](https://hub.docker.com/r/pytorch/conda-cuda) for the test to run.
+
 ## Progress Log
 
 - [x] unit testing
@@ -112,9 +124,9 @@ Parameters:
   - [x] check on PR
   - [x] regular check
   - [x] manual dispatch
-- [ ] compatiblity check
+- [x] compatibility check
   - [x] manual dispatch
-  - [ ] auto test when release
+  - [x] auto test when release
 - [x] helpers
   - [x] comment translation
   - [x] submodule update
diff --git a/.github/workflows/auto_compatibility_test.yml b/.github/workflows/auto_compatibility_test.yml
new file mode 100644
index 000000000..4b026c63e
--- /dev/null
+++ b/.github/workflows/auto_compatibility_test.yml
@@ -0,0 +1,74 @@
+name: Compatibility Test
+
+on:
+  pull_request:
+    paths:
+      - 'version.txt'
+      - '.compatibility'
+  # run at 03:00 every Sunday (Singapore time), which is Saturday 19:00 UTC
+  schedule:
+    - cron: '0 19 * * 6'
+
+jobs:
+  matrix_preparation:
+    name: Prepare Container List
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v3
+      - id: set-matrix
+        run: |
+          IFS=','
+          DOCKER_IMAGE=()
+
+          while read tag; do
+            DOCKER_IMAGE+=("\"hpcaitech/pytorch-cuda:${tag}\"")
+          done <.compatibility
+
+          container=$( IFS=',' ; echo "${DOCKER_IMAGE[*]}" )
+          container="[${container}]"
+          echo "$container"
+          echo "::set-output name=matrix::{\"container\":$(echo "$container")}"
+
+  build:
+    name: Test for PyTorch Compatibility
+    needs: matrix_preparation
+    if: github.repository == 'hpcaitech/ColossalAI'
+    runs-on: [self-hosted, gpu]
+    strategy:
+      fail-fast: false
+      matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
+    container:
+      image: ${{ matrix.container }}
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+    timeout-minutes: 120
+    steps:
+      - name: Install dependencies
+        run: |
+          pip install -U pip setuptools wheel --user
+      - uses: actions/checkout@v2
+        with:
+          repository: hpcaitech/TensorNVMe
+          ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
+          path: TensorNVMe
+      - name: Install tensornvme
+        run: |
+          cd TensorNVMe
+          conda install cmake
+          pip install -r requirements.txt
+          pip install -v .
+      - uses: actions/checkout@v2
+        with:
+          ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
+      - name: Install Colossal-AI
+        run: |
+          pip install -v --no-cache-dir .
+          pip install -r requirements/requirements-test.txt
+      - name: Unit Testing
+        run: |
+          PYTHONPATH=$PWD pytest tests
+        env:
+          DATA: /data/scratch/cifar-10
+          NCCL_SHM_DISABLE: 1
+          LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
diff --git a/.github/workflows/compatibility_test.yml b/.github/workflows/dispatch_compatibility_test.yml
similarity index 98%
rename from .github/workflows/compatibility_test.yml
rename to .github/workflows/dispatch_compatibility_test.yml
index eadd07886..ac5669c6f 100644
--- a/.github/workflows/compatibility_test.yml
+++ b/.github/workflows/dispatch_compatibility_test.yml
@@ -1,4 +1,4 @@
-name: Compatibility Test
+name: Dispatch Compatibility Test
 
 on:
   workflow_dispatch:
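
The `set-matrix` step in `auto_compatibility_test.yml` above turns each line of `.compatibility` into a container image name and emits the list as JSON for the `build` job's matrix. The Bash function below is a minimal sketch of that transformation for local experimentation, not the workflow's own code. The function name `build_matrix` and its file argument are hypothetical, and skipping blank lines and tolerating a missing trailing newline are assumptions added here; only the image prefix `hpcaitech/pytorch-cuda` and the `container` key come from the diff.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): read image tags, one per line,
# from the file given as $1 and print the matrix JSON on stdout.
build_matrix() {
  local file="$1"
  local tag joined
  local images=()

  # `|| [ -n "$tag" ]` keeps a final line that lacks a trailing newline.
  while IFS= read -r tag || [ -n "$tag" ]; do
    # Assumption: blank lines are ignored rather than turned into empty tags.
    [ -z "$tag" ] && continue
    images+=("\"hpcaitech/pytorch-cuda:${tag}\"")
  done <"$file"

  # Join the quoted image names with commas inside a subshell so the
  # caller's IFS is left untouched.
  joined=$(IFS=','; echo "${images[*]}")
  printf '{"container":[%s]}\n' "$joined"
}
```

Calling `build_matrix .compatibility` on the three tags added in this diff should print a JSON object whose `container` array holds three `hpcaitech/pytorch-cuda:<tag>` entries, which is the shape that `fromJson` expands into one `build` job per image.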