ColossalAI/docs
source [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
README-zh-Hans.md [news] llama3 and open-sora v1.1 (#5655) 2024-04-26 15:36:37 +08:00
README.md [legacy] clean up legacy code (#4743) 2023-09-18 16:31:06 +08:00
REFERENCE.md fix typo docs/ 2023-05-24 13:57:43 +08:00
conda-doc-test-deps.yml [workflow] supported conda package installation in doc test (#3028) 2023-03-07 14:21:26 +08:00
requirements-doc-test.txt [test] refactor tests with spawn (#3452) 2023-04-06 14:51:35 +08:00
sidebars.json [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) 2023-10-10 08:18:55 +00:00
versions.json [doc] add opt service doc (#2747) 2023-02-16 15:45:26 +08:00

README.md

📕 Documentation

๐Ÿ“ Overview

We evaluated the existing community solutions for documentation and discussed their advantages and disadvantages in issue #2651. Based on that discussion, we propose to build a more modern and robust documentation system by integrating the Sphinx autodoc function with the Docusaurus framework.

🗺 Module Structure

- docs
    - source
        - en
        - zh-Hans
    - sidebars.json
    - versions.json
    - requirements-doc-test.txt

The documentation module structure is shown above:

  1. source: This folder contains multi-language documentation files.
  2. sidebars.json: This file defines the table of contents for the tutorials. You need to update it whenever a doc is added or deleted.
  3. versions.json: The versions.json in the latest commit of the main branch controls which versions are displayed on our website.

🧱 Our Documentation System

We believe that the combination of the existing systems can yield several advantages such as simplicity, usability and maintainability:

  1. Support Markdown. We believe Markdown is a more popular language for writing documentation than RST.
  2. Support Autodoc. Autodoc, which is provided by Sphinx, can automatically generate documentation from the docstrings in the source code.
  3. Support elegant and modern UI, which is provided by Docusaurus.
  4. Support MDX for more flexible and powerful documentation, which is provided by Docusaurus.
  5. Support hosting blogs/project home page/other pages besides the documentation, which is provided by Docusaurus.

Therefore, we have built the ColossalAI-Documentation repository to integrate the features above.

🎊 Contribution

You can contribute to the documentation by opening a Pull Request against the docs/source folder. Please follow these guidelines when contributing:

  1. The documentation is written in Markdown. You can refer to the Markdown Guide for the syntax.
  2. You must ensure that the documentation exists for all languages. See Adding a New Documentation for more details.
  3. You must provide a test command for your documentation. See Doc Testing for more details.
  4. You can embed your docstring in your markdown. See Auto Documentation for more details.

🖊 Adding a New Documentation

You can add a Markdown file to the docs/source folder. You need to ensure that your PR supports every language. Assuming you want to add a file called your_doc.md, your file structure will look like this:

- docs
    - source
        - en
            - your_doc.md  # written in English
        - zh-Hans
            - your_doc.md  # written in Chinese
    - sidebars.json  # add your documentation file name here

Meanwhile, you need to ensure the sidebars.json is updated such that it contains your documentation file. Our CI will check whether documentation exists for all languages and can be used to build the website successfully.
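Before opening a PR, it can help to run a similar check locally. The snippet below is only a minimal sketch of the multi-language requirement, not the actual CI script: the function name `missing_translations` is hypothetical, and it assumes only the `en` and `zh-Hans` layout shown above.

```python
import tempfile
from pathlib import Path


def missing_translations(source_dir, languages=("en", "zh-Hans")):
    """Return {language: sorted relative paths of docs missing in that language}."""
    source_dir = Path(source_dir)
    # Collect the Markdown files of each language as paths relative to its folder.
    docs = {
        lang: {
            p.relative_to(source_dir / lang).as_posix()
            for p in (source_dir / lang).rglob("*.md")
        }
        for lang in languages
    }
    all_docs = set().union(*docs.values())
    # A doc is missing for a language if any other language has it.
    return {
        lang: sorted(all_docs - found)
        for lang, found in docs.items()
        if all_docs - found
    }


# Demo on a throwaway tree: only_en.md has no Chinese counterpart.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    for rel in ("en/your_doc.md", "zh-Hans/your_doc.md", "en/only_en.md"):
        path = source / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# placeholder\n", encoding="utf-8")
    result = missing_translations(source)

print(result)
```

An empty dictionary means every document exists in every language. The sketch does not look at sidebars.json, which still has to be updated by hand.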

🧹 Doc Testing

Every document is tested to ensure that it works. You need to add the following line to the bottom of your file and replace $command with the actual command. Note that the Markdown file is converted into a Python file before testing: if your file is demo.md, the generated test file is demo.py, so your command should refer to demo.py, e.g. python demo.py.

<!-- doc-test-command: $command  -->

Meanwhile, only code labeled as a Python code block will be considered for testing.

    ```python
    print("hello world")
    ```

If you want to skip some code, wrap it in the following annotations, which tell docer to exclude the wrapped code from testing.

<!--- doc-test-ignore-start -->

    ```python
    print("hello world")
    ```

<!--- doc-test-ignore-end -->

If your documentation requires extra dependencies, please add them to requirements-doc-test.txt for pip and conda-doc-test-deps.yml for Conda.
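The conversion rules described in this section can be illustrated with a short sketch. This is not docer's implementation: the function `extract_doc_test` and the regular expressions are assumptions made for illustration, and they follow only the behaviour stated above (keep Python code blocks, drop anything between the ignore annotations, read the test command from the comment).

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown

# Hypothetical patterns for the three annotations described above.
IGNORE_START = re.compile(r"<!--+\s*doc-test-ignore-start\s*--+>")
IGNORE_END = re.compile(r"<!--+\s*doc-test-ignore-end\s*--+>")
COMMAND = re.compile(r"<!--+\s*doc-test-command:\s*(.*?)\s*--+>")


def extract_doc_test(markdown):
    """Return (python_source, command) extracted from a Markdown string."""
    code_lines, command = [], None
    ignoring = in_python = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if IGNORE_START.fullmatch(stripped):
            ignoring = True
        elif IGNORE_END.fullmatch(stripped):
            ignoring = False
        elif (match := COMMAND.fullmatch(stripped)):
            command = match.group(1)
        elif stripped.startswith(FENCE):
            # Only a fence labeled "python" opens a tested block; any fence closes it.
            in_python = (not in_python) and stripped == FENCE + "python"
        elif in_python and not ignoring:
            code_lines.append(line)
    return "\n".join(code_lines), command


demo_md = "\n".join([
    "Some prose.",
    FENCE + "python",
    'print("tested")',
    FENCE,
    "<!--- doc-test-ignore-start -->",
    FENCE + "python",
    'print("skipped")',
    FENCE,
    "<!--- doc-test-ignore-end -->",
    "<!-- doc-test-command: python demo.py  -->",
])

source, command = extract_doc_test(demo_md)
print(source)
print(command)
```

In this sketch, `source` holds what would be written to demo.py and `command` is what would be run against it. Code in blocks with any other language label is never collected.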

💉 Auto Documentation

You may want to include the API documentation for a class or function in your documentation for reference. We support autodoc, which extracts the docstring and transforms it into a web element for an elegant display. You just need to add {{ autodoc:<mod-name> }} to your markdown on a line of its own. An example is given below and you can see the outcome in this PR.

{{ autodoc:colossalai.legacy.amp.apex_amp.convert_to_apex_amp }}
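To make the idea concrete, here is a rough sketch of how such a placeholder could be resolved. It is not the ColossalAI-Documentation implementation: the helper names `resolve` and `expand_autodoc` are hypothetical, the sketch substitutes the plain-text docstring instead of rendering a web element, and it uses the standard-library target `json.dumps` so that it runs without ColossalAI installed.

```python
import importlib
import inspect
import re

# Hypothetical pattern for a line that holds only an autodoc placeholder.
AUTODOC = re.compile(r"\{\{\s*autodoc:(?P<name>[\w.]+)\s*\}\}")


def resolve(dotted_name):
    """Import the longest importable module prefix, then walk the attributes."""
    parts = dotted_name.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue  # prefix is not a module; try a shorter one
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {dotted_name!r}")


def expand_autodoc(markdown):
    """Replace each placeholder line with the docstring of the named object."""
    out = []
    for line in markdown.splitlines():
        match = AUTODOC.fullmatch(line.strip())
        if match:
            out.append(inspect.getdoc(resolve(match.group("name"))) or "")
        else:
            out.append(line)
    return "\n".join(out)


expanded = expand_autodoc("# API\n\n{{ autodoc:json.dumps }}")
print(expanded)
```

A placeholder embedded in the middle of a sentence is left untouched by this sketch, which mirrors the requirement that the placeholder occupy a single line.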