ColossalAI/colossalai
Edenzzzz 43995ee436
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)
* [feat] Add distributed lamb; minor fixes in DeviceMesh (#5476)

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [hotfix] Improve tester precision by removing ZeRO on vanilla lamb (#5576)

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [optim] add distributed came (#5526)

* test CAME under LowLevelZeroOptimizer wrapper

* test CAME TP row and col pass

* test CAME zero pass

* came zero add master and worker param id convert

* came zero test pass

* came zero test pass

* test distributed came passed

* reform code, Modify some expressions and add comments

* minor fix of test came

* minor fix of dist_came and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix of dist_came and test

* rebase dist-optim

* rebase dist-optim

* fix remaining comments

* add test dist came using booster api

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [optim] Distributed Adafactor (#5484)

* [feature] solve conflict; update optimizer readme;

* [feature] update optimize readme;

* [fix] fix testcase;

* [feature] Add transformer-bert to testcase;solve a bug related to indivisible shape (induction in use_zero and tp is row parallel);

* [feature] Add transformers_bert model zoo in testcase;

* [feature] add user documentation to docs/source/feature.

* [feature] add API Reference & Sample to optimizer Readme; add state check for bert exam;

* [feature] modify user documentation;

* [fix] fix readme format issue;

* [fix] add zero=0 in testcase; cached augment in dict;

* [fix] fix percision issue;

* [feature] add distributed rms;

* [feature] remove useless comment in testcase;

* [fix] Remove useless test; open zero test; remove fp16 test in bert exam;

* [feature] Extract distributed rms function;

* [feature] add booster + lowlevelzeroPlugin in test;

* [feature] add Start_with_booster_API case in md; add Supporting Information in md;

* [fix] Also remove state movement in base adafactor;

* [feature] extract factor function;

* [feature] add LowLevelZeroPlugin test;

* [fix] add tp=False and zero=True in logic;

* [fix] fix use zero logic;

* [feature] add row residue logic in column parallel factor;

* [feature] add check optim state func;

* [feature] Remove duplicate logic;

* [feature] update optim state check func and percision test bug;

* [fix] update/fix optim state; Still exist percision issue;

* [fix] Add use_zero check in _rms; Add plugin support info in Readme; Add Dist Adafactor init Info;

* [feature] removed print & comments in utils;

* [feature] uodate Readme;

* [feature] add LowLevelZeroPlugin test with Bert model zoo;

* [fix] fix logic in _rms;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix] remove comments in testcase;

* [feature] add zh-Han Readme;

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] refractor dist came; fix percision error; add low level zero test with bert model zoo; (#5676)

* [feature] daily update;

* [fix] fix dist came;

* [feature] refractor dist came; fix percision error; add low level zero test with bert model zoo;

* [fix] open rms; fix low level zero test; fix dist came test function name;

* [fix] remove redundant test;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] Add Galore (Adam, Adafactor) and distributed GaloreAdamW8bit (#5570)

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

* update comments

* add initial distributed galore

* add initial distributed galore

* add galore set param utils; change setup_distributed interface

* projected grad precision passed

* basic precision tests passed

* tests passed; located svd precision issue in fwd-bwd; banned these tests

* Plugin DP + TP tests passed

* move get_shard_dim to d_tensor

* add comments

* remove useless files

* remove useless files

* fix zero typo

* improve interface

* remove moe changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import

* fix deepcopy

* update came & adafactor to main

* fix param map

* fix typo

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Hotfix] Remove one buggy test case from dist_adafactor for now (#5692)


Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: chongqichuizi875 <107315010+chongqichuizi875@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: duanjunwen <54985467+duanjunwen@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
2024-05-14 13:52:45 +08:00
..
_C [setup] support pre-build and jit-build of cuda kernels (#2374) 2023-01-06 20:50:26 +08:00
_analyzer [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) 2024-04-18 18:15:50 +08:00
accelerator [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) 2024-03-05 21:52:30 +08:00
amp [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
auto_parallel [misc] refactor launch API and tensor constructor (#5666) 2024-04-29 10:40:11 +08:00
autochunk [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) 2024-04-18 18:15:50 +08:00
booster [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
checkpoint_io [lora] add lora APIs for booster, support lora for TorchDDP (#4981) 2024-04-28 10:51:27 +08:00
cli [devops] fix extention building (#5427) 2024-03-05 15:35:54 +08:00
cluster [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
context [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) 2024-04-25 14:45:52 +08:00
device [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
fx [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) 2024-04-18 18:15:50 +08:00
inference [misc] refactor launch API and tensor constructor (#5666) 2024-04-29 10:40:11 +08:00
interface [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
kernel [coloattention]modify coloattention (#5627) 2024-04-25 10:47:14 +08:00
lazy [doc] add lazy init docs (#4808) 2023-09-27 10:24:04 +08:00
legacy [hotfix] fix inference typo (#5438) 2024-05-13 21:06:44 +08:00
logging [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
moe [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) 2024-03-05 21:52:30 +08:00
nn [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
pipeline [LowLevelZero] low level zero support lora (#5153) 2024-04-28 10:51:27 +08:00
quantization [Feature] qlora support (#5586) 2024-04-28 10:51:27 +08:00
shardformer [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) 2024-05-10 15:33:39 +08:00
tensor [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
testing [shardformer] refactor embedding resize (#5603) 2024-04-18 16:10:18 +08:00
utils Merge pull request #5310 from hpcaitech/feature/npu 2024-01-29 13:49:39 +08:00
zero [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 2024-05-14 13:52:45 +08:00
__init__.py [devops] remove post commit ci (#5566) 2024-04-08 15:09:40 +08:00
initialize.py [misc] refactor launch API and tensor constructor (#5666) 2024-04-29 10:40:11 +08:00