Hongxin Liu
d202cc28c0
[npu] change device to accelerator api (#5239)
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
* [npu] setup device utils (#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065)
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
littsk
83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837)
* Add clip_grad_norm for hybrid_parallel_plugin
* polish code
* add unittests
* Move tp to a higher-level optimizer interface.
* bug fix
* polish code
2023-10-12 11:32:37 +08:00
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
2023-09-20 18:29:37 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Baizhou Zhang
0ceec8f9a9
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354)
* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
* merge tests of PP/DP/TP combinations into one test file
* fix bug when sync grad for dp in HybridPlugin
* update supported precisions for 3DPlugin/fix bug when shifting tp_degree
* improve the passing of lazy_init
* modify lazy_init/use sync_shared_params
2023-08-15 23:25:14 +08:00
Hongxin Liu
261eab02fb
[plugin] add 3d parallel plugin (#4295)
* [amp] add mixed precision optimizer
* [plugin] add 3d parallel plugin
* [booster] support pipeline
* [plugin] 3d parallel plugin support clip grad norm
* [shardformer] fix sharder and add plugin test
* [plugin] rename 3d parallel plugin
* [ci] support testmon core pkg change detection (#4305)
* [hotfix] debug testmon
* [hotfix] fix llama
* [hotfix] fix p2p bugs
* [hotfix] fix requirements
2023-08-15 23:25:14 +08:00
Hongxin Liu
ae02d4e4f7
[bf16] add bf16 support (#3882)
* [bf16] add bf16 support for fused adam (#3844)
* [bf16] fused adam kernel support bf16
* [test] update fused adam kernel test
* [test] update fused adam test
* [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860)
* [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869)
* [bf16] add mixed precision mixin
* [bf16] low level zero optim support bf16
* [test] update low level zero test
* [test] fix low level zero grad acc test
* [bf16] add bf16 support for gemini (#3872)
* [bf16] gemini support bf16
* [test] update gemini bf16 test
* [doc] update gemini docstring
* [bf16] add bf16 support for plugins (#3877)
* [bf16] add bf16 support for legacy zero (#3879)
* [zero] init context support bf16
* [zero] legacy zero support bf16
* [test] add zero bf16 test
* [doc] add bf16 related docstring for legacy zero
2023-06-05 15:58:31 +08:00
digger yu
32f81f14d4
[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756)
2023-05-19 13:50:00 +08:00
lucasliunju
4b95464994
[NFC] polish colossalai/amp/__init__.py code style (#3272)
2023-03-29 15:22:21 +08:00
Frank Lee
8518263b80
[test] fixed the triton version for testing (#2608)
2023-02-07 13:49:38 +08:00
HELSON
077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism
* [testing] change model name to avoid pytest warning
* [hotfix] fix unit testing
2023-01-29 15:09:57 +08:00
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-01-06 20:50:26 +08:00
xyupeng
b965585d05
[NFC] polish colossalai/amp/torch_amp/torch_amp.py code style (#2290)
2023-01-04 15:09:57 +08:00
Ziheng Qin
3041014089
[NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299)
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2023-01-04 15:09:57 +08:00
HELSON
5d3a2be3af
[amp] add gradient clipping for unit tests (#2283)
* [amp] add gradient clipping in unit tests
* fix bugs
2023-01-04 11:59:56 +08:00
YuliangLiu0306
f027ef7913
[hotfix] fix fp16 optimizer bug (#2273)
2023-01-03 16:53:43 +08:00
Jiarui Fang
355ffb386e
[builder] unified cpu_optim fused_optim interface (#2190)
2022-12-23 20:57:41 +08:00
Jiarui Fang
d42afd30f8
[builder] runtime adam and fused_optim builder (#2184)
2022-12-23 14:14:21 +08:00
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971)
2022-11-17 13:42:33 +08:00
Junming Wu
14a0b18305
[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905)
2022-11-11 17:49:18 +08:00
LuGY
94329fc139
[NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853)
2022-11-09 14:49:42 +08:00
zbian
1559a09fb7
[NFC] polish amp.naive_amp.grad_scaler code style
2022-11-09 13:38:15 +08:00
Genghan Zhang
b25030cc07
[NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836)
2022-11-09 12:08:47 +08:00
Ziyue Jiang
5da03c936d
[NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-09 12:08:47 +08:00
Fazzie-Maqianli
399f84d8f6
[NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819)
2022-11-09 12:08:47 +08:00
CsRic
9623ec1b02
[NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816)
* [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714)
* [NFC] polish colossalai/zero/sharded_param/__init__.py code style
* [NFC] polish colossalai/amp/naive_amp/_utils.py code style
Co-authored-by: shenggan <csg19971016@gmail.com>
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-11-09 12:08:47 +08:00
ver217
d068af81a3
[doc] update rst and docstring (#1351)
* update rst
* add zero docstr
* fix docstr
* remove fx.tracer.meta_patch
* fix docstr
* fix docstr
* update fx rst
* fix fx docstr
* remove useless rst
2022-07-21 15:54:53 +08:00
YuliangLiu0306
e27645376d
[hotfix] different overflow status leads to stuck communication (#1175)
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [hotfix]fix some bugs caused by refactored schedule.
* [hotfix] different overflow status leads to stuck communication.
2022-06-27 09:53:57 +08:00
Frank Lee
72bd7c696b
[amp] included dict for type casting of model output (#1102)
2022-06-13 14:18:04 +08:00
Frank Lee
9fdebadd69
[doc] improved docstring in the amp module (#857)
2022-04-25 13:42:17 +08:00
HELSON
4c4388c46e
[hotfix] fix memory leak in zero (#781)
2022-04-18 13:57:03 +08:00
Frank Lee
a4e91bc87f
[bug] fixed grad scaler compatibility with torch 1.8 (#735)
2022-04-12 16:04:21 +08:00
Jiarui Fang
4d90a7b513
[refactor] zero directory (#724)
2022-04-11 23:13:02 +08:00
Kai Wang (Victor Kai)
b0f708dfc1
fix format (#570)
2022-04-06 11:40:59 +08:00
ver217
c5b488edf8
polish amp docstring (#616)
2022-04-01 16:09:39 +08:00
Liang Bowen
2c45efc398
html refactor (#555)
2022-03-31 11:36:56 +08:00
Liang Bowen
ec5086c49c
Refactored docstring to google style
2022-03-29 17:17:47 +08:00
Jiarui Fang
496cbb0760
[hotfix] fix initialize bug with zero (#442)
2022-03-17 13:16:22 +08:00
Frank Lee
14a7094243
fixed fp16 optimizer none grad bug (#432)
2022-03-16 14:35:46 +08:00
Frank Lee
e79ea44247
[fp16] refactored fp16 optimizer (#392)
2022-03-15 10:05:38 +08:00
Kai Wang (Victor Kai)
53bb3bcc0a
fix format (#362)
2022-03-11 15:50:28 +08:00
Frank Lee
3d5d64bd10
refactored grad scaler (#338)
2022-03-11 15:50:28 +08:00
Frank Lee
6a3188167c
set criterion as optional in colossalai initialize (#336)
2022-03-11 15:50:28 +08:00
Frank Lee
e17e54e32a
added buffer sync to naive amp model wrapper (#291)
2022-03-11 15:50:28 +08:00
Frank Lee
f5ca88ec97
fixed apex import (#227)
2022-02-15 11:31:13 +08:00
アマデウス
9ee197d0e9
moved env variables to global variables; (#215)
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed a few collective communicator bugs
2022-02-15 11:31:13 +08:00
HELSON
0f8c7f9804
Fixed docstring in colossalai (#171)
2022-01-21 10:44:30 +08:00
Frank Lee
e2089c5c15
adapted for sequence parallel (#163)
2022-01-20 13:44:51 +08:00