ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	11 months ago
Hongxin Liu	e5ce4c8ea6	[npu] add npu support for gemini and zero (#5067 ) * [npu] setup device utils (#5047) * [npu] add npu device support * [npu] support low level zero * [test] update npu zero plugin test * [hotfix] fix import * [test] recover tests * [npu] gemini support npu (#5052) * [npu] refactor device utils * [gemini] support npu * [example] llama2+gemini support npu * [kernel] add arm cpu adam kernel (#5065) * [kernel] add arm cpu adam * [optim] update adam optimizer * [kernel] arm cpu adam remove bf16 support	1 year ago
littsk	83b52c56cd	[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837 ) * Add clip_grad_norm for hibrid_parallel_plugin * polish code * add unittests * Move tp to a higher-level optimizer interface. * bug fix * polish code	1 year ago
Baizhou Zhang	c0a033700c	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758 ) * fix master param sync for hybrid plugin * rewrite unwrap for ddp/fsdp * rewrite unwrap for zero/gemini * rewrite unwrap for hybrid plugin * fix geemini unwrap * fix bugs	1 year ago
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752 ) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	1 year ago
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743 ) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	1 year ago
Baizhou Zhang	0ceec8f9a9	[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354 ) * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test * merge tests of PP/DP/TP combinations into one test file * fix bug when sync grad for dp in HybridPlugin * update supported precisions for 3DPlugin/fix bug when shifting tp_degree * improve the passing of lazy_init * modify lazy_init/use sync_shared_params	1 year ago
Hongxin Liu	261eab02fb	[plugin] add 3d parallel plugin (#4295 ) * [amp] add mixed precision optimizer * [plugin] add 3d parallel plugin * [booster] support pipeline * [plugin] 3d parallel plugin support clip grad norm * [shardformer] fix sharder and add plugin test * [plugin] rename 3d parallel plugin * [ci] support testmon core pkg change detection (#4305) * [hotfix] debug testmon * [hotfix] fix llama * [hotfix] fix p2p bugs * [hotfix] fix requirements	1 year ago
Hongxin Liu	ae02d4e4f7	[bf16] add bf16 support (#3882 ) * [bf16] add bf16 support for fused adam (#3844) * [bf16] fused adam kernel support bf16 * [test] update fused adam kernel test * [test] update fused adam test * [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860) * [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869) * [bf16] add mixed precision mixin * [bf16] low level zero optim support bf16 * [text] update low level zero test * [text] fix low level zero grad acc test * [bf16] add bf16 support for gemini (#3872) * [bf16] gemini support bf16 * [test] update gemini bf16 test * [doc] update gemini docstring * [bf16] add bf16 support for plugins (#3877) * [bf16] add bf16 support for legacy zero (#3879) * [zero] init context support bf16 * [zero] legacy zero support bf16 * [test] add zero bf16 test * [doc] add bf16 related docstring for legacy zero	1 year ago
digger yu	32f81f14d4	[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756 )	2 years ago
lucasliunju	4b95464994	[NFC] polish colossalai/amp/__init__.py code style (#3272 )	2 years ago
Frank Lee	8518263b80	[test] fixed the triton version for testing (#2608 )	2 years ago
HELSON	077a5cdde4	[zero] fix gradient clipping in hybrid parallelism (#2521 ) * [zero] fix gradient clipping in hybrid parallelism * [testing] change model name to avoid pytest warning * [hotfix] fix unit testing	2 years ago
Frank Lee	40d376c566	[setup] support pre-build and jit-build of cuda kernels (#2374 ) * [setup] support pre-build and jit-build of cuda kernels * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
xyupeng	b965585d05	[NFC] polish colossalai/amp/torch_amp/torch_amp.py code style (#2290 )	2 years ago
Ziheng Qin	3041014089	[NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299 ) Co-authored-by: henryqin1997 <henryqin1997@gamil.com>	2 years ago
HELSON	5d3a2be3af	[amp] add gradient clipping for unit tests (#2283 ) * [amp] add gradient clipping in unit tests * fix bugs	2 years ago
YuliangLiu0306	f027ef7913	[hotfix] fix fp16 optimzier bug (#2273 )	2 years ago
Jiarui Fang	355ffb386e	[builder] unified cpu_optim fused_optim inferface (#2190 )	2 years ago
Jiarui Fang	d42afd30f8	[builder] runtime adam and fused_optim builder (#2184 )	2 years ago
ver217	f8a7148dec	[kernel] move all symlinks of kernel to `colossalai._C` (#1971 )	2 years ago
Junming Wu	14a0b18305	[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905 )	2 years ago
LuGY	94329fc139	[NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853 )	2 years ago
zbian	1559a09fb7	[NFC] polish amp.naive_amp.grad_scaler code style	2 years ago
Genghan Zhang	b25030cc07	[NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836 )	2 years ago
Ziyue Jiang	5da03c936d	[NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
Fazzie-Maqianli	399f84d8f6	[NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819 )	2 years ago
CsRic	9623ec1b02	[NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816 ) * [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) * [NFC] polish colossalai/zero/sharded_param/__init__.py code style * [NFC] polish colossalai/amp/naive_amp/_utils.py code style Co-authored-by: shenggan <csg19971016@gmail.com> Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>	2 years ago
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2 years ago
YuliangLiu0306	e27645376d	[hotfix]different overflow status lead to communication stuck. (#1175 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [hotfix]fix some bugs caused by refactored schedule. * [hotfix]different overflow statu llead to communication stuck.	2 years ago
Frank Lee	72bd7c696b	[amp] included dict for type casting of model output (#1102 )	2 years ago
Frank Lee	9fdebadd69	[doc] improved docstring in the amp module (#857 )	3 years ago
HELSON	4c4388c46e	[hotfix] fix memory leak in zero (#781 )	3 years ago
Frank Lee	a4e91bc87f	[bug] fixed grad scaler compatibility with torch 1.8 (#735 )	3 years ago
Jiarui Fang	4d90a7b513	[refactor] zero directory (#724 )	3 years ago
Kai Wang (Victor Kai)	b0f708dfc1	fix format (#570 )	3 years ago
ver217	c5b488edf8	polish amp docstring (#616 )	3 years ago
Liang Bowen	2c45efc398	html refactor (#555 )	3 years ago
Liang Bowen	ec5086c49c	Refactored docstring to google style	3 years ago
Jiarui Fang	496cbb0760	[hotfix] fix initialize bug with zero (#442 )	3 years ago
Frank Lee	14a7094243	fixed fp16 optimizer none grad bug (#432 )	3 years ago
Frank Lee	e79ea44247	[fp16] refactored fp16 optimizer (#392 )	3 years ago
Kai Wang (Victor Kai)	53bb3bcc0a	fix format (#362 )	3 years ago
Frank Lee	3d5d64bd10	refactored grad scaler (#338 )	3 years ago
Frank Lee	6a3188167c	set criterion as optional in colossalai initialize (#336 )	3 years ago
Frank Lee	e17e54e32a	added buffer sync to naive amp model wrapper (#291 )	3 years ago
Frank Lee	f5ca88ec97	fixed apex import (#227 )	3 years ago
アマデウス	9ee197d0e9	moved env variables to global variables; (#215 ) added branch context; added vocab parallel layers; moved split_batch from load_batch to tensor parallel embedding layers; updated gpt model; updated unit test cases; fixed few collective communicator bugs	3 years ago
HELSON	0f8c7f9804	Fixed docstring in colossalai (#171 )	3 years ago
Frank Lee	e2089c5c15	adapted for sequence parallel (#163 )	3 years ago

56 Commits (8241c0c054b38a109ed3ce7be1052a1e600b8471)