ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	11 months ago
Hongxin Liu	e5ce4c8ea6	[npu] add npu support for gemini and zero (#5067) * [npu] setup device utils (#5047) * [npu] add npu device support * [npu] support low level zero * [test] update npu zero plugin test * [hotfix] fix import * [test] recover tests * [npu] gemini support npu (#5052) * [npu] refactor device utils * [gemini] support npu * [example] llama2+gemini support npu * [kernel] add arm cpu adam kernel (#5065) * [kernel] add arm cpu adam * [optim] update adam optimizer * [kernel] arm cpu adam remove bf16 support	1 year ago
littsk	83b52c56cd	[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) * Add clip_grad_norm for hibrid_parallel_plugin * polish code * add unittests * Move tp to a higher-level optimizer interface. * bug fix * polish code	1 year ago
Baizhou Zhang	c0a033700c	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) * fix master param sync for hybrid plugin * rewrite unwrap for ddp/fsdp * rewrite unwrap for zero/gemini * rewrite unwrap for hybrid plugin * fix geemini unwrap * fix bugs	1 year ago
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	1 year ago
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	1 year ago
Baizhou Zhang	0ceec8f9a9	[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test * merge tests of PP/DP/TP combinations into one test file * fix bug when sync grad for dp in HybridPlugin * update supported precisions for 3DPlugin/fix bug when shifting tp_degree * improve the passing of lazy_init * modify lazy_init/use sync_shared_params	1 year ago
Hongxin Liu	261eab02fb	[plugin] add 3d parallel plugin (#4295) * [amp] add mixed precision optimizer * [plugin] add 3d parallel plugin * [booster] support pipeline * [plugin] 3d parallel plugin support clip grad norm * [shardformer] fix sharder and add plugin test * [plugin] rename 3d parallel plugin * [ci] support testmon core pkg change detection (#4305) * [hotfix] debug testmon * [hotfix] fix llama * [hotfix] fix p2p bugs * [hotfix] fix requirements	1 year ago
Hongxin Liu	ae02d4e4f7	[bf16] add bf16 support (#3882) * [bf16] add bf16 support for fused adam (#3844) * [bf16] fused adam kernel support bf16 * [test] update fused adam kernel test * [test] update fused adam test * [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860) * [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869) * [bf16] add mixed precision mixin * [bf16] low level zero optim support bf16 * [text] update low level zero test * [text] fix low level zero grad acc test * [bf16] add bf16 support for gemini (#3872) * [bf16] gemini support bf16 * [test] update gemini bf16 test * [doc] update gemini docstring * [bf16] add bf16 support for plugins (#3877) * [bf16] add bf16 support for legacy zero (#3879) * [zero] init context support bf16 * [zero] legacy zero support bf16 * [test] add zero bf16 test * [doc] add bf16 related docstring for legacy zero	2 years ago
HELSON	077a5cdde4	[zero] fix gradient clipping in hybrid parallelism (#2521) * [zero] fix gradient clipping in hybrid parallelism * [testing] change model name to avoid pytest warning * [hotfix] fix unit testing	2 years ago
Frank Lee	40d376c566	[setup] support pre-build and jit-build of cuda kernels (#2374) * [setup] support pre-build and jit-build of cuda kernels * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
Ziheng Qin	3041014089	[NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299) Co-authored-by: henryqin1997 <henryqin1997@gamil.com>	2 years ago
HELSON	5d3a2be3af	[amp] add gradient clipping for unit tests (#2283) * [amp] add gradient clipping in unit tests * fix bugs	2 years ago
YuliangLiu0306	f027ef7913	[hotfix] fix fp16 optimzier bug (#2273)	2 years ago
Jiarui Fang	355ffb386e	[builder] unified cpu_optim fused_optim inferface (#2190)	2 years ago
Jiarui Fang	d42afd30f8	[builder] runtime adam and fused_optim builder (#2184)	2 years ago
ver217	f8a7148dec	[kernel] move all symlinks of kernel to `colossalai._C` (#1971)	2 years ago
Junming Wu	14a0b18305	[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905)	2 years ago
zbian	1559a09fb7	[NFC] polish amp.naive_amp.grad_scaler code style	2 years ago
Fazzie-Maqianli	399f84d8f6	[NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819)	2 years ago
CsRic	9623ec1b02	[NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816) * [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) * [NFC] polish colossalai/zero/sharded_param/__init__.py code style * [NFC] polish colossalai/amp/naive_amp/_utils.py code style Co-authored-by: shenggan <csg19971016@gmail.com> Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>	2 years ago
YuliangLiu0306	e27645376d	[hotfix]different overflow status lead to communication stuck. (#1175) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [hotfix]fix some bugs caused by refactored schedule. * [hotfix]different overflow statu llead to communication stuck.	2 years ago
Frank Lee	72bd7c696b	[amp] included dict for type casting of model output (#1102)	2 years ago
Frank Lee	9fdebadd69	[doc] improved docstring in the amp module (#857)	3 years ago
HELSON	4c4388c46e	[hotfix] fix memory leak in zero (#781)	3 years ago
ver217	c5b488edf8	polish amp docstring (#616)	3 years ago
Liang Bowen	ec5086c49c	Refactored docstring to google style	3 years ago
Jiarui Fang	496cbb0760	[hotfix] fix initialize bug with zero (#442)	3 years ago
Frank Lee	14a7094243	fixed fp16 optimizer none grad bug (#432)	3 years ago
Frank Lee	e79ea44247	[fp16] refactored fp16 optimizer (#392)	3 years ago
Kai Wang (Victor Kai)	53bb3bcc0a	fix format (#362)	3 years ago
Frank Lee	3d5d64bd10	refactored grad scaler (#338)	3 years ago
Frank Lee	e17e54e32a	added buffer sync to naive amp model wrapper (#291)	3 years ago
HELSON	0f8c7f9804	Fixed docstring in colossalai (#171)	3 years ago
Frank Lee	e2089c5c15	adapted for sequence parallel (#163)	3 years ago
puck_WCR	9473a1b9c8	AMP docstring/markdown update (#160)	3 years ago
ver217	96780e6ee4	Optimize pipeline schedule (#94) * add pipeline shared module wrapper and update load batch * added model parallel process group for amp and clip grad (#86) * added model parallel process group for amp and clip grad * update amp and clip with model parallel process group * remove pipeline_prev/next group (#88) * micro batch offload * optimize pipeline gpu memory usage * pipeline can receive tensor shape (#93) * optimize pipeline gpu memory usage * fix grad accumulation step counter * rename classes and functions Co-authored-by: Frank Lee <somerlee.9@gmail.com>	3 years ago
ver217	8f02a88db2	add interleaved pipeline, fix naive amp and update pipeline model initializer (#80)	3 years ago
Frank Lee	35813ed3c4	update examples and sphnix docs for the new api (#63)	3 years ago
Frank Lee	da01c234e1	Develop/experiments (#59) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp Fix convergence in cifar10, Imagenet1000 * Integrate 1d tensor parallel in Colossal-AI (#39) * fixed 1D and 2D convergence (#38) * optimized 2D operations * fixed 1D ViT convergence problem * Feature/ddp (#49) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * support torch ddp * fix loss accumulation * add log for ddp * change seed * modify timing hook Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * Feature/pipeline (#40) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * optimize communication of pipeline parallel * fix grad clip for pipeline Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51) * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset * update api for better usability (#58) update api for better usability Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com> Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	3 years ago

40 Commits (5c6c5d6be316a4f4e867d0d8049b508e0d59ad6c)