ColossalAI

Commit Graph

Author	SHA1	Message	Date
jiangmingyan	d449525acf	[doc] update booster tutorials (#3718 ) * [booster] update booster tutorials#3717 * [booster] update booster tutorials#3717, fix * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, fix * [booster] update booster tutorials#3717, fix * [booster] update tutorials#3717, update booster api doc * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3713 * [booster] update tutorials#3713, modify file	2 years ago
Hongxin Liu	5dd573c6b6	[devops] fix ci for document check (#3751 ) * [doc] add test info * [devops] update doc check ci * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] remove debug info and update invalid doc * [devops] add essential comments	2 years ago
digger-yu	b9a8dff7e5	[doc] Fix typo under colossalai and doc(#3618 ) * Fixed several spelling errors under colossalai * Fix the spelling error in colossalai and docs directory * Cautious Changed the spelling error under the example folder * Update runtime_preparation_pass.py revert autograft to autograd * Update search_chunk.py utile to until * Update check_installation.py change misteach to mismatch in line 91 * Update 1D_tensor_parallel.md revert to perceptron * Update 2D_tensor_parallel.md revert to perceptron in line 73 * Update 2p5D_tensor_parallel.md revert to perceptron in line 71 * Update 3D_tensor_parallel.md revert to perceptron in line 80 * Update README.md revert to resnet in line 42 * Update reorder_graph.py revert to indice in line 7 * Update p2p.py revert to megatron in line 94 * Update initialize.py revert to torchrun in line 198 * Update routers.py change to detailed in line 63 * Update routers.py change to detailed in line 146 * Update README.md revert random number in line 402	2 years ago
digger-yu	9edeadfb24	[doc] Update 1D_tensor_parallel.md (#3573 ) Display format optimization , same as fix#3562 Simultaneous modification of en version	2 years ago
digger-yu	1c7734bc94	[doc] Update 1D_tensor_parallel.md (#3563 ) Display format optimization, fix bug#3562 Specific changes 1. "This is called a column-parallel fashion" Translate to Chinese 2. use the ```math code block syntax to display a math expression as a block, No modification of formula content Please check that the math formula is displayed correctly If OK, I will change the format of the English version of the formula in parallel	2 years ago
digger-yu	a3ac48ef3d	[doc] Update README-zh-Hans.md (#3541 ) Fixing document link errors using absolute paths	2 years ago
binmakeswell	0c0455700f	[doc] add requirement and highlight application (#3516 ) * [doc] add requirement and highlight application * [doc] link example and application	2 years ago
Frank Lee	4e9989344d	[doc] updated contributor list (#3474 )	2 years ago
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
ver217	26b7aac0be	[zero] reorganize zero/gemini folder structure (#3424 ) * [zero] refactor low-level zero folder structure * [zero] fix legacy zero import path * [zero] fix legacy zero import path * [zero] remove useless import * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] fix test import path * [zero] fix test * [zero] fix circular import * [zero] update import	2 years ago
binmakeswell	15a74da79c	[doc] add Intel cooperation news (#3333 ) * [doc] add Intel cooperation news * [doc] add Intel cooperation news	2 years ago
binmakeswell	31c78f2be3	[doc] add ColossalChat news (#3304 ) * [doc] add ColossalChat news * [doc] add ColossalChat news	2 years ago
binmakeswell	682af61396	[doc] add ColossalChat (#3297 ) * [doc] add ColossalChat	2 years ago
Saurav Maheshkar	20d1c99444	[refactor] update docs (#3174 ) * refactor: README-zh-Hans * refactor: REFERENCE * docs: update paths in README	2 years ago
Frank Lee	3213347b49	[doc] fixed typos in docs/README.md (#3082 )	2 years ago
Frank Lee	416a50dbd7	[doc] moved doc test command to bottom (#3075 )	2 years ago
Frank Lee	ea0b52c12e	[doc] specified operating system requirement (#3019 ) * [doc] specified operating system requirement * polish code	2 years ago
ver217	378d827c6b	[doc] update nvme offload doc (#3014 ) * [doc] update nvme offload doc * [doc] add doc testing cmd and requirements * [doc] add api reference * [doc] add dependencies	2 years ago
Frank Lee	8fedc8766a	[workflow] supported conda package installation in doc test (#3028 ) * [workflow] supported conda package installation in doc test * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
Frank Lee	e0a1c1321c	[doc] added reference to related works (#2994 ) * [doc] added reference to related works * polish code	2 years ago
github-actions[bot]	dca98937f8	[format] applied code formatting on changed files in pull request 2933 (#2939 ) Co-authored-by: github-actions <github-actions@github.com>	2 years ago
binmakeswell	8264cd7ef1	[doc] add env scope (#2933 )	2 years ago
Frank Lee	b8804aa60c	[doc] added readme for documentation (#2935 )	2 years ago
Frank Lee	9e3b8b7aff	[doc] removed read-the-docs (#2932 )	2 years ago
Frank Lee	77b88a3849	[workflow] added auto doc test on PR (#2929 ) * [workflow] added auto doc test on PR * [workflow] added doc test workflow * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
binmakeswell	0afb55fc5b	[doc] add os scope, update tutorial install and tips (#2914 )	2 years ago
YuliangLiu0306	cf6409dd40	Hotfix/auto parallel zh doc (#2820 ) * [hotfix] fix autoparallel zh docs * polish * polish	2 years ago
YuliangLiu0306	2059fdd6b0	[hotfix] add copyright for solver and device mesh (#2803 ) * [hotfix] add copyright for solver and device mesh * add readme * add alpa license * polish	2 years ago
Frank Lee	e376954305	[doc] add opt service doc (#2747 )	2 years ago
Frank Lee	5479fdd5b8	[doc] updated documentation version list (#2730 )	2 years ago
Frank Lee	2045d45ab7	[doc] updated documentation version list (#2715 )	2 years ago
Frank Lee	0966008839	[dooc] fixed the sidebar itemm key (#2672 )	2 years ago
Frank Lee	6d60634433	[doc] added documentation sidebar translation (#2670 )	2 years ago
Frank Lee	81ea66d25d	[release] v0.2.3 (#2669 ) * [release] v0.2.3 * polish code	2 years ago
YuliangLiu0306	8de85051b3	[Docs] layout converting management (#2665 )	2 years ago
Frank Lee	b673e5f78b	[release] v0.2.2 (#2661 )	2 years ago
Frank Lee	cd4f02bed8	[doc] fixed compatiblity with docusaurus (#2657 )	2 years ago
Frank Lee	a4ae43f071	[doc] added docusaurus-based version control (#2656 )	2 years ago
Frank Lee	85b2303b55	[doc] migrate the markdown files (#2652 )	2 years ago
Frank Lee	d3480396f8	[doc] updated the sphinx theme (#2635 )	2 years ago
binmakeswell	a01278e810	Update requirements.txt	2 years ago
Jiarui Fang	cc0ed7cf33	[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972 )	2 years ago
Ziyue Jiang	63f250bbd4	fix file name (#1759 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2 years ago
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2 years ago
Jiarui Fang	4165eabb1e	[hotfix] remove potiential circle import (#1307 ) * make it faster * [hotfix] remove circle import	2 years ago
Jiarui Fang	4d9332b4c5	[refactor] moving memtracer to gemini (#801 )	3 years ago
ver217	f69507dd22	update rst (#615 )	3 years ago
Liang Bowen	2c45efc398	html refactor (#555 )	3 years ago
LuGY	c44d797072	[docs] updatad docs of hybrid adam and cpu adam (#552 )	3 years ago
ver217	ffca99d187	[doc] update apidoc (#530 )	3 years ago
ver217	9caa8b6481	docs get correct release version (#489 )	3 years ago
ver217	7e30068a22	[doc] update rst (#470 ) * update rst * remove empty rst	3 years ago
binmakeswell	ce7b2c9ae3	update README and images path (#384 )	3 years ago
binmakeswell	08eccfe681	add community group and update issue template(#271 )	3 years ago
Sze-qq	3312d716a0	update experimental visualization (#253 )	3 years ago
binmakeswell	753035edd3	add Chinese README	3 years ago
WANG-CR	6fb550acdb	update logo	3 years ago
ver217	1949d3a889	update doc requirements and rtd conf (#165 )	3 years ago
Frank Lee	be85a0f366	removed tutorial markdown and refreshed rst files for consistency	3 years ago
binmakeswell	17ce8569a8	add logo at homepage, add forum in issue template (#161 )	3 years ago
puck_WCR	9473a1b9c8	AMP docstring/markdown update (#160 )	3 years ago
ver217	96780e6ee4	Optimize pipeline schedule (#94 ) * add pipeline shared module wrapper and update load batch * added model parallel process group for amp and clip grad (#86) * added model parallel process group for amp and clip grad * update amp and clip with model parallel process group * remove pipeline_prev/next group (#88) * micro batch offload * optimize pipeline gpu memory usage * pipeline can receive tensor shape (#93) * optimize pipeline gpu memory usage * fix grad accumulation step counter * rename classes and functions Co-authored-by: Frank Lee <somerlee.9@gmail.com>	3 years ago
ver217	8f02a88db2	add interleaved pipeline, fix naive amp and update pipeline model initializer (#80 )	3 years ago
Frank Lee	35813ed3c4	update examples and sphnix docs for the new api (#63 )	3 years ago
ver217	7d3711058f	fix zero3 fp16 and add zero3 model context (#62 )	3 years ago
Frank Lee	9a0466534c	update markdown docs (english) (#60 )	3 years ago
Frank Lee	da01c234e1	Develop/experiments (#59 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp Fix convergence in cifar10, Imagenet1000 * Integrate 1d tensor parallel in Colossal-AI (#39) * fixed 1D and 2D convergence (#38) * optimized 2D operations * fixed 1D ViT convergence problem * Feature/ddp (#49) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * support torch ddp * fix loss accumulation * add log for ddp * change seed * modify timing hook Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * Feature/pipeline (#40) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * optimize communication of pipeline parallel * fix grad clip for pipeline Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51) * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset * update api for better usability (#58) update api for better usability Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com> Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	3 years ago
Frank Lee	3defa32aee	Support TP-compatible Torch AMP and Update trainer API (#27 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com>	3 years ago
ver217	2b05de4c64	use env to control the language of doc (#24 ) (#25 )	3 years ago
binmakeswell	05e7069a5b	fixed some typos in the documents, added blog link and paper author information in README	3 years ago
Fan Cui	18ba66e012	added Chinese documents and fixed some typos in English documents	3 years ago
ver217	50982c0b7d	reoder parallelization methods in parallelization documentation	3 years ago
ver217	3c7604ba30	update documentation	3 years ago
zbian	404ecbdcc6	Migrated project	3 years ago

174 Commits (feat/online-serving)