ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	21e29e2212	[doc] add tutorial for booster plugins (#3758 ) * [doc] add en booster plugins doc * [doc] add booster plugins doc in sidebar * [doc] add zh booster plugins doc * [doc] fix zh booster plugin translation * [doc] reoganize tutorials order of basic section * [devops] force sync to test ci	2023-05-19 12:12:42 +08:00
Hongxin Liu	5ce6c9d86f	[doc] add tutorial for cluster utils (#3763 ) * [doc] add en cluster utils doc * [doc] add zh cluster utils doc * [doc] add cluster utils doc in sidebar	2023-05-19 12:12:20 +08:00
jiangmingyan	48bd056761	[doc] update hybrid parallelism doc (#3770 )	2023-05-18 14:16:13 +08:00
jiangmingyan	d449525acf	[doc] update booster tutorials (#3718 ) * [booster] update booster tutorials#3717 * [booster] update booster tutorials#3717, fix * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, update setup doc * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, rename colossalai booster.md * [booster] update booster tutorials#3717, fix * [booster] update booster tutorials#3717, fix * [booster] update tutorials#3717, update booster api doc * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, modify file * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3717, fix reference link * [booster] update tutorials#3713 * [booster] update tutorials#3713, modify file	2023-05-18 11:41:56 +08:00
Hongxin Liu	5dd573c6b6	[devops] fix ci for document check (#3751 ) * [doc] add test info * [devops] update doc check ci * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] remove debug info and update invalid doc * [devops] add essential comments	2023-05-17 11:24:22 +08:00
digger-yu	b9a8dff7e5	[doc] Fix typo under colossalai and doc(#3618 ) * Fixed several spelling errors under colossalai * Fix the spelling error in colossalai and docs directory * Cautious Changed the spelling error under the example folder * Update runtime_preparation_pass.py revert autograft to autograd * Update search_chunk.py utile to until * Update check_installation.py change misteach to mismatch in line 91 * Update 1D_tensor_parallel.md revert to perceptron * Update 2D_tensor_parallel.md revert to perceptron in line 73 * Update 2p5D_tensor_parallel.md revert to perceptron in line 71 * Update 3D_tensor_parallel.md revert to perceptron in line 80 * Update README.md revert to resnet in line 42 * Update reorder_graph.py revert to indice in line 7 * Update p2p.py revert to megatron in line 94 * Update initialize.py revert to torchrun in line 198 * Update routers.py change to detailed in line 63 * Update routers.py change to detailed in line 146 * Update README.md revert random number in line 402	2023-04-26 11:38:43 +08:00
digger-yu	9edeadfb24	[doc] Update 1D_tensor_parallel.md (#3573 ) Display format optimization , same as fix#3562 Simultaneous modification of en version	2023-04-17 12:19:53 +08:00
digger-yu	1c7734bc94	[doc] Update 1D_tensor_parallel.md (#3563 ) Display format optimization, fix bug#3562 Specific changes 1. "This is called a column-parallel fashion" Translate to Chinese 2. use the ```math code block syntax to display a math expression as a block, No modification of formula content Please check that the math formula is displayed correctly If OK, I will change the format of the English version of the formula in parallel	2023-04-14 22:12:32 +08:00
digger-yu	a3ac48ef3d	[doc] Update README-zh-Hans.md (#3541 ) Fixing document link errors using absolute paths	2023-04-12 23:09:30 +08:00
binmakeswell	0c0455700f	[doc] add requirement and highlight application (#3516 ) * [doc] add requirement and highlight application * [doc] link example and application	2023-04-10 17:37:16 +08:00
Frank Lee	4e9989344d	[doc] updated contributor list (#3474 )	2023-04-06 17:47:59 +08:00
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2023-04-06 14:51:35 +08:00
ver217	26b7aac0be	[zero] reorganize zero/gemini folder structure (#3424 ) * [zero] refactor low-level zero folder structure * [zero] fix legacy zero import path * [zero] fix legacy zero import path * [zero] remove useless import * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] fix test import path * [zero] fix test * [zero] fix circular import * [zero] update import	2023-04-04 13:48:16 +08:00
binmakeswell	15a74da79c	[doc] add Intel cooperation news (#3333 ) * [doc] add Intel cooperation news * [doc] add Intel cooperation news	2023-03-30 11:45:01 +08:00
binmakeswell	31c78f2be3	[doc] add ColossalChat news (#3304 ) * [doc] add ColossalChat news * [doc] add ColossalChat news	2023-03-29 09:27:55 +08:00
binmakeswell	682af61396	[doc] add ColossalChat (#3297 ) * [doc] add ColossalChat	2023-03-29 02:35:10 +08:00
Saurav Maheshkar	20d1c99444	[refactor] update docs (#3174 ) * refactor: README-zh-Hans * refactor: REFERENCE * docs: update paths in README	2023-03-20 10:52:01 +08:00
Frank Lee	3213347b49	[doc] fixed typos in docs/README.md (#3082 )	2023-03-10 10:32:14 +08:00
Frank Lee	416a50dbd7	[doc] moved doc test command to bottom (#3075 )	2023-03-09 18:10:45 +08:00
Frank Lee	ea0b52c12e	[doc] specified operating system requirement (#3019 ) * [doc] specified operating system requirement * polish code	2023-03-07 18:04:10 +08:00
ver217	378d827c6b	[doc] update nvme offload doc (#3014 ) * [doc] update nvme offload doc * [doc] add doc testing cmd and requirements * [doc] add api reference * [doc] add dependencies	2023-03-07 17:49:01 +08:00
Frank Lee	8fedc8766a	[workflow] supported conda package installation in doc test (#3028 ) * [workflow] supported conda package installation in doc test * polish code * polish code * polish code * polish code * polish code * polish code	2023-03-07 14:21:26 +08:00
Frank Lee	e0a1c1321c	[doc] added reference to related works (#2994 ) * [doc] added reference to related works * polish code	2023-03-04 17:32:22 +08:00
github-actions[bot]	dca98937f8	[format] applied code formatting on changed files in pull request 2933 (#2939 ) Co-authored-by: github-actions <github-actions@github.com>	2023-02-28 15:41:52 +08:00
binmakeswell	8264cd7ef1	[doc] add env scope (#2933 )	2023-02-28 15:39:51 +08:00
Frank Lee	b8804aa60c	[doc] added readme for documentation (#2935 )	2023-02-28 14:04:52 +08:00
Frank Lee	9e3b8b7aff	[doc] removed read-the-docs (#2932 )	2023-02-28 11:28:24 +08:00
Frank Lee	77b88a3849	[workflow] added auto doc test on PR (#2929 ) * [workflow] added auto doc test on PR * [workflow] added doc test workflow * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-02-28 11:10:38 +08:00
binmakeswell	0afb55fc5b	[doc] add os scope, update tutorial install and tips (#2914 )	2023-02-27 14:59:27 +08:00
YuliangLiu0306	cf6409dd40	Hotfix/auto parallel zh doc (#2820 ) * [hotfix] fix autoparallel zh docs * polish * polish	2023-02-19 15:57:14 +08:00
YuliangLiu0306	2059fdd6b0	[hotfix] add copyright for solver and device mesh (#2803 ) * [hotfix] add copyright for solver and device mesh * add readme * add alpa license * polish	2023-02-18 21:14:38 +08:00
Frank Lee	e376954305	[doc] add opt service doc (#2747 )	2023-02-16 15:45:26 +08:00
Frank Lee	5479fdd5b8	[doc] updated documentation version list (#2730 )	2023-02-15 17:39:50 +08:00
Frank Lee	2045d45ab7	[doc] updated documentation version list (#2715 )	2023-02-15 11:24:18 +08:00
Frank Lee	0966008839	[dooc] fixed the sidebar itemm key (#2672 )	2023-02-13 10:45:16 +08:00
Frank Lee	6d60634433	[doc] added documentation sidebar translation (#2670 )	2023-02-13 10:10:12 +08:00
Frank Lee	81ea66d25d	[release] v0.2.3 (#2669 ) * [release] v0.2.3 * polish code	2023-02-13 09:51:25 +08:00
YuliangLiu0306	8de85051b3	[Docs] layout converting management (#2665 )	2023-02-10 18:38:32 +08:00
Frank Lee	b673e5f78b	[release] v0.2.2 (#2661 )	2023-02-10 11:01:24 +08:00
Frank Lee	cd4f02bed8	[doc] fixed compatiblity with docusaurus (#2657 )	2023-02-09 17:06:29 +08:00
Frank Lee	a4ae43f071	[doc] added docusaurus-based version control (#2656 )	2023-02-09 16:38:49 +08:00
Frank Lee	85b2303b55	[doc] migrate the markdown files (#2652 )	2023-02-09 14:21:38 +08:00
Frank Lee	d3480396f8	[doc] updated the sphinx theme (#2635 )	2023-02-08 13:48:08 +08:00
binmakeswell	a01278e810	Update requirements.txt	2022-11-18 18:57:18 +08:00
Jiarui Fang	cc0ed7cf33	[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972 )	2022-11-17 14:43:49 +08:00
Ziyue Jiang	63f250bbd4	fix file name (#1759 ) Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>	2022-10-25 16:48:48 +08:00
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2022-07-21 15:54:53 +08:00
Jiarui Fang	4165eabb1e	[hotfix] remove potiential circle import (#1307 ) * make it faster * [hotfix] remove circle import	2022-07-14 13:44:26 +08:00
Jiarui Fang	4d9332b4c5	[refactor] moving memtracer to gemini (#801 )	2022-04-19 10:13:08 +08:00
ver217	f69507dd22	update rst (#615 )	2022-04-01 15:46:38 +08:00
Liang Bowen	2c45efc398	html refactor (#555 )	2022-03-31 11:36:56 +08:00
LuGY	c44d797072	[docs] updatad docs of hybrid adam and cpu adam (#552 )	2022-03-30 18:14:59 +08:00
ver217	ffca99d187	[doc] update apidoc (#530 )	2022-03-25 18:29:43 +08:00
ver217	9caa8b6481	docs get correct release version (#489 )	2022-03-22 14:24:41 +08:00
ver217	7e30068a22	[doc] update rst (#470 ) * update rst * remove empty rst	2022-03-21 10:52:45 +08:00
binmakeswell	ce7b2c9ae3	update README and images path (#384 )	2022-03-11 15:50:28 +08:00
binmakeswell	08eccfe681	add community group and update issue template(#271 )	2022-03-11 15:50:28 +08:00
Sze-qq	3312d716a0	update experimental visualization (#253 )	2022-03-11 15:50:28 +08:00
binmakeswell	753035edd3	add Chinese README	2022-03-11 15:50:28 +08:00
WANG-CR	6fb550acdb	update logo	2022-01-21 12:31:07 +08:00
ver217	1949d3a889	update doc requirements and rtd conf (#165 )	2022-01-19 19:46:43 +08:00
Frank Lee	be85a0f366	removed tutorial markdown and refreshed rst files for consistency	2022-01-19 17:01:37 +08:00
binmakeswell	17ce8569a8	add logo at homepage, add forum in issue template (#161 )	2022-01-19 14:29:31 +08:00
puck_WCR	9473a1b9c8	AMP docstring/markdown update (#160 )	2022-01-18 18:33:36 +08:00
ver217	96780e6ee4	Optimize pipeline schedule (#94 ) * add pipeline shared module wrapper and update load batch * added model parallel process group for amp and clip grad (#86) * added model parallel process group for amp and clip grad * update amp and clip with model parallel process group * remove pipeline_prev/next group (#88) * micro batch offload * optimize pipeline gpu memory usage * pipeline can receive tensor shape (#93) * optimize pipeline gpu memory usage * fix grad accumulation step counter * rename classes and functions Co-authored-by: Frank Lee <somerlee.9@gmail.com>	2021-12-30 15:56:46 +08:00
ver217	8f02a88db2	add interleaved pipeline, fix naive amp and update pipeline model initializer (#80 )	2021-12-20 23:26:19 +08:00
Frank Lee	35813ed3c4	update examples and sphnix docs for the new api (#63 )	2021-12-13 22:07:01 +08:00
ver217	7d3711058f	fix zero3 fp16 and add zero3 model context (#62 )	2021-12-10 17:48:50 +08:00
Frank Lee	9a0466534c	update markdown docs (english) (#60 )	2021-12-10 14:37:33 +08:00
Frank Lee	da01c234e1	Develop/experiments (#59 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp Fix convergence in cifar10, Imagenet1000 * Integrate 1d tensor parallel in Colossal-AI (#39) * fixed 1D and 2D convergence (#38) * optimized 2D operations * fixed 1D ViT convergence problem * Feature/ddp (#49) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. 
* improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * support torch ddp * fix loss accumulation * add log for ddp * change seed * modify timing hook Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * Feature/pipeline (#40) * remove redundancy func in setup (#19) (#20) * use env to control the language of doc (#24) (#25) * Support TP-compatible Torch AMP and Update trainer API (#27) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. 
* improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) * add explanation for ViT example (#35) (#36) * optimize communication of pipeline parallel * fix grad clip for pipeline Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> * optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51) * Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset * update api for better usability (#58) update api for better usability Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com> Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2021-12-09 15:08:29 +08:00
Frank Lee	3defa32aee	Support TP-compatible Torch AMP and Update trainer API (#27 ) * Add gradient accumulation, fix lr scheduler * fix FP16 optimizer and adapted torch amp with tensor parallel (#18) * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes * fixed trainer * Revert "fixed trainer" This reverts commit `2e0b0b7699`. * improved consistency between trainer, engine and schedule (#23) Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: 1SAA <c2h214748@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com>	2021-11-18 19:45:06 +08:00
ver217	2b05de4c64	use env to control the language of doc (#24 ) (#25 )	2021-11-15 16:53:56 +08:00
binmakeswell	05e7069a5b	fixed some typos in the documents, added blog link and paper author information in README	2021-11-03 17:18:43 +08:00
Fan Cui	18ba66e012	added Chinese documents and fixed some typos in English documents	2021-11-02 23:28:44 +08:00
ver217	50982c0b7d	reoder parallelization methods in parallelization documentation	2021-11-01 14:31:55 +08:00
ver217	3c7604ba30	update documentation	2021-10-29 09:29:20 +08:00
zbian	404ecbdcc6	Migrated project	2021-10-28 18:21:23 +02:00


127 Commits (4c4482f3adb56943a150b8b7ed886e2218fc98d5)