Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
1 year ago
Hongxin Liu
ac178ca5c1
[legacy] move builder and registry to legacy ( #4603 )
1 year ago
digger-yu
b7141c36dd
[CI] fix some spelling errors ( #3707 )
* fix spelling error with examples/comminity/
* fix spelling error with tests/
* fix some spelling error with tests/ colossalai/ etc.
2 years ago
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc ( #3618 )
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Carefully changed the spelling errors under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2 years ago
yuxuan-lou
198a74b9fd
[NFC] polish colossalai/context/random/__init__.py code style ( #3327 )
2 years ago
RichardoLuo
1ce9d0c531
[NFC] polish initializer_data.py code style ( #3287 )
2 years ago
Kai Wang (Victor Kai)
964a28678f
[NFC] polish initializer_3d.py code style ( #3279 )
2 years ago
Arsmart1
8af977f223
[NFC] polish colossalai/context/parallel_context.py code style ( #3276 )
2 years ago
Zirui Zhu
c9e3ee389e
[NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style ( #2726 )
2 years ago
Ziyue Jiang
4603538ddd
[NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py code style ( #2712 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
アマデウス
534f68c83c
[NFC] polish pipeline process group code style ( #2694 )
2 years ago
LuGY
56ff1921e9
[NFC] polish colossalai/context/moe_context.py code style ( #2693 )
2 years ago
アマデウス
99d9713b02
Revert "Update parallel_context.py ( #2408 )"
This reverts commit 7d5640b9db.
2 years ago
Haofan Wang
7d5640b9db
Update parallel_context.py ( #2408 )
2 years ago
Tongping Liu
8e22c38b89
[hotfix] Fixing the bug related to ipv6 support
Co-authored-by: ByteDance <tongping.liu@bytedance.com>
2 years ago
kurisusnowdeng
0b8161fab8
updated tp layers
2 years ago
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
* fixes a memory leak when a parameter is in fp16 during ZeroDDP init.
* bans chunk release in CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2 years ago
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2 years ago
Frank Lee
27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy ( #1591 )
* [autoparallel] refactored shape consistency to remove redundancy
* polish code
* polish code
* polish code
2 years ago
ver217
d068af81a3
[doc] update rst and docstring ( #1351 )
* update rst
* add zero docstr
* fix docstr
* remove fx.tracer.meta_patch
* fix docstr
* fix docstr
* update fx rst
* fix fx docstr
* remove useless rst
2 years ago
Frank Lee
2238758c2e
[usability] improved error messages in the context module ( #856 )
3 years ago
Frank Lee
920fe31526
[compatibility] used backward-compatible API for global process group ( #758 )
3 years ago
Frank Lee
04ff5ea546
[utils] support detection of number of processes on current node ( #723 )
3 years ago
Cautiousss
055d0270c8
[NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py and colossalai/context/process_group_initializer/initializer_tensor.py code style ( #639 )
Co-authored-by: 何晓昕 <cautious@r-236-100-25-172.comp.nus.edu.sg>
3 years ago
Jiang Zhuo
0a96338b13
[NFC] polish colossalai/context/process_group_initializer/initializer_data.py code style ( #626 )
Co-authored-by: 姜卓 <jiangzhuo@jiangzhuodeMacBook-Pro.local>
3 years ago
ziyu huang
701bad439b
[NFC] polish colossalai/context/process_group_initializer/process_group_initializer.py code style ( #617 )
Co-authored-by: Arsmart123 <202476410arsmart@gmail.com>
3 years ago
アマデウス
297b8baae2
[model checkpoint] add gloo groups for cpu tensor communication ( #589 )
3 years ago
Liang Bowen
2c45efc398
html refactor ( #555 )
3 years ago
Liang Bowen
ec5086c49c
Refactored docstring to google style
3 years ago
Jiarui Fang
a445e118cf
[polish] polish singleton and global context ( #500 )
3 years ago
HELSON
f24b5ed201
[MOE] remove old MoE legacy ( #493 )
3 years ago
Jiarui Fang
65c0f380c2
[format] polish name format for MOE ( #481 )
3 years ago
HELSON
7544347145
[MOE] add unit test for MOE experts layout, gradient handler and kernel ( #469 )
3 years ago
HELSON
84fd7c1d4d
add moe context, moe utilities and refactor gradient handler ( #455 )
3 years ago
Frank Lee
b72b8445c6
optimized context test time consumption ( #446 )
3 years ago
Frank Lee
1e4bf85cdb
fixed bug in activation checkpointing test ( #387 )
3 years ago
RichardoLuo
8539898ec6
flake8 style change ( #363 )
3 years ago
ziyu huang
a77d73f22b
fix format parallel_context.py ( #359 )
Co-authored-by: huangziyu <202476410arsmart@gmail.com>
3 years ago
Maruyama_Aya
e83970e3dc
fix format ColossalAI\colossalai\context\process_group_initializer
3 years ago
アマデウス
9ee197d0e9
moved env variables to global variables; ( #215 )
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs
3 years ago
HELSON
0f8c7f9804
Fixed docstring in colossalai ( #171 )
3 years ago
Frank Lee
e2089c5c15
adapted for sequence parallel ( #163 )
3 years ago
HELSON
dceae85195
Added MoE parallel ( #127 )
3 years ago
ver217
a951bc6089
update default logger ( #100 ) ( #101 )
3 years ago
ver217
96780e6ee4
Optimize pipeline schedule ( #94 )
* add pipeline shared module wrapper and update load batch
* added model parallel process group for amp and clip grad (#86 )
* added model parallel process group for amp and clip grad
* update amp and clip with model parallel process group
* remove pipeline_prev/next group (#88 )
* micro batch offload
* optimize pipeline gpu memory usage
* pipeline can receive tensor shape (#93 )
* optimize pipeline gpu memory usage
* fix grad accumulation step counter
* rename classes and functions
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
3 years ago
アマデウス
01a80cd86d
Hotfix/Colossalai layers ( #92 )
* optimized 1d layer apis; reorganized nn.layer modules; fixed tests
* fixed 2.5d runtime issue
* reworked split batch, now called in trainer.schedule.load_batch
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
3 years ago
アマデウス
0fedef4f3c
Layer integration ( #83 )
* integrated parallel layers for ease of building models
* integrated 2.5d layers
* cleaned codes and unit tests
* added log metric by step hook; updated imagenet benchmark; fixed some bugs
* reworked initialization; cleaned codes
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
3 years ago
ver217
8f02a88db2
add interleaved pipeline, fix naive amp and update pipeline model initializer ( #80 )
3 years ago
Frank Lee
35813ed3c4
update examples and sphinx docs for the new api ( #63 )
3 years ago