ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Edenzzzz	d83c633ca6	[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606 ) * fix no pad token bug * fixed some auto parallel codegen bug, but might not run on torch 2.1 --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu>	7 months ago
Stephan Kölker	5d380a1a21	[hotfix] Fix wrong import in meta_registry (#5392 )	9 months ago
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	11 months ago
Hongxin Liu	e5ce4c8ea6	[npu] add npu support for gemini and zero (#5067 ) * [npu] setup device utils (#5047) * [npu] add npu device support * [npu] support low level zero * [test] update npu zero plugin test * [hotfix] fix import * [test] recover tests * [npu] gemini support npu (#5052) * [npu] refactor device utils * [gemini] support npu * [example] llama2+gemini support npu * [kernel] add arm cpu adam kernel (#5065) * [kernel] add arm cpu adam * [optim] update adam optimizer * [kernel] arm cpu adam remove bf16 support	1 year ago
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752 ) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	1 year ago
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743 ) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	1 year ago
Hongxin Liu	554aa9592e	[legacy] move communication and nn to legacy and refactor logger (#4671 ) * [legacy] move communication to legacy (#4640) * [legacy] refactor logger and clean up legacy codes (#4654) * [legacy] make logger independent to gpc * [legacy] make optim independent to registry * [legacy] move test engine to legacy * [legacy] move nn to legacy (#4656) * [legacy] move nn to legacy * [checkpointio] fix save hf config * [test] remove useledd rpc pp test * [legacy] fix nn init * [example] skip tutorial hybriad parallel example * [devops] test doc check * [devops] test doc check	1 year ago
Hongxin Liu	ac178ca5c1	[legacy] move builder and registry to legacy (#4603 )	1 year ago
Lufang Chen	12c95a9fed	fix runtime prepare pass (#4502 ) Co-authored-by: lufang.chen <lufang.chen@nio.com>	1 year ago
Wenhao Chen	fee553288b	[NFC] polish runtime_preparation_pass style (#4266 )	1 year ago
YeAnbang	3883db452c	[NFC] polish unary_elementwise_generator.py code style (#4267 ) Co-authored-by: aye42 <aye42@gatech.edu>	1 year ago
Yanjia0	c614a99d28	[NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255 )	1 year ago
Frank Lee	c4b1b65931	[test] fixed tests failed due to dtensor change (#4082 ) * [test] fixed tests failed due to dtensor change * polish code	1 year ago
digger yu	e2d81eba0d	[nfc] fix typo colossalai/ applications/ (#3831 ) * fix typo colossalai/autochunk auto_parallel amp * fix typo colossalai/auto_parallel nn utils etc. * fix typo colossalai/auto_parallel autochunk fx/passes etc. * fix typo docs/ * change placememt_policy to placement_policy in docs/ and examples/ * fix typo colossalai/ applications/	2 years ago
digger yu	7f8203af69	fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808 )	2 years ago
digger yu	9265f2d4d7	[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779 ) * fix typo colossalai/autochunk auto_parallel amp * fix typo colossalai/auto_parallel nn utils etc.	2 years ago
digger yu	32f81f14d4	[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756 )	2 years ago
digger yu	1baeb39c72	[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742 ) * fix typo applications/ and colossalai/ date 5.11 * fix typo colossalai/	2 years ago
digger-yu	ad6460cf2c	[NFC] fix typo applications/ and colossalai/ (#3735 )	2 years ago
digger-yu	b9a8dff7e5	[doc] Fix typo under colossalai and doc(#3618 ) * Fixed several spelling errors under colossalai * Fix the spelling error in colossalai and docs directory * Cautious Changed the spelling error under the example folder * Update runtime_preparation_pass.py revert autograft to autograd * Update search_chunk.py utile to until * Update check_installation.py change misteach to mismatch in line 91 * Update 1D_tensor_parallel.md revert to perceptron * Update 2D_tensor_parallel.md revert to perceptron in line 73 * Update 2p5D_tensor_parallel.md revert to perceptron in line 71 * Update 3D_tensor_parallel.md revert to perceptron in line 80 * Update README.md revert to resnet in line 42 * Update reorder_graph.py revert to indice in line 7 * Update p2p.py revert to megatron in line 94 * Update initialize.py revert to torchrun in line 198 * Update routers.py change to detailed in line 63 * Update routers.py change to detailed in line 146 * Update README.md revert random number in line 402	2 years ago
YuliangLiu0306	ffcdbf0f65	[autoparallel]integrate auto parallel feature with new tracer (#3408 ) * [autoparallel] integrate new analyzer in module level * unify the profiling method * polish * fix no codegen bug * fix pass bug * fix liveness test * polish	2 years ago
ver217	26b7aac0be	[zero] reorganize zero/gemini folder structure (#3424 ) * [zero] refactor low-level zero folder structure * [zero] fix legacy zero import path * [zero] fix legacy zero import path * [zero] remove useless import * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] fix test import path * [zero] fix test * [zero] fix circular import * [zero] update import	2 years ago
Frank Lee	638a07a7f9	[test] fixed gemini plugin test (#3411 ) * [test] fixed gemini plugin test * polish code * polish code	2 years ago
YuliangLiu0306	fee2af8610	[autoparallel] adapt autoparallel with new analyzer (#3261 ) * [autoparallel] adapt autoparallel with new analyzer * fix all node handler tests * polish * polish	2 years ago
Zihao	18dbe76cae	[auto-parallel] add auto-offload feature (#3154 ) * add auto-offload feature * polish code * fix syn offload runtime pass bug * add offload example * fix offload testing bug * fix example testing bug	2 years ago
YuliangLiu0306	47fb214b3b	[hotfix] add shard dim to aviod backward communication error (#2954 )	2 years ago
YuliangLiu0306	197d0bf4ed	[autoparallel] apply repeat block to reduce solving time (#2912 )	2 years ago
YuliangLiu0306	819e25d8b1	[hotfix] fix autoparallel compatibility test issues (#2754 )	2 years ago
YuliangLiu0306	0f392d7403	[autoparallel] find repeat blocks (#2854 ) * [autoparallel] find repeat blocks * polish * polish * polish	2 years ago
Boyuan Yao	eae77c831d	[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823 ) * [autoparallel] non spmd meta information generator * [autoparallel] patch meta information for non spmd nodes	2 years ago
Boyuan Yao	c7764d3f22	[autoparallel] Patch meta information of `torch.where` (#2822 ) * [autoparallel] patch meta information of torch.where * [autoparallel] pre-commit modified	2 years ago
Boyuan Yao	fcc4097efa	[autoparallel] Patch meta information of `torch.tanh()` and `torch.nn.Dropout` (#2773 ) * [autoparallel] tanh meta information * [autoparallel] remove redundant code * [autoparallel] patch meta information of torch.nn.Dropout	2 years ago
Boyuan Yao	7ea6bc7f69	[autoparallel] Patch tensor related operations meta information (#2789 ) * [autoparallel] tensor related meta information prototype * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information * [autoparallel] tensor related meta information	2 years ago
YuliangLiu0306	2059fdd6b0	[hotfix] add copyright for solver and device mesh (#2803 ) * [hotfix] add copyright for solver and device mesh * add readme * add alpa license * polish	2 years ago
Boyuan Yao	8593ae1a3f	[autoparallel] rotor solver refactor (#2813 ) * [autoparallel] rotor solver refactor * [autoparallel] rotor solver refactor	2 years ago
Boyuan Yao	a2b43e393d	[autoparallel] Patch meta information of `torch.nn.Embedding` (#2760 ) * [autoparallel] embedding metainfo * [autoparallel] fix function name in test_activation_metainfo * [autoparallel] undo changes in activation metainfo and related tests	2 years ago
YuliangLiu0306	1dc003c169	[autoparallel] distinguish different parallel strategies (#2699 )	2 years ago
YuliangLiu0306	21d6a48f4d	[autoparallel] add shard option (#2696 ) * [autoparallel] add shard option * polish	2 years ago
YuliangLiu0306	5b24987fa7	[autoparallel] fix parameters sharding bug (#2716 )	2 years ago
YuliangLiu0306	cb2c6a2415	[autoparallel] refactor runtime pass (#2644 ) * [autoparallel] refactor runtime pass * add unit test * polish	2 years ago
YuliangLiu0306	0b2a738393	[autoparallel] remove deprecated codes (#2664 )	2 years ago
YuliangLiu0306	7fa6be49d2	[autoparallel] test compatibility for gemini and auto parallel (#2700 )	2 years ago
Boyuan Yao	40c916b192	[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674 ) * [autoparallel] softmax metainfo * [autoparallel] softmax metainfo	2 years ago
Boyuan Yao	0385b26ebf	[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647 ) * [autoparallel] layernorm metainfo patch * [autoparallel] polish test	2 years ago
YuliangLiu0306	37df666f38	[autoparallel] refactor handlers which reshape input tensors (#2615 ) * [autoparallel] refactor handlers which reshape input tensors * polish	2 years ago
YuliangLiu0306	28398f1c70	add overlap option (#2613 )	2 years ago
YuliangLiu0306	cb3d1bef62	[autoparallel] adapt autoparallel tests with latest api (#2626 )	2 years ago
Boyuan Yao	90a9fdd91d	[autoparallel] Patch meta information of `torch.matmul` (#2584 ) * [autoparallel] matmul metainfo * [auto_parallel] remove unused print * [tests] skip test_matmul_handler when torch version is lower than 1.12.0	2 years ago
YuliangLiu0306	aa0f6686f9	[autoparallel] accelerate gpt2 training (#2495 )	2 years ago

1 2 3 4 5

215 Commits (5c6c5d6be316a4f4e867d0d8049b508e0d59ad6c)