ColossalAI

Author	SHA1	Message	Date
Jun Gao	dce05da535	fix thrust-transform-reduce error (#5078)	2023-11-21 15:09:35 +08:00
Hongxin Liu	e5ce4c8ea6	[npu] add npu support for gemini and zero (#5067) * [npu] setup device utils (#5047) * [npu] add npu device support * [npu] support low level zero * [test] update npu zero plugin test * [hotfix] fix import * [test] recover tests * [npu] gemini support npu (#5052) * [npu] refactor device utils * [gemini] support npu * [example] llama2+gemini support npu * [kernel] add arm cpu adam kernel (#5065) * [kernel] add arm cpu adam * [optim] update adam optimizer * [kernel] arm cpu adam remove bf16 support	2023-11-20 16:12:41 +08:00
Hongxin Liu	4f68b3f10c	[kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) * [kernel] support pure fp16 for cpu adam (#4896) * [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919) * [kernel] fix cpu adam * [test] update gemini optim test	2023-10-16 21:56:53 +08:00
Xu Kai	611a5a80ca	[inference] Add smmoothquant for llama (#4904) * [inference] add int8 rotary embedding kernel for smoothquant (#4843) * [inference] add smoothquant llama attention (#4850) * add smoothquant llama attention * remove uselss code * remove useless code * fix import error * rename file name * [inference] add silu linear fusion for smoothquant llama mlp (#4853) * add silu linear * update skip condition * catch smoothquant cuda lib exception * prcocess exception for tests * [inference] add llama mlp for smoothquant (#4854) * add llama mlp for smoothquant * fix down out scale * remove duplicate lines * add llama mlp check * delete useless code * [inference] add smoothquant llama (#4861) * add smoothquant llama * fix attention accuracy * fix accuracy * add kv cache and save pretrained * refactor example * delete smooth * refactor code * [inference] add smooth function and delete useless code for smoothquant (#4895) * add smooth function and delete useless code * update datasets * remove duplicate import * delete useless file * refactor codes (#4902) * rafactor code * add license * add torch-int and smoothquant license	2023-10-16 11:28:44 +08:00
Camille Zhong	cd6a962e66	[NFC] polish code style (#4799)	2023-10-07 13:36:52 +08:00
littsk	eef96e0877	polish code for gptq (#4793)	2023-10-07 13:36:52 +08:00
Xu Kai	946ab56c48	[feature] add gptq for inference (#4754) * [gptq] add gptq kernel (#4416) * add gptq * refactor code * fix tests * replace auto-gptq * rname inferance/quant * refactor test * add auto-gptq as an option * reset requirements * change assert and check auto-gptq * add import warnings * change test flash attn version * remove example * change requirements of flash_attn * modify tests * [skip ci] change requirements-test * [gptq] faster gptq cuda kernel (#4494) * [skip ci] add cuda kernels * add license * [skip ci] fix max_input_len * format files & change test size * [skip ci] * [gptq] add gptq tensor parallel (#4538) * add gptq tensor parallel * add gptq tp * delete print * add test gptq check * add test auto gptq check * [gptq] combine gptq and kv cache manager (#4706) * combine gptq and kv cache manager * add init bits * delete useless code * add model path * delete usless print and update test * delete usless import * move option gptq to shard config * change replace linear to shardformer * update bloom policy * delete useless code * fix import bug and delete uselss code * change colossalai/gptq to colossalai/quant/gptq * update import linear for tests * delete useless code and mv gptq_kernel to kernel directory * fix triton kernel * add triton import	2023-09-22 11:02:50 +08:00
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	2023-09-19 14:20:26 +08:00
Hongxin Liu	0b00def881	[example] add llama2 example (#4527) * [example] transfer llama-1 example * [example] fit llama-2 * [example] refactor scripts folder * [example] fit new gemini plugin * [cli] fix multinode runner * [example] fit gemini optim checkpoint * [example] refactor scripts * [example] update requirements * [example] update requirements * [example] rename llama to llama2 * [example] update readme and pretrain script * [example] refactor scripts	2023-08-28 17:59:11 +08:00
flybird1111	7a3dfd0c64	[shardformer] update shardformer to use flash attention 2 (#4392) * cherry-pick flash attention 2 cherry-pick flash attention 2 * [shardformer] update shardformer to use flash attention 2 [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix	2023-08-15 23:25:14 +08:00
flybird1111	38b792aab2	[coloattention] fix import error (#4380) fixed an import error	2023-08-04 16:28:41 +08:00
flybird1111	25c57b9fb4	[fix] coloattention support flash attention 2 (#4347) Improved ColoAttention interface to support flash attention 2. Solved #4322	2023-08-04 13:46:22 +08:00
Hongxin Liu	ae02d4e4f7	[bf16] add bf16 support (#3882) * [bf16] add bf16 support for fused adam (#3844) * [bf16] fused adam kernel support bf16 * [test] update fused adam kernel test * [test] update fused adam test * [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860) * [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869) * [bf16] add mixed precision mixin * [bf16] low level zero optim support bf16 * [text] update low level zero test * [text] fix low level zero grad acc test * [bf16] add bf16 support for gemini (#3872) * [bf16] gemini support bf16 * [test] update gemini bf16 test * [doc] update gemini docstring * [bf16] add bf16 support for plugins (#3877) * [bf16] add bf16 support for legacy zero (#3879) * [zero] init context support bf16 * [zero] legacy zero support bf16 * [test] add zero bf16 test * [doc] add bf16 related docstring for legacy zero	2023-06-05 15:58:31 +08:00
digger yu	70c8cdecf4	[nfc] fix typo colossalai/cli fx kernel (#3847) * fix typo colossalai/autochunk auto_parallel amp * fix typo colossalai/auto_parallel nn utils etc. * fix typo colossalai/auto_parallel autochunk fx/passes etc. * fix typo docs/ * change placememt_policy to placement_policy in docs/ and examples/ * fix typo colossalai/ applications/ * fix typo colossalai/cli fx kernel	2023-06-02 15:02:45 +08:00
digger-yu	b9a8dff7e5	[doc] Fix typo under colossalai and doc(#3618) * Fixed several spelling errors under colossalai * Fix the spelling error in colossalai and docs directory * Cautious Changed the spelling error under the example folder * Update runtime_preparation_pass.py revert autograft to autograd * Update search_chunk.py utile to until * Update check_installation.py change misteach to mismatch in line 91 * Update 1D_tensor_parallel.md revert to perceptron * Update 2D_tensor_parallel.md revert to perceptron in line 73 * Update 2p5D_tensor_parallel.md revert to perceptron in line 71 * Update 3D_tensor_parallel.md revert to perceptron in line 80 * Update README.md revert to resnet in line 42 * Update reorder_graph.py revert to indice in line 7 * Update p2p.py revert to megatron in line 94 * Update initialize.py revert to torchrun in line 198 * Update routers.py change to detailed in line 63 * Update routers.py change to detailed in line 146 * Update README.md revert random number in line 402	2023-04-26 11:38:43 +08:00
zbian	7bc0afc901	updated flash attention usage	2023-03-20 17:57:04 +08:00
Frank Lee	95a36eae63	[kernel] added kernel loader to softmax autograd function (#3093) * [kernel] added kernel loader to softmax autograd function * [release] v0.2.6	2023-03-10 14:27:09 +08:00
ver217	823f3b9cf4	[doc] add deepspeed citation and copyright (#2996) * [doc] add deepspeed citation and copyright * [doc] add deepspeed citation and copyright * [doc] add deepspeed citation and copyright	2023-03-04 20:08:11 +08:00
ver217	090f14fd6b	[misc] add reference (#2930) * [misc] add reference * [misc] add license	2023-02-28 18:07:24 +08:00
Frank Lee	918bc94b6b	[triton] added copyright information for flash attention (#2835) * [triton] added copyright information for flash attention * polish code	2023-02-21 11:25:57 +08:00
Frank Lee	dd14783f75	[kernel] fixed repeated loading of kernels (#2549) * [kernel] fixed repeated loading of kernels * polish code * polish code	2023-02-03 09:47:13 +08:00
Frank Lee	8b7495dd54	[example] integrate seq-parallel tutorial with CI (#2463)	2023-01-13 14:40:05 +08:00
jiaruifang	69d9180c4b	[hotfix] issue #2388	2023-01-07 18:23:02 +08:00
Frank Lee	40d376c566	[setup] support pre-build and jit-build of cuda kernels (#2374) * [setup] support pre-build and jit-build of cuda kernels * polish code * polish code * polish code * polish code * polish code * polish code	2023-01-06 20:50:26 +08:00
xcnick	85178a397a	[hotfix] fix error for torch 2.0 (#2243)	2022-12-30 23:11:55 +08:00
Jiarui Fang	db4cbdc7fb	[builder] builder for scaled_upper_triang_masked_softmax (#2234)	2022-12-30 09:58:00 +08:00
Jiarui Fang	1cb532ffec	[builder] multihead attn runtime building (#2203) * [hotfix] correcnt cpu_optim runtime compilation * [builder] multihead attn * fix bug * fix a bug	2022-12-27 16:06:09 +08:00
アマデウス	077a66dd81	updated attention kernel (#2133)	2022-12-16 10:54:03 +08:00
HELSON	e7d3afc9cc	[optimizer] add div_scale for optimizers (#2117) * [optimizer] add div_scale for optimizers * [zero] use div_scale in zero optimizer * fix testing error	2022-12-12 17:58:57 +08:00
ver217	f8a7148dec	[kernel] move all symlinks of kernel to `colossalai._C` (#1971)	2022-11-17 13:42:33 +08:00
zbian	6877121377	updated flash attention api	2022-11-15 15:25:39 +08:00
xcnick	e0da01ea71	[hotfix] fix build error when torch version >= 1.13 (#1803)	2022-11-08 09:40:24 +08:00
oahzxl	9639ea88fc	[kernel] more flexible flashatt interface (#1804)	2022-11-07 17:02:09 +08:00
oahzxl	501a9e9cd2	[hotfix] polish flash attention (#1802)	2022-11-07 14:30:22 +08:00
Jiarui Fang	c248800359	[kernel] skip tests of flash_attn and triton when they are not available (#1798)	2022-11-07 13:41:13 +08:00
oahzxl	25952b67d7	[feat] add flash attention (#1762)	2022-10-26 16:15:52 +08:00
ver217	12b4887097	[hotfix] fix CPUAdam kernel nullptr (#1410)	2022-08-05 19:45:45 +08:00
binmakeswell	7696cead8d	Recover kernal files	2022-07-13 12:08:21 +08:00
Maruyama_Aya	87f679aeae	[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/kernels.h code style (#1291)	2022-07-13 12:08:21 +08:00
doubleHU	d6f5ef8860	[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code style (#1286)	2022-07-13 12:08:21 +08:00
yuxuan-lou	5f6ab35d25	Hotfix/format (#1274) * [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style * [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.cpp code style Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2022-07-13 12:08:21 +08:00
binmakeswell	c95e18cdb9	[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.h code style (#1270)	2022-07-13 12:08:21 +08:00
DouJS	db13f96333	[NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_apply.cuh code style (#1264)	2022-07-13 12:08:21 +08:00
shenggan	5d7366b144	[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.h code style (#1263)	2022-07-13 12:08:21 +08:00
ziyu huang	f1cafcc73a	[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#1261) Co-authored-by: “Arsmart123 <202476410arsmart@gmail.com>	2022-07-13 12:08:21 +08:00
Sze-qq	f8b9aaef47	[NFC] polish colossalai/kernel/cuda_native/csrc/type_shim.h code style (#1260)	2022-07-13 12:08:21 +08:00
ver217	e4f555f29a	[optim] refactor fused sgd (#1134)	2022-06-20 11:19:38 +08:00
zhengzangw	ae7c338105	[NFC] polish colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp code style	2022-05-20 23:57:38 +08:00
Frank Lee	533d0c46d8	[kernel] fixed the include bug in dropout kernel (#999)	2022-05-18 21:43:18 +08:00
puck_WCR	bda70b4b66	[NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980)	2022-05-17 10:25:06 +08:00

114 Commits (451e9142b8b8b77ed3138fb03ad54494c3c57126)