ColossalAI

Commit Graph

Author	SHA1	Message	Date
ver217	c415240db6	[nvme] CPUAdam and HybridAdam support NVMe offload (#1360 ) * impl nvme optimizer * update cpu adam * add unit test * update hybrid adam * update docstr * add TODOs * update CI * fix CI * fix CI * fix CI path * fix CI path * fix CI path * fix install tensornvme * fix CI * fix CI path * fix CI env variables * test CI * test CI * fix CI * fix nvme optim __del__ * fix adam __del__ * fix nvme optim * fix CI env variables * fix nvme optim import * test CI * test CI * fix CI	2022-07-26 17:25:24 +08:00
HELSON	8463290642	[checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368 )	2022-07-26 14:41:53 +08:00
YuliangLiu0306	5542816690	[fx]add gpt2 passes for pipeline performance test (#1366 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [fx]add gpt2 passes for pipeline performance test	2022-07-26 14:31:00 +08:00
HELSON	87775a0682	[colotensor] use cpu memory to store state_dict (#1367 )	2022-07-26 14:13:38 +08:00
HELSON	943a96323e	[hotfix] fix no optimizer in save/load (#1363 )	2022-07-26 10:53:53 +08:00
Frank Lee	cd063ac37f	[fx] added activation checkpoint codegen support for torch < 1.12 (#1359 )	2022-07-25 23:35:31 +08:00
Frank Lee	644582eee9	[fx] added activation checkpoint codegen (#1355 )	2022-07-25 09:39:10 +08:00
ver217	6b43c789fd	fix zero optim backward_by_grad and save/load (#1353 )	2022-07-21 16:43:58 +08:00
ver217	d068af81a3	[doc] update rst and docstring (#1351 ) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2022-07-21 15:54:53 +08:00
Frank Lee	274c1a3b5f	[fx] fixed apex normalization patch exception (#1352 )	2022-07-21 15:29:11 +08:00
ver217	ce470ba37e	[checkpoint] sharded optim save/load grad scaler (#1350 )	2022-07-21 15:21:21 +08:00
Frank Lee	05fae1fd56	[fx] added activation checkpointing annotation (#1349 ) * [fx] added activation checkpointing annotation * polish code * polish code	2022-07-21 11:14:28 +08:00
YuliangLiu0306	051592c64e	[fx] update MetaInforProp pass to process more complex node.meta (#1344 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [fx] update MetaInforProp pass to process more complex node.meta	2022-07-21 10:57:52 +08:00
HELSON	7a8702c06d	[colotensor] add Tensor.view op and its unit test (#1343 ) [colotensor] add megatron initialization for gpt2	2022-07-21 10:53:15 +08:00
YuliangLiu0306	942c8cd1fb	[fx] refactor tracer to trace complete graph (#1342 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [fx] refactor tracer to trace complete graph * add comments and solve conflicts.	2022-07-20 11:20:38 +08:00
Frank Lee	2cc1175c76	[fx] tested the complete workflow for auto-parallel (#1336 ) * [fx] tested the complete workflow for auto-parallel * polish code * polish code * polish code	2022-07-20 10:45:17 +08:00
YuliangLiu0306	4631fef8a0	[fx]refactor tracer (#1335 )	2022-07-19 15:50:42 +08:00
HELSON	f92c100ddd	[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339 )	2022-07-19 14:15:28 +08:00
ver217	0c51ff2c13	[hotfix] ZeroDDP use new process group (#1333 ) * process group supports getting ranks in group * chunk mgr receives a process group * update unit test * fix unit tests	2022-07-18 14:14:52 +08:00
Frank Lee	75abc75c15	[fx] fixed compatibility issue with torch 1.10 (#1331 )	2022-07-18 11:41:27 +08:00
ver217	7a05367101	[hotfix] shared model returns cpu state_dict (#1328 )	2022-07-15 22:11:37 +08:00
Frank Lee	b2475d8c5c	[fx] fixed unit tests for torch 1.12 (#1327 )	2022-07-15 18:22:15 +08:00
HELSON	d49708ae43	[hotfix] fix ddp for unit test test_gpt2 (#1326 )	2022-07-15 18:19:52 +08:00
Frank Lee	250be4d31e	[utils] integrated colotensor with lazy init context (#1324 ) * [utils] integrated colotensor with lazy init context * polish code * polish code * polish code	2022-07-15 17:47:12 +08:00
YuliangLiu0306	e8acf55e8b	[fx] add balanced policy v2 (#1251 ) * [CLI] add CLI launcher * Revert "[CLI] add CLI launcher" This reverts commit `df7e6506d4`. * [fx] add balanced policy v2 * add unittest	2022-07-15 14:54:26 +08:00
XYE	ca2d3f284f	[fx] Add unit test and fix bugs for transform_mlp_pass (#1299 ) * add test and fix bugs * add functions back * add comments	2022-07-15 14:37:58 +08:00
HELSON	1b41686461	[hotfix] fix unit test test_module_spec (#1321 )	2022-07-15 14:02:32 +08:00
Jiarui Fang	9e4c6449b0	[checkpoint] add ColoOptimizer checkpointing (#1316 )	2022-07-15 09:52:55 +08:00
ver217	7c70bfbefa	[hotfix] fix PipelineSharedModuleGradientHandler (#1314 )	2022-07-14 17:31:13 +08:00
Jiarui Fang	85f933b58b	[Optimizer] Remove useless ColoOptimizer (#1312 )	2022-07-14 16:57:48 +08:00
Jiarui Fang	9f10524313	[Optimizer] polish the init method of ColoOptimizer (#1310 )	2022-07-14 16:37:33 +08:00
Jiarui Fang	3ef3791a3b	[checkpoint] add test for bert and hotfix save bugs (#1297 )	2022-07-14 15:38:18 +08:00
Frank Lee	4f4d8c3656	[fx] added apex normalization to patched modules (#1300 ) * [fx] added apex normalization to patched modules * remove unused imports	2022-07-14 14:24:13 +08:00
Jiarui Fang	4165eabb1e	[hotfix] remove potential circle import (#1307 ) * make it faster * [hotfix] remove circle import	2022-07-14 13:44:26 +08:00
HELSON	260a55804a	[hotfix] fix shape error in backward when using ColoTensor (#1298 )	2022-07-13 23:06:12 +08:00
runluo	f83c4d6597	[NFC] polish colossalai/nn/layer/wrapper/pipeline_wrapper.py code style (#1303 )	2022-07-13 19:01:07 +08:00
binmakeswell	7696cead8d	Recover kernel files	2022-07-13 12:08:21 +08:00
XYE	e83b2ce853	[NFC] polish colossalai/nn/layer/vanilla/layers.py code style (#1295 )	2022-07-13 12:08:21 +08:00
Liping233	1000a41fd5	[NFC] polish colossalai/nn/layer/vanilla/__init__.py code style (#1293 )	2022-07-13 12:08:21 +08:00
Maruyama_Aya	87f679aeae	[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/kernels.h code style (#1291 )	2022-07-13 12:08:21 +08:00
Wangbo Zhao(黑色枷锁)	552667825b	[NFC] polish colossalai/nn/layer/parallel_1d/layers.py code style (#1290 )	2022-07-13 12:08:21 +08:00
doubleHU	d6f5ef8860	[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code style (#1286 )	2022-07-13 12:08:21 +08:00
Ziheng Qin	6d6c01e94d	[NFC] polish colossalai/__init__.py code style (#1285 )	2022-07-13 12:08:21 +08:00
Jiatong Han	38e3ccd1e9	[NFC] polish colossalai/nn/layer/parallel_sequence/layers.py code style (#1280 ) Co-authored-by: JThh <jiatong.han@u.nus.edu>	2022-07-13 12:08:21 +08:00
Boyuan Yao	b414eaa5db	[NFC] polish colossalai/nn/optimizer/lamb.py code style (#1275 )	2022-07-13 12:08:21 +08:00
yuxuan-lou	5f6ab35d25	Hotfix/format (#1274 ) * [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937) * [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style * [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.cpp code style Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>	2022-07-13 12:08:21 +08:00
Super Daniel	52d145a342	[NFC] polish colossalai/nn/lr_scheduler/onecycle.py code style (#1269 )	2022-07-13 12:08:21 +08:00
Geng Zhang	0e06f62160	[NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style (#1266 )	2022-07-13 12:08:21 +08:00
binmakeswell	c95e18cdb9	[NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.h code style (#1270 )	2022-07-13 12:08:21 +08:00
xyupeng	94bfd35184	[NFC] polish colossalai/builder/builder.py code style (#1265 )	2022-07-13 12:08:21 +08:00

635 Commits (fb6f085907371217b2a6ac7bb300ae77d376c824)