Commit Graph

188 Commits (8bcad7367769633699c4ec5b6d94f2119ff44a68)

Author SHA1 Message Date
digger yu a9d1cadc49
fix typo with colossalai/trainer utils zero (#3908)
1 year ago
Hongxin Liu dbb32692d2
[lazy] refactor lazy init (#3891)
1 year ago
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
2 years ago
Hongxin Liu 4341f5e8e6
[lazyinit] fix clone and deepcopy (#3553)
2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379)
2 years ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago
ver217 f8289d4221
[lazyinit] combine lazy tensor with dtensor (#3204)
2 years ago
ver217 6ae8ed0407
[lazyinit] add correctness verification (#3147)
2 years ago
ver217 ed8f60b93b
[lazyinit] refactor lazy tensor and lazy init ctx (#3131)
2 years ago
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996)
2 years ago
YH a848091141
Fix port exception type (#2925)
2 years ago
Nikita Shulga 01066152f1
Don't use `torch._six` (#2775)
2 years ago
ver217 f0aa191f51
[gemini] fix colo_init_context (#2683)
2 years ago
HELSON 552183bb74
[polish] polish ColoTensor and its submodules (#2537)
2 years ago
Super Daniel 35c0c0006e
[utils] lazy init. (#2148)
2 years ago
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434)
2 years ago
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
2 years ago
Jiarui Fang 355ffb386e
[builder] unified cpu_optim fused_optim inferface (#2190)
2 years ago
Jiarui Fang 9587b080ba
[builder] use runtime builder for fused_optim (#2189)
2 years ago
BlueRum b3f73ce1c8
[Gemini] Update coloinit_ctx to support meta_tensor (#2147)
2 years ago
Jiarui Fang 8e14344ec9
[hotfix] fix a type in ColoInitContext (#2106)
2 years ago
Jiarui Fang 05545bfee9
[ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105)
2 years ago
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
2 years ago
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043)
2 years ago
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971)
2 years ago
Jiarui Fang 7e24b9b9ee
[Gemini] clean no used MemTraceOp (#1970)
2 years ago
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953)
2 years ago
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937)
2 years ago
Frank Lee e6ec99d389
[utils] fixed lazy init context (#1867)
2 years ago
Jiarui Fang 3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext (#1844)
2 years ago
ver217 99870726b1
[CheckpointIO] a uniform checkpoint I/O module (#1689)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
Kirigaya Kazuto 3b2a59b0ba
[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681)
2 years ago
CsRic 2ac46f7be4
[NFC] polish utils/tensor_detector/__init__.py code style (#1573)
2 years ago
LuGY c7d4932956
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566)
2 years ago
Kirigaya Kazuto 318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559)
2 years ago
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
2 years ago
ver217 2bed096848
[utils] optimize partition_tensor_parallel_state_dict (#1546)
2 years ago
ver217 a203b709d5
[hotfix] fix init context (#1543)
2 years ago
Boyuan Yao 47fd8e4a02
[utils] Add use_reetrant=False in utils.activation_checkpoint (#1460)
2 years ago
Frank Lee 5a52e21fe3
[test] fixed the activation codegen test (#1447)
2 years ago
ver217 821c6172e2
[utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442)
2 years ago
HELSON 527758b2ae
[hotfix] fix a running error in test_colo_checkpoint.py (#1387)
2 years ago
HELSON b6fd165f66
[checkpoint] add kwargs for load_state_dict (#1374)
2 years ago
Frank Lee 0c1a16ea5b
[util] standard checkpoint function naming (#1377)
2 years ago
Super Daniel be229217ce
[fx] add torchaudio test (#1369)
2 years ago
HELSON 8463290642
[checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368)
2 years ago
HELSON 87775a0682
[colotensor] use cpu memory to store state_dict (#1367)
2 years ago