Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
...
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
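The commit above distinguishes two ways of getting a CUDA extension: compile it at install time (pre-build) or compile it on first use (JIT-build). A minimal sketch of that fallback pattern follows, using PyTorch's real `torch.utils.cpp_extension.load`; the module path and source file names are hypothetical, not the actual ColossalAI builder code.

```python
# Sketch of pre-build vs. JIT-build, assuming hypothetical module and source
# names; not the actual ColossalAI builder implementation.
from torch.utils.cpp_extension import load

def load_fused_optim_kernel():
    try:
        # Pre-build path: the extension was compiled at install time.
        import colossalai._C.fused_optim as fused_optim  # hypothetical module path
        return fused_optim
    except ImportError:
        # JIT-build path: compile the CUDA sources on first use.
        return load(
            name="fused_optim",
            sources=["csrc/fused_optim.cpp", "csrc/fused_optim_kernel.cu"],  # hypothetical paths
            extra_cuda_cflags=["-O3"],
            verbose=True,
        )
```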
Jiarui Fang
355ffb386e
[builder] unified cpu_optim and fused_optim interface ( #2190 )
2 years ago
Jiarui Fang
9587b080ba
[builder] use runtime builder for fused_optim ( #2189 )
2 years ago
BlueRum
b3f73ce1c8
[Gemini] Update coloinit_ctx to support meta_tensor ( #2147 )
2 years ago
Jiarui Fang
8e14344ec9
[hotfix] fix a type in ColoInitContext ( #2106 )
2 years ago
Jiarui Fang
05545bfee9
[ColoTensor] throw an error when ColoInitContext meets a meta parameter. ( #2105 )
2 years ago
HELSON
f6178728a0
[gemini] fix init bugs for modules ( #2047 )
...
* [gemini] fix init bugs for modules
* fix bugs
2 years ago
Jiarui Fang
31c644027b
[hotfix] fix Gemini for the no-leaf-modules bug ( #2043 )
2 years ago
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2 years ago
Jiarui Fang
7e24b9b9ee
[Gemini] remove unused MemTraceOp ( #1970 )
2 years ago
Jiarui Fang
52c6ad26e0
[ColoTensor] reconfigure ColoInitContext, decouple default_pg and default_dist_spec. ( #1953 )
2 years ago
Jiarui Fang
9f4fb3f28a
[ColoTensor] ColoInitContext initializes parameters in shard mode. ( #1937 )
2 years ago
Frank Lee
e6ec99d389
[utils] fixed lazy init context ( #1867 )
2 years ago
Jiarui Fang
3ce4463fe6
[utils] remove lazy_memory_allocate from ColoInitContext ( #1844 )
2 years ago
ver217
99870726b1
[CheckpointIO] a uniform checkpoint I/O module ( #1689 )
2 years ago
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
...
* fixes a memory leak when parameters are in fp16 during ZeroDDP init.
* bans chunk release in CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2 years ago
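The commit body above describes the policy's two rules: parameters get a fixed, user-reserved CUDA budget, and a chunk is released only when it is about to offload. A hypothetical sketch of that logic follows; the class and chunk attributes are illustrative assumptions, not ColossalAI's actual placement-policy API.

```python
# Hypothetical sketch of a constant placement policy: a fixed, user-set CUDA
# budget for parameter chunks, with eviction allowed only on offload.
class ConstPlacementPolicy:
    def __init__(self, const_memory_boundary: int):
        # User-reserved CUDA caching space for parameters, in bytes (assumed name).
        self.const_memory_boundary = const_memory_boundary
        self.cuda_chunk_bytes = 0  # bytes of chunks currently resident on CUDA

    def can_fetch(self, chunk_bytes: int) -> bool:
        # A chunk may move to CUDA only while the reserved budget holds.
        return self.cuda_chunk_bytes + chunk_bytes <= self.const_memory_boundary

    def evict(self, chunks) -> int:
        # Mirror the rule in the commit body: release a chunk only when it is
        # about to be offloaded to CPU. Chunk attributes here are hypothetical.
        freed = 0
        for chunk in chunks:
            if chunk.is_about_to_offload:
                chunk.offload_to_cpu()
                self.cuda_chunk_bytes -= chunk.bytes
                freed += chunk.bytes
        return freed
```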
Kirigaya Kazuto
3b2a59b0ba
[pipeline/rank_recorder] fix a bug when processing data before backward | add a tool for multi-rank debugging ( #1681 )
...
* [pipeline/tuning] improve dispatch performance in both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more flexible custom schedules | finish Chimera
* [pipeline/chimera] test Chimera | fix initialization bug
* [pipeline/pytree] add pytree to process args and kwargs | support processing args and kwargs after forward
2 years ago
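The last bullet above relies on pytrees: arbitrary nested (args, kwargs) are flattened into a list of leaves plus a structure spec, moved between pipeline stages, and rebuilt. A short sketch using `torch.utils._pytree` (a private but real PyTorch utility) illustrates the mechanism; the `pack`/`unpack` names are assumptions for illustration.

```python
# Sketch of the pytree idea from the commit body: flatten arbitrary
# (args, kwargs) into flat leaves for transport, then rebuild the structure.
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

def pack(args, kwargs):
    # Leaves (tensors, scalars) can be sent between pipeline stages.
    flat, spec = tree_flatten((args, kwargs))
    return flat, spec

def unpack(flat, spec):
    args, kwargs = tree_unflatten(flat, spec)
    return args, kwargs

flat, spec = pack((torch.ones(2), 3), {"scale": 0.5})
args, kwargs = unpack(flat, spec)  # ((tensor([1., 1.]), 3), {'scale': 0.5})
```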
CsRic
2ac46f7be4
[NFC] polish utils/tensor_detector/__init__.py code style ( #1573 )
...
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2 years ago
LuGY
c7d4932956
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style ( #1566 )
2 years ago
Kirigaya Kazuto
318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style ( #1559 )
2 years ago
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and broadcast model on loading checkpoint ( #1548 )
...
* refactor parallel layer
* broadcast rank 0 model after loading checkpoint
2 years ago
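The "broadcast rank 0 model after loading checkpoint" idea: only rank 0 reads the file, then every tensor is broadcast so all ranks hold identical weights. A minimal sketch follows, assuming an initialized default process group; the function name and checkpoint path are placeholders, not ColossalAI's actual utilities.

```python
# Sketch: rank 0 loads the checkpoint, then broadcasts weights to all ranks.
# Assumes torch.distributed is already initialized; the path is a placeholder.
import torch
import torch.distributed as dist

def load_and_broadcast(model: torch.nn.Module, ckpt_path: str = "model.pt"):
    if dist.get_rank() == 0:
        state_dict = torch.load(ckpt_path, map_location="cpu")
        model.load_state_dict(state_dict)
    # Broadcast rank 0's tensors so every rank agrees on the weights.
    for param in model.parameters():
        dist.broadcast(param.data, src=0)
    for buf in model.buffers():
        dist.broadcast(buf, src=0)
```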
ver217
2bed096848
[utils] optimize partition_tensor_parallel_state_dict ( #1546 )
2 years ago
ver217
a203b709d5
[hotfix] fix init context ( #1543 )
...
* fix init context
* fix lazy init ctx
2 years ago
Boyuan Yao
47fd8e4a02
[utils] Add use_reentrant=False in utils.activation_checkpoint ( #1460 )
...
* [utils] Add use_reentrant=False into colossalai checkpoint
* [utils] add some annotations in utils.activation_checkpoint
* [test] add reset_seed at the beginning of tests in test_activation_checkpointing.py
* [test] modify test_activation_checkpoint.py
* [test] modify test for reentrant=False
2 years ago
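For context, `use_reentrant=False` is a real flag on PyTorch's `torch.utils.checkpoint.checkpoint`: the non-reentrant variant recomputes the forward under autograd during backward, which handles more cases (e.g. unused inputs) than the reentrant custom-Function path. A minimal usage sketch, with a hypothetical `block` function:

```python
# Minimal sketch of non-reentrant activation checkpointing: activations are
# discarded in forward and recomputed during backward.
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # Hypothetical compute block whose activations we avoid storing.
    return torch.relu(x @ x.t())

x = torch.randn(8, 8, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recomputes block in backward
y.sum().backward()
```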
Frank Lee
5a52e21fe3
[test] fixed the activation codegen test ( #1447 )
...
* [test] fixed the activation codegen test
* polish code
2 years ago
ver217
821c6172e2
[utils] Implement clip_grad_norm for ColoTensor and ZeroOptimizer ( #1442 )
2 years ago
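Gradient clipping over sharded parameters needs a global norm, not a per-rank one. The standard ZeRO-style recipe, sketched below under the assumption of an initialized process group, is: sum local squared norms, all-reduce, then scale every local gradient by the same factor. This illustrates the general technique, not the exact ColossalAI implementation.

```python
# Sketch of clip_grad_norm over sharded params: global L2 norm via all-reduce
# of local squared norms, then a uniform rescale of local gradients.
import torch
import torch.distributed as dist

def clip_grad_norm_sharded(local_params, max_norm: float) -> float:
    grads = [p.grad for p in local_params if p.grad is not None]
    if not grads:
        return 0.0
    local_sq = torch.stack([g.norm(2) ** 2 for g in grads]).sum()
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)  # global sum of squares
    total_norm = local_sq.sqrt().item()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in grads:
            g.mul_(clip_coef)
    return total_norm
```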
HELSON
527758b2ae
[hotfix] fix a running error in test_colo_checkpoint.py ( #1387 )
2 years ago
HELSON
b6fd165f66
[checkpoint] add kwargs for load_state_dict ( #1374 )
2 years ago
Frank Lee
0c1a16ea5b
[util] standardize checkpoint function naming ( #1377 )
2 years ago
Super Daniel
be229217ce
[fx] add torchaudio test ( #1369 )
...
* [fx]add torchaudio test
* [fx]add torchaudio test
* [fx] add torchaudio test
* [fx] add torchaudio test
* [fx] add torchaudio test
* [fx] add torchaudio test
* [fx] add torchaudio test
* [fx] add torchaudio test and test patches
* Delete ~
* [fx] add patches and patches test
* [fx] add patches and patches test
* [fx] fix patches
* [fx] fix rnn patches
* [fx] fix rnn patches
* [fx] fix rnn patches
* [fx] fix rnn patches
* [fx] merge upstream
* [fx] fix import errors
2 years ago
HELSON
8463290642
[checkpoint] use args, kwargs in save_checkpoint, load_checkpoint ( #1368 )
2 years ago
HELSON
87775a0682
[colotensor] use CPU memory to store state_dict ( #1367 )
2 years ago
HELSON
943a96323e
[hotfix] fix save/load when no optimizer is given ( #1363 )
2 years ago
HELSON
7a8702c06d
[colotensor] add Tensor.view op and its unit test ( #1343 )
...
[colotensor] add megatron initialization for gpt2
2 years ago
Frank Lee
2cc1175c76
[fx] tested the complete workflow for auto-parallel ( #1336 )
...
* [fx] tested the complete workflow for auto-parallel
* polish code
* polish code
* polish code
2 years ago
HELSON
f92c100ddd
[checkpoint] use gather_tensor in checkpoint and update its unit test ( #1339 )
2 years ago
Frank Lee
250be4d31e
[utils] integrated colotensor with lazy init context ( #1324 )
...
* [utils] integrated colotensor with lazy init context
* polish code
* polish code
* polish code
2 years ago
Jiarui Fang
9e4c6449b0
[checkpoint] add ColoOptimizer checkpointing ( #1316 )
2 years ago
Jiarui Fang
3ef3791a3b
[checkpoint] add test for bert and hotfix save bugs ( #1297 )
2 years ago
Jiarui Fang
4165eabb1e
[hotfix] remove potential circular import ( #1307 )
...
* make it faster
* [hotfix] remove circular import
2 years ago
Jiarui Fang
c92f84fcdb
[tensor] distributed checkpointing for parameters ( #1240 )
2 years ago
Jiarui Fang
9bcd2fd4af
[tensor] a shorter shard and replicate spec ( #1245 )
2 years ago
Jiarui Fang
20da6e48c8
[checkpoint] save sharded optimizer states ( #1237 )
2 years ago
Jiarui Fang
3b500984b1
[tensor] fix some unittests ( #1234 )
2 years ago
ver217
a45ddf2d5f
[hotfix] fix sharded optim step and clip_grad_norm ( #1226 )
2 years ago
Yi Zhao
04537bf83e
[checkpoint] support generalized scheduler ( #1222 )
2 years ago
Jiarui Fang
52736205d9
[checkpoint] make unit test faster ( #1217 )
2 years ago
Jiarui Fang
f38006ea83
[checkpoint] checkpoint for ColoTensor Model ( #1196 )
2 years ago
Jiarui Fang
ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. ( #1203 )
2 years ago
YuliangLiu0306
63d2a93878
[context] support arbitrary module materialization. ( #1193 )
...
* [CLI] add CLI launcher
* Revert "[CLI] add CLI launcher"
This reverts commit df7e6506d4.
* [context] support arbitrary module materialization.
* [test] add numerical check for lazy init context.
2 years ago
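Several entries above (lazy init context, meta-tensor support in ColoInitContext, arbitrary module materialization) rest on the same meta-device pattern: build the module without allocating real storage, then materialize it later. A sketch of that general PyTorch mechanism (requires PyTorch >= 2.0), not ColossalAI's LazyInitContext itself:

```python
# Sketch of the meta-device pattern behind lazy init and deferred module
# materialization: parameters are created as meta tensors (shape and dtype
# only, no memory), then given real storage and re-initialized on demand.
import torch
import torch.nn as nn

with torch.device("meta"):
    model = nn.Linear(1024, 1024)  # meta parameters: no memory allocated yet

# Materialize: allocate real (uninitialized) storage, then re-run init.
model = model.to_empty(device="cpu")
for module in model.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
```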