HELSON
dddacd2d2c
[hotfix] add norm clearing for the overflow step ( #2416 )
2 years ago
HELSON
ea13a201bb
[polish] polish code for get_static_torch_model ( #2405 )
...
* [gemini] polish code
* [testing] remove code
* [gemini] make more robust
2 years ago
Frank Lee
551cafec14
[doc] updated kernel-related optimisers' docstring ( #2385 )
...
* [doc] updated kernel-related optimisers' docstring
* polish doc
2 years ago
eric8607242
9880fd2cd8
Fix missing state_dict keys in ZeroDDP ( #2363 )
...
* Fix state_dict output for ZeroDDP duplicated parameters
* Rewrite state_dict based on get_static_torch_model
* Modify get_static_torch_model to be compatible with older versions (ZeroDDP)
2 years ago
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
...
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
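The jit-build path defers kernel compilation to first use instead of install time. A minimal sketch of that idea using PyTorch's inline extension loader; the kernel and names below are illustrative, not the repository's actual builder API, and a host compiler must be available at runtime:

```python
# Sketch of the jit-build path: compile a C++ extension at first use
# rather than at pip-install time. Names are illustrative only.
import torch
from torch.utils.cpp_extension import load_inline

cpp_src = """
torch::Tensor scale(torch::Tensor x, double s) {
    return x * s;
}
"""

# Compiles on the first call and caches the build for later runs.
ext = load_inline(name="demo_scale_ext", cpp_sources=cpp_src, functions=["scale"])
print(ext.scale(torch.ones(3), 2.0))  # tensor([2., 2., 2.])
```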
HELSON
48d33b1b17
[gemini] add get static torch model ( #2356 )
2 years ago
Jiarui Fang
16cc8e6aa7
[builder] MOE builder ( #2277 )
2 years ago
zbian
e94c79f15b
improved allgather & reducescatter for 3d
2 years ago
Jiarui Fang
af32022f74
[Gemini] fix the convert_to_torch_module bug ( #2269 )
2 years ago
HELSON
2458659919
[zero] fix error for BEiT models ( #2169 )
...
* [zero] fix error for BEiT models
* [ColoParameter] add unpack operation for tuple arguments
* fix bugs
* fix chunkv2 unit testing
* add assertion for gradient state
2 years ago
Jiarui Fang
355ffb386e
[builder] unified cpu_optim fused_optim interface ( #2190 )
2 years ago
Jiarui Fang
9587b080ba
[builder] use runtime builder for fused_optim ( #2189 )
2 years ago
Jiarui Fang
d42afd30f8
[builder] runtime adam and fused_optim builder ( #2184 )
2 years ago
Tongping Liu
ab54fed292
[hotfix] add kwargs for colo_addmm ( #2171 )
2 years ago
アマデウス
622f863291
[hotfix] Jit type hint #2161 ( #2164 )
2 years ago
Jiarui Fang
2827f41898
[Gemini] convert GeminiDDP to a PyTorch Module ( #2151 )
2 years ago
Jiarui Fang
bdef9dfdbe
[NFC] remove useless graph node code ( #2150 )
2 years ago
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2 years ago
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
...
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2 years ago
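A `div_scale` argument lets an optimizer un-scale loss-scaled gradients inside the update itself, instead of launching a separate unscale pass over every gradient first. A toy single-tensor sketch of the idea, not the actual fused kernel:

```python
# Hypothetical sketch of what `div_scale` buys: the gradient is divided
# by the loss scale inside the update, in the same pass over the data.
import torch

def sgd_step(param: torch.nn.Parameter, lr: float, div_scale: float = 1.0) -> None:
    """Toy single-tensor SGD update; `div_scale` is the current loss scale."""
    if param.grad is None:
        return
    grad = param.grad
    if div_scale != 1.0:
        grad = grad / div_scale          # fused un-scaling
    param.data.add_(grad, alpha=-lr)

p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 2.0) * 1024.0  # gradient scaled by loss scale 1024
sgd_step(p, lr=0.1, div_scale=1024.0)
print(p.data)                            # tensor([0.8, 0.8, 0.8, 0.8])
```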
Jiarui Fang
e5aa8333e4
[NFC] update chunk manager API ( #2119 )
2 years ago
Jiarui Fang
e99edfcb51
[NFC] polish comments for Chunk class ( #2116 )
2 years ago
HELSON
63fbba3c19
[zero] add L2 gradient clipping for ZeRO ( #2112 )
...
* [zero] add L2 gradient clipping
* [testing] add MlpModel
* [zero] add unit test for grad clipping
* fix atol
2 years ago
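Under ZeRO, each rank holds only a shard of the gradients, so clipping needs a global L2 norm before any shard is scaled. A minimal sketch, assuming `torch.distributed` is initialized in the multi-rank case (function name is illustrative):

```python
# Sketch of global L2 grad clipping over sharded gradients (ZeRO-style):
# each rank sums the squares of the shards it owns, all-reduces the sum,
# then every rank scales its shards by the same coefficient.
import torch
import torch.distributed as dist

def clip_grad_l2_(local_grads, max_norm: float) -> float:
    sq_sum = torch.stack([g.float().pow(2).sum() for g in local_grads]).sum()
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(sq_sum)               # global sum of squares
    total_norm = sq_sum.sqrt().item()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in local_grads:
            g.mul_(clip_coef)
    return total_norm

grads = [torch.full((3,), 4.0), torch.full((4,), 3.0)]
print(clip_grad_l2_(grads, max_norm=1.0))     # ~9.17 before clipping
```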
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2 years ago
Jiarui Fang
b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook ( #2080 )
2 years ago
HELSON
e37f3db40c
[gemini] add arguments ( #2046 )
...
* [zero] fix testing parameters
* [gemini] add arguments
* add docstrings
2 years ago
Jiarui Fang
96134e7be3
[hotfix] add bert test for gemini fwd bwd ( #2035 )
2 years ago
Jiarui Fang
8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor ( #2003 )
2 years ago
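Not ColoTensor's actual code, but the mechanism such a patch relies on: a `torch.Tensor` subclass can intercept in-place calls like `Tensor.add_` through `__torch_function__` and then delegate to the default implementation:

```python
# Minimal sketch of intercepting an in-place op on a tensor subclass.
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.Tensor.add_:
            # Hook point: e.g. unpack sharding metadata before delegating.
            pass
        return super().__torch_function__(func, types, args, kwargs)

t = torch.ones(2).as_subclass(MyTensor)
torch.Tensor.add_(t, 1.0)   # routed through __torch_function__
print(t)                    # values are now [2., 2.]
```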
Jiarui Fang
a2d3266648
[hotfix] make Gemini work for conv DNN ( #1998 )
2 years ago
Jiarui Fang
cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook ( #1972 )
2 years ago
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2 years ago
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2 years ago
アマデウス
e52f9d9109
[tensorparallel] fixed tp layers ( #1938 )
2 years ago
Jiarui Fang
986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 ( #1876 )
2 years ago
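The idea behind `stream_chunk_num`: split the row-parallel matmul into chunks so the all-reduce of one chunk overlaps the compute of the next. A hedged sketch, assuming an initialized process group and that `x` and `weight_shard` are this rank's local input/weight shards (names are illustrative):

```python
# Sketch of overlapping communication and compute in a row-parallel linear:
# while chunk i's partial result is being all-reduced, chunk i+1's matmul
# runs. Requires torch.distributed to be initialized.
import torch
import torch.distributed as dist

def row_parallel_linear(x, weight_shard, chunks: int = 2):
    outs, handles = [], []
    for xc in x.chunk(chunks, dim=0):
        partial = xc @ weight_shard                   # this chunk's partial sum
        handles.append(dist.all_reduce(partial, async_op=True))
        outs.append(partial)                          # overlaps with next matmul
    for h in handles:
        h.wait()
    return torch.cat(outs, dim=0)
```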
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2 years ago
zbian
653b0a620e
added skip_bias_add for non-tp linear
2 years ago
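`skip_bias_add` returns the bias instead of adding it, so a downstream fused kernel (e.g. bias + GELU) can apply it in a single pass. A minimal sketch of the pattern:

```python
# Sketch of the skip_bias_add pattern: defer the bias so a later fused
# step can apply it together with the activation.
import torch
import torch.nn as nn

class LinearSkipBias(nn.Linear):
    def forward(self, x):
        out = nn.functional.linear(x, self.weight)  # no bias added here
        return out, self.bias                       # caller fuses it later

layer = LinearSkipBias(4, 4)
out, bias = layer(torch.randn(2, 4))
y = torch.nn.functional.gelu(out + bias)            # bias applied in the fused step
```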
アマデウス
4268ae017b
[kernel] added jit warmup ( #1792 )
2 years ago
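JIT warmup runs a scripted kernel a few times with representative shapes so compilation and profiling-driven fusion happen before the first timed training step. A small sketch; the bias-gelu kernel here is illustrative:

```python
# Sketch of a JIT warmup: the first calls to a scripted function trigger
# compilation and shape specialization, so we pay that cost up front.
import torch

@torch.jit.script
def bias_gelu(bias: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.gelu(x + bias)

def warmup(steps: int = 3):
    x = torch.randn(16, 64)
    b = torch.randn(64)
    for _ in range(steps):
        bias_gelu(b, x)

warmup()  # run before the first real training step
```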
Jiarui Fang
cd5a0d56fa
[Gemini] make gemini usage simple ( #1821 )
2 years ago
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2 years ago
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
...
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2 years ago
kurisusnowdeng
0b8161fab8
updated tp layers
2 years ago
Sze-qq
23703c9dd6
[NFC] polish colossalai/nn/metric/_utils.py code style ( #1727 )
2 years ago
Ofey Chan
7e62af28a0
[NFC] polish accuracy_2d.py code style ( #1719 )
2 years ago
yuxuan-lou
2b49ca80a3
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style ( #1716 )
2 years ago
shenggan
e1d780030d
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style ( #1714 )
2 years ago
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
...
* fixes a memory leak when a parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy, with which users can reserve a fixed caching memory space for parameters.
2 years ago
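An illustrative sketch of what such a constant placement policy does, per the commit body: a fixed CUDA budget is reserved as cache space for parameter chunks, and anything beyond it stays offloaded. The class and names below are hypothetical, not the repository's implementation:

```python
# Hypothetical sketch of a constant placement policy: keep a fixed CUDA
# budget reserved for parameter chunks and leave the excess on CPU.
import torch

class ConstPlacementPolicy:
    def __init__(self, cuda_budget_bytes: int):
        self.cuda_budget = cuda_budget_bytes
        self.cuda_used = 0

    def place(self, chunk: torch.Tensor) -> torch.Tensor:
        nbytes = chunk.numel() * chunk.element_size()
        if torch.cuda.is_available() and self.cuda_used + nbytes <= self.cuda_budget:
            self.cuda_used += nbytes
            return chunk.cuda()   # fits in the reserved cache space
        return chunk.cpu()        # otherwise keep it offloaded

policy = ConstPlacementPolicy(cuda_budget_bytes=64 << 20)  # 64 MiB reserved
```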
binmakeswell
5f41463a76
add optimizer README for tutorials ( #1707 )
2 years ago
Jiarui Fang
21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding ( #1699 )
2 years ago
Jiarui Fang
363fc2861a
[embeddings] more detailed timer ( #1692 )
2 years ago
jim
e5ab6be72e
[hotfix] fix colotensor.type() raising NotImplementedError ( #1682 )
2 years ago
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2 years ago
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2 years ago
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2 years ago
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2 years ago
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2 years ago
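The non-blocking copy relies on pinned (page-locked) CPU memory, which lets the host-to-device transfer overlap GPU compute. A minimal sketch:

```python
# Sketch of a non-blocking H2D copy: a pinned CPU buffer plus
# `.to(..., non_blocking=True)` lets the transfer overlap GPU work.
import torch

if torch.cuda.is_available():
    cpu_rows = torch.randn(1024, 128, pin_memory=True)
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        gpu_rows = cpu_rows.to("cuda", non_blocking=True)
    # ... other GPU work can proceed on the default stream here ...
    torch.cuda.current_stream().wait_stream(stream)  # sync before using the rows
```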
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
...
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2 years ago
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
...
This reverts commit 5be118f405.
2 years ago
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2 years ago
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2 years ago
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2 years ago
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
...
* remove forced FP32 modules
* correct no_shard-contexts' positions
2 years ago
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2 years ago
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2 years ago
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2 years ago
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2 years ago
Sze-qq
2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style ( #1572 )
2 years ago
superhao1995
e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style ( #1571 )
...
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2 years ago
Jiatong Han
3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style ( #1570 )
...
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2 years ago
DouJS
f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style ( #1568 )
2 years ago
BigOneLiXiaoMing
0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style ( #1561 )
2 years ago
Ofey Chan
7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py ( #1556 )
2 years ago
yuxuan-lou
413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style ( #1555 )
2 years ago
shenggan
8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style ( #1553 )
2 years ago
Maruyama_Aya
bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style ( #1552 )
2 years ago
binmakeswell
73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2 years ago
CsRic
a389ac4ec9
[embedding] cache_embedding small improvement ( #1564 )
2 years ago
ver217
10dd8226b1
add gather_output for VocabParallelClassifier1D ( #1569 )
2 years ago
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint ( #1548 )
...
* refactor parallel layer
* broadcast rank0 model after load ckpt
2 years ago
Jiarui Fang
64169f3e8f
[embedding] polish parallel embedding tablewise ( #1545 )
2 years ago
CsRic
964123ae0f
[embedding] freq_aware_embedding: add small helper functions for caller applications ( #1537 )
2 years ago
Jiarui Fang
521078ffc9
[embedding] fix a bug in table wise sharding ( #1538 )
2 years ago
Jiarui Fang
87134524fd
[embedding] tablewise sharding polish ( #1535 )
2 years ago
CsRic
5156d5b4f8
[embedding] add tablewise sharding for FAW ( #1526 )
2 years ago
Jiarui Fang
4537d39df9
[doc] docstring for FreqAwareEmbeddingBag ( #1525 )
2 years ago
Jiarui Fang
9a9ef65313
[FAW] cpu caching operations ( #1520 )
2 years ago
Jiarui Fang
af5438caa2
[FAW] refactor reorder() for CachedParamMgr ( #1514 )
2 years ago
Jiarui Fang
9feee6d06b
[FAW] LFU initialize with dataset freq ( #1513 )
2 years ago
CsRic
1b8fee8e9c
[FAW] shrink freq_cnter size ( #1509 )
2 years ago
Jiarui Fang
ba61109b6c
[FAW] remove code related to chunk ( #1501 )
2 years ago
Jiarui Fang
d5085bb317
[FAW] add more docs and fix a warning ( #1500 )
2 years ago
CsRic
0ed2f46131
[FAW] FAW embedding uses LRU as the eviction strategy, initialized with dataset stats ( #1494 )
2 years ago
CsRic
b8d0e39eaf
[FAW] LFU cache for the FAW
2 years ago
Jiarui Fang
cde7b8a5b8
[FAW] init an LFU implementation for FAW ( #1488 )
2 years ago
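Not the FAW implementation itself: a minimal LFU sketch showing the eviction rule these commits build on; rows used least frequently are evicted first, and the counters can be pre-seeded from dataset statistics (per #1513 above):

```python
# Minimal LFU eviction sketch, illustrative of the cache policy only.
from collections import defaultdict

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = {}               # row id -> cached value
        self.freq = defaultdict(int)  # row id -> access count

    def get(self, key):
        if key in self.store:
            self.freq[key] += 1
            return self.store[key]
        return None

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self.freq[k])
            del self.store[victim]    # evict the least-frequently-used row
        self.store[key] = value
        self.freq[key] += 1

cache = LFUCache(capacity=2)
cache.put("row0", 0); cache.put("row1", 1)
cache.get("row0")
cache.put("row2", 2)                  # evicts row1 (lowest frequency)
```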
Geng Zhang
0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding ( #1462 )
2 years ago
Jiarui Fang
a1476ea882
[NFC] polish doc style for ColoTensor ( #1457 )
2 years ago
ver217
367c615818
fix nvme docstring ( #1450 )
2 years ago
Geng Zhang
9f3eed66eb
[FAW] reorganize the inheritance structure of FreqCacheEmbedding ( #1448 )
2 years ago
Frank Lee
ae1b58cd16
[tensor] added linear implementation for the new sharding spec ( #1416 )
...
* [tensor] added linear implementation for the new sharding spec
* polish code
2 years ago
Jiarui Fang
30b4dd17c0
[FAW] export FAW in _ops ( #1438 )
2 years ago
Jiarui Fang
c9427a323f
hotfix #1434 ( #1437 )
2 years ago
Jiarui Fang
10b3df65c8
[FAW] move coloparam setting in test code. ( #1429 )
2 years ago