Commit Graph

306 Commits (10e3c9f923caf4fb68ab61e96c244bd5cca9b9da)

Author SHA1 Message Date
HELSON dddacd2d2c
[hotfix] add norm clearing for the overflow step (#2416)
2 years ago
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
2 years ago
Frank Lee 551cafec14
[doc] updated kernel-related optimisers' docstring (#2385)
2 years ago
eric8607242 9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP (#2363)
2 years ago
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
2 years ago
HELSON 48d33b1b17
[gemini] add get static torch model (#2356)
2 years ago
Jiarui Fang 16cc8e6aa7
[builder] MOE builder (#2277)
2 years ago
zbian e94c79f15b
improved allgather & reducescatter for 3d
2 years ago
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang 355ffb386e
[builder] unified cpu_optim fused_optim interface (#2190)
2 years ago
Jiarui Fang 9587b080ba
[builder] use runtime builder for fused_optim (#2189)
2 years ago
Jiarui Fang d42afd30f8
[builder] runtime adam and fused_optim builder (#2184)
2 years ago
Tongping Liu ab54fed292
[hotfix] add kwargs for colo_addmm (#2171)
2 years ago
アマデウス 622f863291
[hotfix] Jit type hint #2161 (#2164)
2 years ago
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151)
2 years ago
Jiarui Fang bdef9dfdbe
[NFC] remove useless graph node code (#2150)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
HELSON e7d3afc9cc
[optimizer] add div_scale for optimizers (#2117)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
HELSON e37f3db40c
[gemini] add arguments (#2046)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor (#2003)
2 years ago
Jiarui Fang a2d3266648
[hotfix] make Gemini work for conv DNN (#1998)
2 years ago
Jiarui Fang cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972)
2 years ago
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971)
2 years ago
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960)
2 years ago
アマデウス e52f9d9109
[tensorparallel] fixed tp layers (#1938)
2 years ago
Jiarui Fang 986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876)
2 years ago
Jiarui Fang c2947dadf1
[inference] streaming Linear 1D Row inference (#1874)
2 years ago
zbian 653b0a620e
added skip_bias_add for non-tp linear
2 years ago
アマデウス 4268ae017b
[kernel] added jit warmup (#1792)
2 years ago
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821)
2 years ago
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765)
2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
2 years ago
kurisusnowdeng 0b8161fab8
updated tp layers
2 years ago
Sze-qq 23703c9dd6
[NFC] polish colossalai/nn/metric/_utils.py code style (#1727)
2 years ago
Ofey Chan 7e62af28a0
[NFC] polish accuracy_2d.py code style (#1719)
2 years ago
yuxuan-lou 2b49ca80a3
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716)
2 years ago
shenggan e1d780030d
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
binmakeswell 5f41463a76
add optimizer README for tutorials (#1707)
2 years ago
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699)
2 years ago
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692)
2 years ago
jim e5ab6be72e
[hotfix] fix colotensor.type() raising NotImplementedError (#1682)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Jiarui Fang c638bec028
[embedding] polish async copy (#1657)
2 years ago
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656)
2 years ago
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654)
2 years ago
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647)
2 years ago
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Jiarui Fang e57df80325
[embeddings] cache option (#1635)
2 years ago
HELSON a088022efc
[moe] fix moe bugs (#1633)
2 years ago
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
2 years ago
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625)
2 years ago
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611)
2 years ago
Jiarui Fang a19eb80998
[embedding] updates some default parameters
2 years ago
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584)
2 years ago
Sze-qq 2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572)
2 years ago
superhao1995 e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
2 years ago
Jiatong Han 3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
2 years ago
DouJS f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568)
2 years ago
BigOneLiXiaoMing 0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style (#1561)
2 years ago
Ofey Chan 7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556)
2 years ago
yuxuan-lou 413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555)
2 years ago
shenggan 8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553)
2 years ago
Maruyama_Aya bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552)
2 years ago
binmakeswell 73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2 years ago
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564)
2 years ago
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569)
2 years ago
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
2 years ago
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545)
2 years ago
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537)
2 years ago
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538)
2 years ago
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535)
2 years ago
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526)
2 years ago
Jiarui Fang 4537d39df9
[doc] docstring for FreqAwareEmbeddingBag (#1525)
2 years ago
Jiarui Fang 9a9ef65313
[FAW] cpu caching operations (#1520)
2 years ago
Jiarui Fang af5438caa2
[FAW] refactor reorder() for CachedParamMgr (#1514)
2 years ago
Jiarui Fang 9feee6d06b
[FAW] LFU initialize with dataset freq (#1513)
2 years ago
CsRic 1b8fee8e9c
[FAW] shrink freq_cnter size (#1509)
2 years ago
Jiarui Fang ba61109b6c
[FAW] remove code related to chunk (#1501)
2 years ago
Jiarui Fang d5085bb317
[FAW] add more docs and fix a warning (#1500)
2 years ago
CsRic 0ed2f46131
[FAW] FAW embedding uses LRU as eviction strategy initialized with dataset stats (#1494)
2 years ago
CsRic b8d0e39eaf
[FAW] LFU cache for the FAW
2 years ago
Jiarui Fang cde7b8a5b8
[FAW] init an LFU implementation for FAW (#1488)
2 years ago
Geng Zhang 0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462)
2 years ago
Jiarui Fang a1476ea882
[NFC] polish doc style for ColoTensor (#1457)
2 years ago
ver217 367c615818
fix nvme docstring (#1450)
2 years ago
Geng Zhang 9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448)
2 years ago
Frank Lee ae1b58cd16
[tensor] added linear implementation for the new sharding spec (#1416)
2 years ago
Jiarui Fang 30b4dd17c0
[FAW] export FAW in _ops (#1438)
2 years ago
Jiarui Fang c9427a323f
hotfix #1434 (#1437)
2 years ago
Jiarui Fang 10b3df65c8
[FAW] move coloparam setting in test code. (#1429)
2 years ago