295 Commits (cloud/coati)

Author SHA1 Message Date
Jiarui Fang 504ff1d101 [embeddings] use cache_ratio instead of cuda_row_num (#1611) 2 years ago
Jiarui Fang a19eb80998 [embedding] updates some default parameters 2 years ago
CsRic f3403ff98e [embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2 years ago
Sze-qq 2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) 2 years ago
superhao1995 e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571) 2 years ago
Jiatong Han 3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570) 2 years ago
DouJS f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) 2 years ago
BigOneLiXiaoMing 0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) 2 years ago
Ofey Chan 7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) 2 years ago
yuxuan-lou 413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) 2 years ago
shenggan 8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) 2 years ago
Maruyama_Aya bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) 2 years ago
binmakeswell 73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style 2 years ago
CsRic a389ac4ec9 [embedding] cache_embedding small improvement (#1564) 2 years ago
ver217 10dd8226b1 add gather_output for VocabParallelClassifier1D (#1569) 2 years ago
ver217 ae71036cd2 [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) 2 years ago
Jiarui Fang 64169f3e8f [embedding] polish parallel embedding tablewise (#1545) 2 years ago
CsRic 964123ae0f [embedding] freq_aware_embedding: add small functions for caller application (#1537) 2 years ago
Jiarui Fang 521078ffc9 [embedding] fix a bug in table wise sharding (#1538) 2 years ago
Jiarui Fang 87134524fd [embedding] tablewise sharding polish (#1535) 2 years ago
CsRic 5156d5b4f8 [embedding] add tablewise sharding for FAW (#1526) 2 years ago
Jiarui Fang 4537d39df9 [doc] docstring for FreqAwareEmbeddingBag (#1525) 2 years ago
Jiarui Fang 9a9ef65313 [FAW] cpu caching operations (#1520) 2 years ago
Jiarui Fang af5438caa2 [FAW] refactor reorder() for CachedParamMgr (#1514) 2 years ago
Jiarui Fang 9feee6d06b [FAW] LFU initialize with dataset freq (#1513) 2 years ago
CsRic 1b8fee8e9c [FAW] shrink freq_cnter size (#1509) 2 years ago
Jiarui Fang ba61109b6c [FAW] remove code related to chunk (#1501) 2 years ago
Jiarui Fang d5085bb317 [FAW] add more docs and fix a warning (#1500) 2 years ago
CsRic 0ed2f46131 [FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494) 2 years ago
CsRic b8d0e39eaf [FAW] LFU cache for the FAW 2 years ago
Jiarui Fang cde7b8a5b8 [FAW] init an LFU implementation for FAW (#1488) 2 years ago
Geng Zhang 0aad53c62b [FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) 2 years ago
Jiarui Fang a1476ea882 [NFC] polish doc style for ColoTensor (#1457) 2 years ago
ver217 367c615818 fix nvme docstring (#1450) 2 years ago
Geng Zhang 9f3eed66eb [FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448) 2 years ago
Frank Lee ae1b58cd16 [tensor] added linear implementation for the new sharding spec (#1416) 2 years ago
Jiarui Fang 30b4dd17c0 [FAW] export FAW in _ops (#1438) 2 years ago
Jiarui Fang c9427a323f hotfix #1434 (#1437) 2 years ago
Jiarui Fang 10b3df65c8 [FAW] move coloparam setting in test code. (#1429) 2 years ago
Jiarui Fang cb98cf5558 [FAW] parallel FreqAwareEmbedding (#1424) 2 years ago
Jiarui Fang d209aff684 Add FreqAwareEmbeddingBag (#1421) 2 years ago
Jiarui Fang 504419d261 [FAW] add cache manager for the cached embedding (#1419) 2 years ago
ver217 12b4887097 [hotfix] fix CPUAdam kernel nullptr (#1410) 2 years ago
ver217 04c9a86af8 [zero] ZeroDDP supports controlling outputs' dtype (#1399) 2 years ago
HELSON 4e98e938ce [zero] alleviate memory usage in ZeRODDP state_dict (#1398) 2 years ago
HELSON c7221cb2d4 [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) 2 years ago
ver217 83328329dd [hotfix] fix zero ddp buffer cast (#1376) 2 years ago
ver217 5d5031e946 fix zero ddp state dict (#1378) 2 years ago
ver217 c415240db6 [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) 2 years ago
HELSON 87775a0682 [colotensor] use cpu memory to store state_dict (#1367) 2 years ago