Commit Graph

253 Commits (85e045b063a70cd36ccc0405acc245d86f2a1621)

Author SHA1 Message Date
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647)
2 years ago
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Jiarui Fang e57df80325
[embeddings] cache option (#1635)
2 years ago
HELSON a088022efc
[moe] fix moe bugs (#1633)
2 years ago
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
2 years ago
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625)
2 years ago
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611)
2 years ago
Jiarui Fang a19eb80998
[embedding] updates some default parameters
2 years ago
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584)
2 years ago
Sze-qq 2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572)
2 years ago
superhao1995 e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
2 years ago
Jiatong Han 3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
2 years ago
DouJS f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568)
2 years ago
BigOneLiXiaoMing 0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561)
2 years ago
Ofey Chan 7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556)
2 years ago
yuxuan-lou 413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555)
2 years ago
shenggan 8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553)
2 years ago
Maruyama_Aya bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552)
2 years ago
binmakeswell 73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2 years ago
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564)
2 years ago
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569)
2 years ago
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
2 years ago
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545)
2 years ago
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537)
2 years ago
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538)
2 years ago
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535)
2 years ago
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526)
2 years ago
Jiarui Fang 4537d39df9
[doc] docstring for FreqAwareEmbeddingBag (#1525)
2 years ago
Jiarui Fang 9a9ef65313
[FAW] cpu caching operations (#1520)
2 years ago
Jiarui Fang af5438caa2
[FAW] refactor reorder() for CachedParamMgr (#1514)
2 years ago
Jiarui Fang 9feee6d06b
[FAW] LFU initialize with dataset freq (#1513)
2 years ago
CsRic 1b8fee8e9c
[FAW] shrink freq_cnter size (#1509)
2 years ago
Jiarui Fang ba61109b6c
[FAW] remove code related to chunk (#1501)
2 years ago
Jiarui Fang d5085bb317
[FAW] add more docs and fix a warning (#1500)
2 years ago
CsRic 0ed2f46131
[FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494)
2 years ago
CsRic b8d0e39eaf
[FAW] LFU cache for the FAW
2 years ago
Jiarui Fang cde7b8a5b8
[FAW] init an LFU implementation for FAW (#1488)
2 years ago
Geng Zhang 0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462)
2 years ago
Jiarui Fang a1476ea882
[NFC] polish doc style for ColoTensor (#1457)
2 years ago
ver217 367c615818
fix nvme docstring (#1450)
2 years ago
Geng Zhang 9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448)
2 years ago
Frank Lee ae1b58cd16
[tensor] added linear implementation for the new sharding spec (#1416)
2 years ago
Jiarui Fang 30b4dd17c0
[FAW] export FAW in _ops (#1438)
2 years ago
Jiarui Fang c9427a323f
hotfix #1434 (#1437)
2 years ago
Jiarui Fang 10b3df65c8
[FAW] move coloparam setting in test code. (#1429)
2 years ago
Jiarui Fang cb98cf5558
[FAW] parallel FreqAwareEmbedding (#1424)
2 years ago
Jiarui Fang d209aff684
Add FreqAwareEmbeddingBag (#1421)
2 years ago
Jiarui Fang 504419d261
[FAW] add cache manager for the cached embedding (#1419)
2 years ago
ver217 12b4887097
[hotfix] fix CPUAdam kernel nullptr (#1410)
2 years ago
ver217 04c9a86af8
[zero] ZeroDDP supports controlling outputs' dtype (#1399)
2 years ago
HELSON 4e98e938ce
[zero] alleviate memory usage in ZeRODDP state_dict (#1398)
2 years ago
HELSON c7221cb2d4
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388)
2 years ago
ver217 83328329dd
[hotfix] fix zero ddp buffer cast (#1376)
2 years ago
ver217 5d5031e946
fix zero ddp state dict (#1378)
2 years ago
ver217 c415240db6
[nvme] CPUAdam and HybridAdam support NVMe offload (#1360)
2 years ago
HELSON 87775a0682
[colotensor] use cpu memory to store state_dict (#1367)
2 years ago
ver217 d068af81a3
[doc] update rst and docstring (#1351)
2 years ago
HELSON 7a8702c06d
[colotensor] add Tensor.view op and its unit test (#1343)
2 years ago
ver217 0c51ff2c13
[hotfix] ZeroDDP use new process group (#1333)
2 years ago
HELSON 1b41686461
[hotfix] fix unit test test_module_spec (#1321)
2 years ago
Jiarui Fang 9e4c6449b0
[checkpoint] add ColoOptimizer checkpointing (#1316)
2 years ago
Jiarui Fang 85f933b58b
[Optimizer] Remove useless ColoOptimizer (#1312)
2 years ago
Jiarui Fang 9f10524313
[Optimizer] polish the init method of ColoOptimizer (#1310)
2 years ago
HELSON 260a55804a
[hotfix] fix shape error in backward when using ColoTensor (#1298)
2 years ago
runluo f83c4d6597
[NFC] polish colossalai/nn/layer/wrapper/pipeline_wrapper.py code style (#1303)
2 years ago
XYE e83b2ce853 [NFC] polish colossalai/nn/layer/vanilla/layers.py code style (#1295)
2 years ago
Liping233 1000a41fd5 [NFC] polish colossalai/nn/layer/vanilla/__init__.py code style (#1293)
2 years ago
Wangbo Zhao(黑色枷锁) 552667825b [NFC] polish colossalai/nn/layer/parallel_1d/layers.py code style (#1290)
2 years ago
Jiatong Han 38e3ccd1e9 [NFC] polish colossalai/nn/layer/parallel_sequence/layers.py code style (#1280)
2 years ago
Boyuan Yao b414eaa5db [NFC] polish colossalai/nn/optimizer/lamb.py code style (#1275)
2 years ago
Super Daniel 52d145a342 [NFC] polish colossalai/nn/lr_scheduler/onecycle.py code style (#1269)
2 years ago
Geng Zhang 0e06f62160 [NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style (#1266)
2 years ago
superhao1995 f660152c73 [NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258)
2 years ago
Thunderbeee 9738fb0f78 [NFC] polish colossalai/nn/lr_scheduler/__init__.py (#1255)
2 years ago
Ofey Chan 2dd4d556fb
[NFC] polish colossalai/nn/init.py code style (#1292)
2 years ago
HELSON abba4d84e1
[hotfix] fix bert model test in unitests (#1272)
2 years ago
oahzxl 0cf8e8e91c
[NFC] polish <colossalai/nn/lr_scheduler/poly.py> code style (#1267)
2 years ago
Jiarui Fang 1aad903c15
[tensor] redistribute among different process groups (#1247)
2 years ago
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245)
2 years ago
Jiarui Fang 2699dfbbfd
[rename] convert_to_dist -> redistribute (#1243)
2 years ago
Jiarui Fang 4a76084dc9
[tensor] add zero_like colo op, important for Optimizer (#1236)
2 years ago
Jiarui Fang 3b500984b1
[tensor] fix some unittests (#1234)
2 years ago
HELSON 0453776def
[tensor] fix a assertion in colo_tensor cross_entropy (#1232)
2 years ago
HELSON 42ab36b762
[tensor] add unitest for colo_tensor 1DTP cross_entropy (#1230)
2 years ago
Yi Zhao 04537bf83e
[checkpoint]support generalized scheduler (#1222)
2 years ago
Jiarui Fang a98319f023
[tensor] torch function return colotensor (#1229)
2 years ago
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203)
2 years ago
Jiarui Fang b5f25eb32a
[Tensor] add cpu group to ddp (#1200)
2 years ago
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189)
2 years ago
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182)
2 years ago
ver217 6b2f2ab9bb
[ddp] ColoDDP uses bucket all-reduce (#1177)
2 years ago
Jiarui Fang 1b657f9ce1
[tensor] revert local view back (#1178)
2 years ago
Jiarui Fang 0dd4e2bbfb
[Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176)
2 years ago
Ziyue Jiang dd0420909f
[Tensor] rename parallel_action (#1174)
2 years ago
Jiarui Fang aa7bef73d4
[Tensor] distributed view supports inter-process hybrid parallel (#1169)
2 years ago
Jiarui Fang 4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168)
2 years ago
Jiarui Fang f4ef224358
[Tensor] remove ParallelAction, use ComputeSpec instread (#1166)
2 years ago
Jiarui Fang 177c374401
remove gather out in parallel action (#1163)
2 years ago