Commit Graph

162 Commits (0dbd61c29b2d5373bcd9a3bc2be7ce14a1d07ef0)

Author SHA1 Message Date
ver217 367c615818
fix nvme docstring (#1450)
2 years ago
Geng Zhang 9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448)
2 years ago
Frank Lee ae1b58cd16
[tensor] added linear implementation for the new sharding spec (#1416)
2 years ago
Jiarui Fang 30b4dd17c0
[FAW] export FAW in _ops (#1438)
2 years ago
Jiarui Fang c9427a323f
hotfix #1434 (#1437)
2 years ago
Jiarui Fang 10b3df65c8
[FAW] move coloparam setting in test code. (#1429)
2 years ago
Jiarui Fang cb98cf5558
[FAW] parallel FreqAwareEmbedding (#1424)
2 years ago
Jiarui Fang d209aff684
Add FreqAwareEmbeddingBag (#1421)
2 years ago
Jiarui Fang 504419d261
[FAW] add cache manager for the cached embedding (#1419)
2 years ago
ver217 12b4887097
[hotfix] fix CPUAdam kernel nullptr (#1410)
2 years ago
ver217 04c9a86af8
[zero] ZeroDDP supports controlling outputs' dtype (#1399)
2 years ago
HELSON 4e98e938ce
[zero] alleviate memory usage in ZeRODDP state_dict (#1398)
2 years ago
HELSON c7221cb2d4
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388)
2 years ago
ver217 83328329dd
[hotfix] fix zero ddp buffer cast (#1376)
2 years ago
ver217 5d5031e946
fix zero ddp state dict (#1378)
2 years ago
ver217 c415240db6
[nvme] CPUAdam and HybridAdam support NVMe offload (#1360)
2 years ago
HELSON 87775a0682
[colotensor] use cpu memory to store state_dict (#1367)
2 years ago
ver217 d068af81a3
[doc] update rst and docstring (#1351)
2 years ago
HELSON 7a8702c06d
[colotensor] add Tensor.view op and its unit test (#1343)
2 years ago
ver217 0c51ff2c13
[hotfix] ZeroDDP use new process group (#1333)
2 years ago
HELSON 1b41686461
[hotfix] fix unit test test_module_spec (#1321)
2 years ago
Jiarui Fang 9e4c6449b0
[checkpoint] add ColoOptimizer checkpointing (#1316)
2 years ago
Jiarui Fang 85f933b58b
[Optimizer] Remove useless ColoOptimizer (#1312)
2 years ago
Jiarui Fang 9f10524313
[Optimizer] polish the init method of ColoOptimizer (#1310)
2 years ago
HELSON 260a55804a
[hotfix] fix shape error in backward when using ColoTensor (#1298)
2 years ago
runluo f83c4d6597
[NFC] polish colossalai/nn/layer/wrapper/pipeline_wrapper.py code style (#1303)
2 years ago
XYE e83b2ce853 [NFC] polish colossalai/nn/layer/vanilla/layers.py code style (#1295)
2 years ago
Liping233 1000a41fd5 [NFC] polish colossalai/nn/layer/vanilla/__init__.py code style (#1293)
2 years ago
Wangbo Zhao(黑色枷锁) 552667825b [NFC] polish colossalai/nn/layer/parallel_1d/layers.py code style (#1290)
2 years ago
Jiatong Han 38e3ccd1e9 [NFC] polish colossalai/nn/layer/parallel_sequence/layers.py code style (#1280)
2 years ago
Boyuan Yao b414eaa5db [NFC] polish colossalai/nn/optimizer/lamb.py code style (#1275)
2 years ago
Super Daniel 52d145a342 [NFC] polish colossalai/nn/lr_scheduler/onecycle.py code style (#1269)
2 years ago
Geng Zhang 0e06f62160 [NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style (#1266)
2 years ago
superhao1995 f660152c73 [NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258)
2 years ago
Thunderbeee 9738fb0f78 [NFC] polish colossalai/nn/lr_scheduler/__init__.py (#1255)
2 years ago
Ofey Chan 2dd4d556fb
[NFC] polish colossalai/nn/init.py code style (#1292)
2 years ago
HELSON abba4d84e1
[hotfix] fix bert model test in unitests (#1272)
2 years ago
oahzxl 0cf8e8e91c
[NFC] polish <colossalai/nn/lr_scheduler/poly.py> code style (#1267)
2 years ago
Jiarui Fang 1aad903c15
[tensor] redistribute among different process groups (#1247)
2 years ago
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245)
2 years ago
Jiarui Fang 2699dfbbfd
[rename] convert_to_dist -> redistribute (#1243)
2 years ago
Jiarui Fang 4a76084dc9
[tensor] add zero_like colo op, important for Optimizer (#1236)
2 years ago
Jiarui Fang 3b500984b1
[tensor] fix some unittests (#1234)
2 years ago
HELSON 0453776def
[tensor] fix a assertion in colo_tensor cross_entropy (#1232)
2 years ago
HELSON 42ab36b762
[tensor] add unitest for colo_tensor 1DTP cross_entropy (#1230)
2 years ago
Yi Zhao 04537bf83e
[checkpoint]support generalized scheduler (#1222)
2 years ago
Jiarui Fang a98319f023
[tensor] torch function return colotensor (#1229)
2 years ago
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203)
2 years ago
Jiarui Fang b5f25eb32a
[Tensor] add cpu group to ddp (#1200)
2 years ago
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189)
2 years ago