70 Commits (e5b1a0c9bee8ba6cc1fe5af99afe725aec2b6509)

Author SHA1 Message Date
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821) 2022-11-08 15:53:13 +08:00
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765) 2022-11-07 16:49:03 +08:00
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes a memory leak when a parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA. A chunk may be released only when it is about to be offloaded.
* adds a constant placement policy. With it, users can reserve a caching memory space for parameters.
2022-10-14 17:53:16 +08:00
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692) 2022-10-12 12:01:21 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Jiarui Fang c638bec028
[embedding] polish async copy (#1657) 2022-09-27 14:37:03 +08:00
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656) 2022-09-27 13:43:59 +08:00
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654) 2022-09-27 12:50:33 +08:00
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647) 2022-09-26 14:57:57 +08:00
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
Jiarui Fang e57df80325
[embeddings] cache option (#1635) 2022-09-23 16:40:18 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
Jiarui Fang a19eb80998
[embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Jiatong Han 3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564) 2022-09-08 16:41:19 +08:00
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538) 2022-09-02 15:48:35 +08:00
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535) 2022-09-02 11:09:37 +08:00
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526) 2022-09-01 17:55:41 +08:00
Jiarui Fang 4537d39df9
[doc] docstring for FreqAwareEmbeddingBag (#1525) 2022-08-31 13:52:30 +08:00
Jiarui Fang 9a9ef65313
[FAW] cpu caching operations (#1520) 2022-08-30 14:50:02 +08:00
Jiarui Fang af5438caa2
[FAW] refactor reorder() for CachedParamMgr (#1514) 2022-08-29 14:22:07 +08:00
Jiarui Fang 9feee6d06b
[FAW] LFU initialize with dataset freq (#1513) 2022-08-29 12:52:53 +08:00
CsRic 1b8fee8e9c
[FAW] shrink freq_cnter size (#1509) 2022-08-29 11:44:55 +08:00
Jiarui Fang ba61109b6c
[FAW] remove code related to chunk (#1501) 2022-08-26 14:23:30 +08:00
Jiarui Fang d5085bb317
[FAW] add more docs and fix a warning (#1500) 2022-08-26 14:10:21 +08:00
CsRic 0ed2f46131
[FAW] FAW embedding use LRU as eviction strategy initialized with dataset stats (#1494) 2022-08-26 11:24:12 +08:00
CsRic b8d0e39eaf
[FAW] LFU cache for the FAW 2022-08-25 13:08:46 +08:00
Jiarui Fang cde7b8a5b8
[FAW] init an LFU implementation for FAW (#1488) 2022-08-24 17:37:22 +08:00
Geng Zhang 0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) 2022-08-23 17:38:24 +08:00
Jiarui Fang a1476ea882
[NFC] polish doc style for ColoTensor (#1457) 2022-08-16 09:21:05 +08:00
Geng Zhang 9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448) 2022-08-12 15:55:46 +08:00
Jiarui Fang 30b4dd17c0
[FAW] export FAW in _ops (#1438) 2022-08-11 13:43:24 +08:00
ver217 04c9a86af8
[zero] ZeroDDP supports controlling outputs' dtype (#1399) 2022-08-02 17:49:11 +08:00
HELSON 4e98e938ce
[zero] alleviate memory usage in ZeroDDP state_dict (#1398) 2022-08-02 15:49:13 +08:00
ver217 83328329dd
[hotfix] fix zero ddp buffer cast (#1376)
* fix zero ddp buffer cast

* fix zero ddp ignore params
2022-07-28 10:54:44 +08:00
ver217 5d5031e946
fix zero ddp state dict (#1378) 2022-07-28 09:31:42 +08:00
HELSON 87775a0682
[colotensor] use cpu memory to store state_dict (#1367) 2022-07-26 14:13:38 +08:00
ver217 d068af81a3
[doc] update rst and docstring (#1351)
* update rst

* add zero docstr

* fix docstr

* remove fx.tracer.meta_patch

* fix docstr

* fix docstr

* update fx rst

* fix fx docstr

* remove useless rst
2022-07-21 15:54:53 +08:00
ver217 0c51ff2c13
[hotfix] ZeroDDP use new process group (#1333)
* process group supports getting ranks in group

* chunk mgr receives a process group

* update unit test

* fix unit tests
2022-07-18 14:14:52 +08:00
HELSON 1b41686461
[hotfix] fix unit test test_module_spec (#1321) 2022-07-15 14:02:32 +08:00
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245) 2022-07-11 15:51:48 +08:00
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2022-07-06 16:15:16 +08:00