Tongping Liu
ab54fed292
[hotfix] add kwargs for colo_addmm ( #2171 )
2022-12-22 13:25:30 +08:00
アマデウス
622f863291
[hotfix] Jit type hint #2161 ( #2164 )
2022-12-22 10:17:03 +08:00
Jiarui Fang
2827f41898
[Gemini] GeminiDDP convert to PyTorch Module. ( #2151 )
2022-12-20 10:19:36 +08:00
Jiarui Fang
bdef9dfdbe
[NFC] remove useless graph node code ( #2150 )
2022-12-20 00:33:58 +08:00
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2022-12-12 18:06:16 +08:00
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2022-12-12 17:58:57 +08:00
Jiarui Fang
e5aa8333e4
[NFC] update chunk manager API ( #2119 )
2022-12-12 16:57:22 +08:00
Jiarui Fang
e99edfcb51
[NFC] polish comments for Chunk class ( #2116 )
2022-12-12 15:39:31 +08:00
HELSON
63fbba3c19
[zero] add L2 gradient clipping for ZeRO ( #2112 )
* [zero] add L2 gradient clipping
* [testing] add MlpModel
* [zero] add unit test for grad clipping
* fix atol
2022-12-09 18:09:17 +08:00
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2022-12-06 12:53:58 +08:00
Jiarui Fang
b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook ( #2080 )
2022-12-05 17:11:06 +08:00
HELSON
e37f3db40c
[gemini] add arguments ( #2046 )
* [zero] fix testing parameters
* [gemini] add arguments
* add docstrings
2022-11-30 16:40:13 +08:00
Jiarui Fang
96134e7be3
[hotfix] add bert test for gemini fwd bwd ( #2035 )
2022-11-29 11:19:52 +08:00
Jiarui Fang
8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor ( #2003 )
2022-11-25 20:06:35 +08:00
Jiarui Fang
a2d3266648
[hotfix] make Gemini work for conv DNN ( #1998 )
2022-11-22 14:52:36 +08:00
Jiarui Fang
cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook ( #1972 )
2022-11-17 14:43:49 +08:00
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2022-11-17 13:42:33 +08:00
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2022-11-16 14:44:28 +08:00
アマデウス
e52f9d9109
[tensorparallel] fixed tp layers ( #1938 )
2022-11-14 17:34:03 +08:00
Jiarui Fang
986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 ( #1876 )
2022-11-10 17:36:42 +08:00
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2022-11-10 17:03:21 +08:00
zbian
653b0a620e
added skip_bias_add for non-tp linear
2022-11-09 15:41:08 +08:00
アマデウス
4268ae017b
[kernel] added jit warmup ( #1792 )
2022-11-08 16:22:23 +08:00
Jiarui Fang
cd5a0d56fa
[Gemini] make gemini usage simple ( #1821 )
2022-11-08 15:53:13 +08:00
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2022-11-07 16:49:03 +08:00
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
kurisusnowdeng
0b8161fab8
updated tp layers
2022-11-02 12:19:38 +08:00
Sze-qq
23703c9dd6
[NFC] polish colossalai/nn/metric/_utils.py code style ( #1727 )
2022-10-19 12:20:51 +08:00
Ofey Chan
7e62af28a0
[NFC] polish accuracy_2d.py code style ( #1719 )
2022-10-19 12:20:51 +08:00
yuxuan-lou
2b49ca80a3
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style ( #1716 )
2022-10-19 12:20:51 +08:00
shenggan
e1d780030d
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style ( #1714 )
2022-10-19 12:20:51 +08:00
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
* fixes a memory leak when a parameter is in fp16 in ZeroDDP init.
* bans releasing chunks in CUDA memory. A chunk may be released only when it is about to be offloaded.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
binmakeswell
5f41463a76
add optimizer README for tutorials ( #1707 )
2022-10-14 09:10:18 +00:00
Jiarui Fang
21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding ( #1699 )
2022-10-13 22:22:27 +08:00
Jiarui Fang
363fc2861a
[embeddings] more detailed timer ( #1692 )
2022-10-12 12:01:21 +08:00
jim
e5ab6be72e
[hotfix] fix colotensor.type() raise NotImplementedError ( #1682 )
2022-10-10 10:13:31 +08:00
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2022-10-09 09:18:51 +08:00
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2022-09-27 14:37:03 +08:00
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2022-09-27 13:43:59 +08:00
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2022-09-27 12:50:33 +08:00
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2022-09-26 14:57:57 +08:00
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2022-09-24 19:58:18 +08:00
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2022-09-23 16:40:18 +08:00
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2022-09-23 15:33:57 +08:00
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
* remove forced FP32 modules
* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2022-09-22 11:16:25 +08:00
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2022-09-20 14:33:04 +08:00
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2022-09-15 15:45:17 +08:00
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2022-09-13 10:50:34 +08:00
Sze-qq
2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style ( #1572 )
2022-09-08 22:11:04 +08:00
superhao1995
e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style ( #1571 )
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han
3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style ( #1570 )
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
DouJS
f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style ( #1568 )
2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing
0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style ( #1561 )
2022-09-08 22:11:04 +08:00
Ofey Chan
7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py ( #1556 )
2022-09-08 22:11:04 +08:00
yuxuan-lou
413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style ( #1555 )
2022-09-08 22:11:04 +08:00
shenggan
8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style ( #1553 )
2022-09-08 22:11:04 +08:00
Maruyama_Aya
bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style ( #1552 )
2022-09-08 22:11:04 +08:00
binmakeswell
73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2022-09-08 22:11:04 +08:00
CsRic
a389ac4ec9
[embedding] cache_embedding small improvement ( #1564 )
2022-09-08 16:41:19 +08:00
ver217
10dd8226b1
add gather_output for VocabParallelClassifier1D ( #1569 )
2022-09-08 16:40:56 +08:00
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint ( #1548 )
* refactor parallel layer
* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
Jiarui Fang
64169f3e8f
[embedding] polish parallel embedding tablewise ( #1545 )
2022-09-06 10:41:20 +08:00
CsRic
964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application ( #1537 )
2022-09-05 15:12:53 +08:00
Jiarui Fang
521078ffc9
[embedding] fix a bug in table wise sharding ( #1538 )
2022-09-02 15:48:35 +08:00
Jiarui Fang
87134524fd
[embedding] tablewise sharding polish ( #1535 )
2022-09-02 11:09:37 +08:00
CsRic
5156d5b4f8
[embedding] add tablewise sharding for FAW ( #1526 )
2022-09-01 17:55:41 +08:00
Jiarui Fang
4537d39df9
[doc] docstring for FreqAwareEmbeddingBag ( #1525 )
2022-08-31 13:52:30 +08:00
Jiarui Fang
9a9ef65313
[FAW] cpu caching operations ( #1520 )
2022-08-30 14:50:02 +08:00
Jiarui Fang
af5438caa2
[FAW] refactor reorder() for CachedParamMgr ( #1514 )
2022-08-29 14:22:07 +08:00
Jiarui Fang
9feee6d06b
[FAW] LFU initialize with dataset freq ( #1513 )
2022-08-29 12:52:53 +08:00
CsRic
1b8fee8e9c
[FAW] shrink freq_cnter size ( #1509 )
2022-08-29 11:44:55 +08:00
Jiarui Fang
ba61109b6c
[FAW] remove code related to chunk ( #1501 )
2022-08-26 14:23:30 +08:00
Jiarui Fang
d5085bb317
[FAW] add more docs and fix a warning ( #1500 )
2022-08-26 14:10:21 +08:00
CsRic
0ed2f46131
[FAW] FAW embedding use LRU as eviction strategy initialized with dataset stats ( #1494 )
2022-08-26 11:24:12 +08:00
CsRic
b8d0e39eaf
[FAW] LFU cache for the FAW
2022-08-25 13:08:46 +08:00
Jiarui Fang
cde7b8a5b8
[FAW] init an LFU implementation for FAW ( #1488 )
2022-08-24 17:37:22 +08:00
Geng Zhang
0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding ( #1462 )
2022-08-23 17:38:24 +08:00
Jiarui Fang
a1476ea882
[NFC] polish doc style for ColoTensor ( #1457 )
2022-08-16 09:21:05 +08:00
ver217
367c615818
fix nvme docstring ( #1450 )
2022-08-12 18:01:02 +08:00
Geng Zhang
9f3eed66eb
[FAW] reorganize the inheritance struct of FreqCacheEmbedding ( #1448 )
2022-08-12 15:55:46 +08:00
Frank Lee
ae1b58cd16
[tensor] added linear implementation for the new sharding spec ( #1416 )
* [tensor] added linear implementation for the new sharding spec
* polish code
2022-08-12 11:33:09 +08:00
Jiarui Fang
30b4dd17c0
[FAW] export FAW in _ops ( #1438 )
2022-08-11 13:43:24 +08:00
Jiarui Fang
c9427a323f
hotfix #1434 ( #1437 )
2022-08-11 13:14:25 +08:00
Jiarui Fang
10b3df65c8
[FAW] move coloparam setting in test code. ( #1429 )
2022-08-10 14:31:53 +08:00
Jiarui Fang
cb98cf5558
[FAW] parallel FreqAwareEmbedding ( #1424 )
2022-08-10 13:44:30 +08:00
Jiarui Fang
d209aff684
Add FreqAwareEmbeddingBag ( #1421 )
2022-08-09 16:26:12 +08:00
Jiarui Fang
504419d261
[FAW] add cache manager for the cached embedding ( #1419 )
2022-08-09 15:17:17 +08:00
ver217
12b4887097
[hotfix] fix CPUAdam kernel nullptr ( #1410 )
2022-08-05 19:45:45 +08:00
ver217
04c9a86af8
[zero] ZeroDDP supports controlling outputs' dtype ( #1399 )
2022-08-02 17:49:11 +08:00
HELSON
4e98e938ce
[zero] alleviate memory usage in ZeRODDP state_dict ( #1398 )
2022-08-02 15:49:13 +08:00
HELSON
c7221cb2d4
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor ( #1388 )
2022-07-29 19:33:24 +08:00
ver217
83328329dd
[hotfix] fix zero ddp buffer cast ( #1376 )
* fix zero ddp buffer cast
* fix zero ddp ignore params
2022-07-28 10:54:44 +08:00
ver217
5d5031e946
fix zero ddp state dict ( #1378 )
2022-07-28 09:31:42 +08:00
ver217
c415240db6
[nvme] CPUAdam and HybridAdam support NVMe offload ( #1360 )
* impl nvme optimizer
* update cpu adam
* add unit test
* update hybrid adam
* update docstr
* add TODOs
* update CI
* fix CI
* fix CI
* fix CI path
* fix CI path
* fix CI path
* fix install tensornvme
* fix CI
* fix CI path
* fix CI env variables
* test CI
* test CI
* fix CI
* fix nvme optim __del__
* fix adam __del__
* fix nvme optim
* fix CI env variables
* fix nvme optim import
* test CI
* test CI
* fix CI
2022-07-26 17:25:24 +08:00
HELSON
87775a0682
[colotensor] use cpu memory to store state_dict ( #1367 )
2022-07-26 14:13:38 +08:00
ver217
d068af81a3
[doc] update rst and docstring ( #1351 )
* update rst
* add zero docstr
* fix docstr
* remove fx.tracer.meta_patch
* fix docstr
* fix docstr
* update fx rst
* fix fx docstr
* remove useless rst
2022-07-21 15:54:53 +08:00
HELSON
7a8702c06d
[colotensor] add Tensor.view op and its unit test ( #1343 )
[colotensor] add megatron initialization for gpt2
2022-07-21 10:53:15 +08:00