HELSON
dddacd2d2c
[hotfix] add norm clearing for the overflow step ( #2416 )
2 years ago
HELSON
ea13a201bb
[polish] polish code for get_static_torch_model ( #2405 )
...
* [gemini] polish code
* [testing] remove code
* [gemini] make more robust
2 years ago
Frank Lee
551cafec14
[doc] updated kernel-related optimisers' docstring ( #2385 )
...
* [doc] updated kernel-related optimisers' docstring
* polish doc
2 years ago
eric8607242
9880fd2cd8
Fix missing state_dict keys in ZeroDDP ( #2363 )
...
* Fix state_dict output for ZeroDDP duplicated parameters
* Rewrite state_dict based on get_static_torch_model
* Modify get_static_torch_model to be compatible with older versions (ZeroDDP)
2 years ago
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
...
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
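The jit-build path defers kernel compilation to first use instead of install time. A minimal sketch of that idea using PyTorch's inline extension loader; the kernel and names below are illustrative, not the repository's actual builder API, and a host compiler must be available at runtime:

```python
# Sketch of the jit-build path: compile a C++ extension at first use
# rather than at pip-install time. Names are illustrative only.
import torch
from torch.utils.cpp_extension import load_inline

cpp_src = """
torch::Tensor scale(torch::Tensor x, double s) {
    return x * s;
}
"""

# Compiles on the first call and caches the build for later runs.
ext = load_inline(name="demo_scale_ext", cpp_sources=cpp_src, functions=["scale"])
print(ext.scale(torch.ones(3), 2.0))  # tensor([2., 2., 2.])
```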
HELSON
48d33b1b17
[gemini] add get static torch model ( #2356 )
2 years ago
Jiarui Fang
16cc8e6aa7
[builder] MOE builder ( #2277 )
2 years ago
zbian
e94c79f15b
improved allgather & reducescatter for 3d
2 years ago
Jiarui Fang
af32022f74
[Gemini] fix the convert_to_torch_module bug ( #2269 )
2 years ago
HELSON
2458659919
[zero] fix error for BEiT models ( #2169 )
...
* [zero] fix error for BEiT models
* [ColoParameter] add unpack operation for tuple arguments
* fix bugs
* fix chunkv2 unit testing
* add assertion for gradient state
2 years ago
Jiarui Fang
355ffb386e
[builder] unified cpu_optim fused_optim interface ( #2190 )
2 years ago
Jiarui Fang
9587b080ba
[builder] use runtime builder for fused_optim ( #2189 )
2 years ago
Jiarui Fang
d42afd30f8
[builder] runtime adam and fused_optim builder ( #2184 )
2 years ago
Tongping Liu
ab54fed292
[hotfix] add kwargs for colo_addmm ( #2171 )
2 years ago
アマデウス
622f863291
[hotfix] Jit type hint #2161 ( #2164 )
2 years ago
Jiarui Fang
2827f41898
[Gemini] convert GeminiDDP to a PyTorch Module ( #2151 )
2 years ago
Jiarui Fang
bdef9dfdbe
[NFC] remove useless graph node code ( #2150 )
2 years ago
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2 years ago
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
...
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2 years ago
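A `div_scale` argument lets an optimizer un-scale loss-scaled gradients inside the update itself, instead of launching a separate unscale pass over every gradient first. A toy single-tensor sketch of the idea, not the actual fused kernel:

```python
# Hypothetical sketch of what `div_scale` buys: the gradient is divided
# by the loss scale inside the update, in the same pass over the data.
import torch

def sgd_step(param: torch.nn.Parameter, lr: float, div_scale: float = 1.0) -> None:
    """Toy single-tensor SGD update; `div_scale` is the current loss scale."""
    if param.grad is None:
        return
    grad = param.grad
    if div_scale != 1.0:
        grad = grad / div_scale          # fused un-scaling
    param.data.add_(grad, alpha=-lr)

p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 2.0) * 1024.0  # gradient scaled by loss scale 1024
sgd_step(p, lr=0.1, div_scale=1024.0)
print(p.data)                            # tensor([0.8, 0.8, 0.8, 0.8])
```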
Jiarui Fang
e5aa8333e4
[NFC] update chunk manager API ( #2119 )
2 years ago
Jiarui Fang
e99edfcb51
[NFC] polish comments for Chunk class ( #2116 )
2 years ago
HELSON
63fbba3c19
[zero] add L2 gradient clipping for ZeRO ( #2112 )
...
* [zero] add L2 gradient clipping
* [testing] add MlpModel
* [zero] add unit test for grad clipping
* fix atol
2 years ago
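Under ZeRO, each rank holds only a shard of the gradients, so clipping needs a global L2 norm before any shard is scaled. A minimal sketch, assuming `torch.distributed` is initialized in the multi-rank case (function name is illustrative):

```python
# Sketch of global L2 grad clipping over sharded gradients (ZeRO-style):
# each rank sums the squares of the shards it owns, all-reduces the sum,
# then every rank scales its shards by the same coefficient.
import torch
import torch.distributed as dist

def clip_grad_l2_(local_grads, max_norm: float) -> float:
    sq_sum = torch.stack([g.float().pow(2).sum() for g in local_grads]).sum()
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(sq_sum)               # global sum of squares
    total_norm = sq_sum.sqrt().item()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in local_grads:
            g.mul_(clip_coef)
    return total_norm

grads = [torch.full((3,), 4.0), torch.full((4,), 3.0)]
print(clip_grad_l2_(grads, max_norm=1.0))     # ~9.17 before clipping
```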
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2 years ago
Jiarui Fang
b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook ( #2080 )
2 years ago
HELSON
e37f3db40c
[gemini] add arguments ( #2046 )
...
* [zero] fix testing parameters
* [gemini] add arguments
* add docstrings
2 years ago
Jiarui Fang
96134e7be3
[hotfix] add bert test for gemini fwd bwd ( #2035 )
2 years ago
Jiarui Fang
8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor ( #2003 )
2 years ago
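Not ColoTensor's actual code, but the mechanism such a patch relies on: a `torch.Tensor` subclass can intercept in-place calls like `Tensor.add_` through `__torch_function__` and then delegate to the default implementation:

```python
# Minimal sketch of intercepting an in-place op on a tensor subclass.
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.Tensor.add_:
            # Hook point: e.g. unpack sharding metadata before delegating.
            pass
        return super().__torch_function__(func, types, args, kwargs)

t = torch.ones(2).as_subclass(MyTensor)
torch.Tensor.add_(t, 1.0)   # routed through __torch_function__
print(t)                    # values are now [2., 2.]
```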
Jiarui Fang
a2d3266648
[hotfix] make Gemini work for conv DNN ( #1998 )
2 years ago
Jiarui Fang
cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook ( #1972 )
2 years ago
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2 years ago
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2 years ago
アマデウス
e52f9d9109
[tensorparallel] fixed tp layers ( #1938 )
2 years ago
Jiarui Fang
986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 ( #1876 )
2 years ago
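The idea behind `stream_chunk_num`: split the row-parallel matmul into chunks so the all-reduce of one chunk overlaps the compute of the next. A hedged sketch, assuming an initialized process group and that `x` and `weight_shard` are this rank's local input/weight shards (names are illustrative):

```python
# Sketch of overlapping communication and compute in a row-parallel linear:
# while chunk i's partial result is being all-reduced, chunk i+1's matmul
# runs. Requires torch.distributed to be initialized.
import torch
import torch.distributed as dist

def row_parallel_linear(x, weight_shard, chunks: int = 2):
    outs, handles = [], []
    for xc in x.chunk(chunks, dim=0):
        partial = xc @ weight_shard                   # this chunk's partial sum
        handles.append(dist.all_reduce(partial, async_op=True))
        outs.append(partial)                          # overlaps with next matmul
    for h in handles:
        h.wait()
    return torch.cat(outs, dim=0)
```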
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2 years ago
zbian
653b0a620e
added skip_bias_add for non-tp linear
2 years ago
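`skip_bias_add` returns the bias instead of adding it, so a downstream fused kernel (e.g. bias + GELU) can apply it in a single pass. A minimal sketch of the pattern:

```python
# Sketch of the skip_bias_add pattern: defer the bias so a later fused
# step can apply it together with the activation.
import torch
import torch.nn as nn

class LinearSkipBias(nn.Linear):
    def forward(self, x):
        out = nn.functional.linear(x, self.weight)  # no bias added here
        return out, self.bias                       # caller fuses it later

layer = LinearSkipBias(4, 4)
out, bias = layer(torch.randn(2, 4))
y = torch.nn.functional.gelu(out + bias)            # bias applied in the fused step
```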
アマデウス
4268ae017b
[kernel] added jit warmup ( #1792 )
2 years ago
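JIT warmup runs a scripted kernel a few times with representative shapes so compilation and profiling-driven fusion happen before the first timed training step. A small sketch; the bias-gelu kernel here is illustrative:

```python
# Sketch of a JIT warmup: the first calls to a scripted function trigger
# compilation and shape specialization, so we pay that cost up front.
import torch

@torch.jit.script
def bias_gelu(bias: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.gelu(x + bias)

def warmup(steps: int = 3):
    x = torch.randn(16, 64)
    b = torch.randn(64)
    for _ in range(steps):
        bias_gelu(b, x)

warmup()  # run before the first real training step
```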
Jiarui Fang
cd5a0d56fa
[Gemini] make gemini usage simple ( #1821 )
2 years ago
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2 years ago
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
...
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2 years ago
kurisusnowdeng
0b8161fab8
updated tp layers
2 years ago
Sze-qq
23703c9dd6
[NFC] polish colossalai/nn/metric/_utils.py code style ( #1727 )
2 years ago
Ofey Chan
7e62af28a0
[NFC] polish accuracy_2d.py code style ( #1719 )
2 years ago
yuxuan-lou
2b49ca80a3
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style ( #1716 )
2 years ago
shenggan
e1d780030d
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style ( #1714 )
2 years ago
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
...
* fixes a memory leak when a parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy, with which users can reserve a fixed caching memory space for parameters.
2 years ago
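An illustrative sketch of what such a constant placement policy does, per the commit body: a fixed CUDA budget is reserved as cache space for parameter chunks, and anything beyond it stays offloaded. The class and names below are hypothetical, not the repository's implementation:

```python
# Hypothetical sketch of a constant placement policy: keep a fixed CUDA
# budget reserved for parameter chunks and leave the excess on CPU.
import torch

class ConstPlacementPolicy:
    def __init__(self, cuda_budget_bytes: int):
        self.cuda_budget = cuda_budget_bytes
        self.cuda_used = 0

    def place(self, chunk: torch.Tensor) -> torch.Tensor:
        nbytes = chunk.numel() * chunk.element_size()
        if torch.cuda.is_available() and self.cuda_used + nbytes <= self.cuda_budget:
            self.cuda_used += nbytes
            return chunk.cuda()   # fits in the reserved cache space
        return chunk.cpu()        # otherwise keep it offloaded

policy = ConstPlacementPolicy(cuda_budget_bytes=64 << 20)  # 64 MiB reserved
```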
binmakeswell
5f41463a76
add optimizer README for tutorials ( #1707 )
2 years ago
Jiarui Fang
21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding ( #1699 )
2 years ago
Jiarui Fang
363fc2861a
[embeddings] more detailed timer ( #1692 )
2 years ago
jim
e5ab6be72e
[hotfix] fix colotensor.type() raising NotImplementedError ( #1682 )
2 years ago
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2 years ago
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2 years ago
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2 years ago
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2 years ago
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2 years ago
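The non-blocking copy relies on pinned (page-locked) CPU memory, which lets the host-to-device transfer overlap GPU compute. A minimal sketch:

```python
# Sketch of a non-blocking H2D copy: a pinned CPU buffer plus
# `.to(..., non_blocking=True)` lets the transfer overlap GPU work.
import torch

if torch.cuda.is_available():
    cpu_rows = torch.randn(1024, 128, pin_memory=True)
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        gpu_rows = cpu_rows.to("cuda", non_blocking=True)
    # ... other GPU work can proceed on the default stream here ...
    torch.cuda.current_stream().wait_stream(stream)  # sync before using the rows
```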
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
...
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2 years ago
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
...
This reverts commit 5be118f405.
2 years ago
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2 years ago
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2 years ago
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2 years ago
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
...
* remove forced FP32 modules
* correct no_shard-contexts' positions
2 years ago
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2 years ago
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2 years ago
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2 years ago
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2 years ago
Sze-qq
2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style ( #1572 )
2 years ago
superhao1995
e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style ( #1571 )
...
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2 years ago
Jiatong Han
3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style ( #1570 )
...
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2 years ago
DouJS
f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style ( #1568 )
2 years ago
BigOneLiXiaoMing
0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style ( #1561 )
2 years ago
Ofey Chan
7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py ( #1556 )
2 years ago
yuxuan-lou
413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style ( #1555 )
2 years ago
shenggan
8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style ( #1553 )
2 years ago
Maruyama_Aya
bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style ( #1552 )
2 years ago
binmakeswell
73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2 years ago
CsRic
a389ac4ec9
[embedding] cache_embedding small improvement ( #1564 )
2 years ago
ver217
10dd8226b1
add gather_output for VocabParallelClassifier1D ( #1569 )
2 years ago
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint ( #1548 )
...
* refactor parallel layer
* broadcast rank0 model after load ckpt
2 years ago
Jiarui Fang
64169f3e8f
[embedding] polish parallel embedding tablewise ( #1545 )
2 years ago
CsRic
964123ae0f
[embedding] freq_aware_embedding: add small helper functions for caller applications ( #1537 )
2 years ago
Jiarui Fang
521078ffc9
[embedding] fix a bug in table wise sharding ( #1538 )
2 years ago
Jiarui Fang
87134524fd
[embedding] tablewise sharding polish ( #1535 )
2 years ago
CsRic
5156d5b4f8
[embedding] add tablewise sharding for FAW ( #1526 )
2 years ago
Jiarui Fang
4537d39df9
[doc] docstring for FreqAwareEmbeddingBag ( #1525 )
2 years ago
Jiarui Fang
9a9ef65313
[FAW] cpu caching operations ( #1520 )
2 years ago
Jiarui Fang
af5438caa2
[FAW] refactor reorder() for CachedParamMgr ( #1514 )
2 years ago
Jiarui Fang
9feee6d06b
[FAW] LFU initialize with dataset freq ( #1513 )
2 years ago
CsRic
1b8fee8e9c
[FAW] shrink freq_cnter size ( #1509 )
2 years ago
Jiarui Fang
ba61109b6c
[FAW] remove code related to chunk ( #1501 )
2 years ago
Jiarui Fang
d5085bb317
[FAW] add more docs and fix a warning ( #1500 )
2 years ago
CsRic
0ed2f46131
[FAW] FAW embedding uses LRU as the eviction strategy, initialized with dataset stats ( #1494 )
2 years ago
CsRic
b8d0e39eaf
[FAW] LFU cache for the FAW
2 years ago
Jiarui Fang
cde7b8a5b8
[FAW] init an LFU implementation for FAW ( #1488 )
2 years ago
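Not the FAW implementation itself: a minimal LFU sketch showing the eviction rule these commits build on; rows used least frequently are evicted first, and the counters can be pre-seeded from dataset statistics (per #1513 above):

```python
# Minimal LFU eviction sketch, illustrative of the cache policy only.
from collections import defaultdict

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = {}               # row id -> cached value
        self.freq = defaultdict(int)  # row id -> access count

    def get(self, key):
        if key in self.store:
            self.freq[key] += 1
            return self.store[key]
        return None

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self.freq[k])
            del self.store[victim]    # evict the least-frequently-used row
        self.store[key] = value
        self.freq[key] += 1

cache = LFUCache(capacity=2)
cache.put("row0", 0); cache.put("row1", 1)
cache.get("row0")
cache.put("row2", 2)                  # evicts row1 (lowest frequency)
```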
Geng Zhang
0aad53c62b
[FCE] update interface for frequency statistics in FreqCacheEmbedding ( #1462 )
2 years ago
Jiarui Fang
a1476ea882
[NFC] polish doc style for ColoTensor ( #1457 )
2 years ago
ver217
367c615818
fix nvme docstring ( #1450 )
2 years ago
Geng Zhang
9f3eed66eb
[FAW] reorganize the inheritance structure of FreqCacheEmbedding ( #1448 )
2 years ago
Frank Lee
ae1b58cd16
[tensor] added linear implementation for the new sharding spec ( #1416 )
...
* [tensor] added linear implementation for the new sharding spec
* polish code
2 years ago
Jiarui Fang
30b4dd17c0
[FAW] export FAW in _ops ( #1438 )
2 years ago
Jiarui Fang
c9427a323f
hotfix #1434 ( #1437 )
2 years ago
Jiarui Fang
10b3df65c8
[FAW] move coloparam setting in test code. ( #1429 )
2 years ago