95 Commits (d344313533de84ebd6876e0da86303218a954a4f)

Author SHA1 Message Date
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765)
2 years ago
Jiarui Fang c248800359
[kernel] skip tests of flash_attn and triton when they are not available (#1798)
2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
2 years ago
Jiarui Fang cb5a587e9a
[hotfix] polish chunk import (#1787)
2 years ago
Jiarui Fang f34dab4270
[compatibility] ChunkMgr import error (#1772)
2 years ago
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Zangwei Zheng 9823cbf24b
[NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565)
2 years ago
Kai Wang (Victor Kai) 46931e3c32
[NFC] polish code colossalai/gemini/update/search_utils.py (#1557)
2 years ago
HELSON b80340168e
[zero] add chunk_managerV2 for all-gather chunk (#1441)
2 years ago
HELSON 9056677b13
[zero] add chunk size searching algorithm for parameters in different groups (#1436)
2 years ago
HELSON 039b7ed3bc
[polish] add update directory in gemini; rename AgChunk to ChunkV2 (#1432)
2 years ago
HELSON 0d212183c4
[zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426)
2 years ago
HELSON 4fb3c52cf0
[zero] add unit test for AgChunk's append, close, access (#1423)
2 years ago
HELSON c577ed016e
[zero] add AgChunk (#1417)
2 years ago
ver217 56b8863b87
[zero] chunk manager allows filtering ex-large params (#1393)
2 years ago
HELSON 527758b2ae
[hotfix] fix a running error in test_colo_checkpoint.py (#1387)
2 years ago
Jiarui Fang f792507ff3
[chunk] add PG check for tensor appending (#1383)
2 years ago
ver217 d068af81a3
[doc] update rst and docstring (#1351)
2 years ago
ver217 0c51ff2c13
[hotfix] ZeroDDP use new process group (#1333)
2 years ago
Jiarui Fang 4165eabb1e
[hotfix] remove potiential circle import (#1307)
2 years ago
ver217 dba7e0cfb4
make AutoPlacementPolicy configurable (#1191)
2 years ago
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182)
2 years ago
ver217 54aabb8da4
[gemini] refactor gemini mgr (#1151)
2 years ago
ver217 7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy (#1118)
2 years ago
Frank Lee 14e5b11d7f
[zero] fixed api consistency (#1098)
2 years ago
ver217 1f894e033f
[gemini] zero supports gemini (#1093)
2 years ago
ver217 be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077)
2 years ago
ver217 c4d903e64a
[gemini] accelerate adjust_layout() (#878)
3 years ago
HELSON 425b4a96b8
[gemini] polish stateful_tensor_mgr (#876)
3 years ago
HELSON 3107817172
[gemini] add stateful tensor container (#867)
3 years ago
HELSON f0e654558f
[gemini] polish code (#855)
3 years ago
ver217 d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data (#850)
3 years ago
ver217 0dea140760
[hotfix] add deconstructor for stateful tensor (#848)
3 years ago
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
3 years ago
Jiarui Fang 0ce8924ceb
[tensor] reorganize files (#820)
3 years ago
Jiarui Fang ab962b9735
[gemini] a new tensor structure (#818)
3 years ago
Jiarui Fang 3ddbd1bce1
[gemini] collect cpu-gpu moving volume in each iteration (#813)
3 years ago
Jiarui Fang 681addb512
[refactor] moving grad acc logic to engine (#804)
3 years ago
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago
ver217 846406a07a
[gemini] fix auto tensor placement policy (#775)
3 years ago
Jiarui Fang 10ef8afdd2
[gemini] init genimi individual directory (#754)
3 years ago