Commit Graph

216 Commits (055fbf5be680dfde20be1c51302f3c8b154a93e4)

Author SHA1 Message Date
HELSON 055fbf5be6
[zero] adapt zero for unsharded paramters (Optimizer part) (#601)
3 years ago
KAIYUAN GAN 229382c844
[NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu code stype (#625)
3 years ago
アマデウス 28b515d610
[model checkpoint] updated checkpoint hook (#598)
3 years ago
アマデウス 77ad24bf94
[model checkpoint] updated saving/loading for 3d layers (#597)
3 years ago
アマデウス 93089ed708
[model checkpoint] updated saving/loading for 2.5d layers (#596)
3 years ago
アマデウス 6302069c0e
[model checkpoint] updated communication ops for cpu tensors (#590)
3 years ago
アマデウス c50bfb807b
[model checkpoint] updated saving/loading for 1d layers (#594)
3 years ago
アマデウス 7636d518e1
[model checkpoint] updated saving/loading for 2d layers (#595)
3 years ago
アマデウス cd13b63832
[model checkpoint] reworked unified layers for ease of save/load states (#593)
3 years ago
アマデウス acae68eb04
[model checkpoint] updated checkpoint save/load utils (#592)
3 years ago
Ziyue Jiang 1c40ee8749
[TP] add assert for tp1d (#621)
3 years ago
ver217 369a288bf3
polish utils docstring (#620)
3 years ago
ver217 e619a651fb
polish optimizer docstring (#619)
3 years ago
ver217 8432dc7080
polish moe docsrting (#618)
3 years ago
ver217 c5b488edf8
polish amp docstring (#616)
3 years ago
ver217 0ef8819c67
polish docstring of zero (#612)
3 years ago
LuGY 02b187c14f
[zero] add sampling time for memstats collector (#610)
3 years ago
ver217 9bee119104
[hotfix] fix sharded optim zero grad (#604)
3 years ago
アマデウス 297b8baae2
[model checkpoint] add gloo groups for cpu tensor communication (#589)
3 years ago
アマデウス 54e688b623
moved ensure_path_exists to utils.common (#591)
3 years ago
Jiarui Fang e956d93ac2
[refactor] memory utils (#577)
3 years ago
ver217 104cbbb313
[hotfix] add hybrid adam to __init__ (#584)
3 years ago
HELSON e6d50ec107
[zero] adapt zero for unsharded parameters (#561)
3 years ago
Wesley 46c9ba33da
update code format
3 years ago
Wesley 666cfd094a
fix parallel_input flag for Linear1D_Col gather_output
3 years ago
ver217 7c6c427db1
[zero] trace states of fp16/32 grad and fp32 param (#571)
3 years ago
Jiarui Fang 7675366fce
[polish] rename col_attr -> colo_attr (#558)
3 years ago
Liang Bowen 2c45efc398
html refactor (#555)
3 years ago
Jiarui Fang d1211148a7
[utils] update colo tensor moving APIs (#553)
3 years ago
LuGY c44d797072
[docs] updatad docs of hybrid adam and cpu adam (#552)
3 years ago
ver217 014bac0c49
[zero] hijack p.grad in sharded model (#554)
3 years ago
Jiarui Fang f552b11294
[zero] label state for param fp16 and grad (#551)
3 years ago
Jiarui Fang 214da761d4
[zero] add stateful tensor (#549)
3 years ago
Jiarui Fang 107b99ddb1
[zero] dump memory stats for sharded model (#548)
3 years ago
Ziyue Jiang 763dc325f1
[TP] Add gather_out arg to Linear (#541)
3 years ago
HELSON 8c90d4df54
[zero] add zero context manager to change config during initialization (#546)
3 years ago
Liang Bowen ec5086c49c
Refactored docstring to google style
3 years ago
Jiarui Fang 53b1b6e340
[zero] non model data tracing (#545)
3 years ago
Jie Zhu 73d36618a6
[profiler] add MemProfiler (#356)
3 years ago
ver217 fb841dd5c5
[zero] optimize grad offload (#539)
3 years ago
Jiarui Fang 7d81b5b46e
[logging] polish logger format (#543)
3 years ago
ver217 1f90a3b129
[zero] polish ZeroInitContext (#540)
3 years ago
Jiarui Fang c11ff81b15
[zero] get memory usage of sharded optim v2. (#542)
3 years ago
HELSON a30e2b4c24
[zero] adapt for no-leaf module in zero (#535)
3 years ago
Jiarui Fang 705f56107c
[zero] refactor model data tracing (#537)
3 years ago
Jiarui Fang a590ed0ba3
[zero] improve the accuracy of get_memory_usage of sharded param (#538)
3 years ago
Jiarui Fang 37cb70feec
[zero] get memory usage for sharded param (#536)
3 years ago
Jiarui Fang 05e33b2578
[zero] fix grad offload (#528)
3 years ago
LuGY 105c5301c3
[zero]added hybrid adam, removed loss scale in adam (#527)
3 years ago
Jiarui Fang 8d8c5407c0
[zero] refactor model data tracing (#522)
3 years ago