71 Commits (0653c63eaacd1504f5d66f2e11f80defdb155832)

Author SHA1 Message Date
YuliangLiu0306 b167258b6a
[pipeline]refactor ppschedule to support tensor list (#1050)
3 years ago
Frank Lee e4685832f8
[engine] fixed bug in gradient accumulation dataloader to keep the last step (#1030)
3 years ago
YuliangLiu0306 32a45cd7ef
[pipelinable]use pipelinable to support GPT model. (#903)
3 years ago
Frank Lee 11f54c7b6b
[doc] improved docstring and assertion messages for the engine module (#871)
3 years ago
Jiarui Fang 681addb512
[refactor] moving grad acc logic to engine (#804)
3 years ago
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago
HELSON 84c6700b2a
[zero] refactor memstats_collector (#746)
3 years ago
Jiarui Fang 4d90a7b513
[refactor] zero directory (#724)
3 years ago
Jiarui Fang 193dc8dacb
[refactor] refactor the memory utils (#715)
3 years ago
HELSON ee112fe1da
[zero] adapt zero hooks for unsharded module (#699)
3 years ago
ver217 3c9cd5bb5e
[zero] stateful tensor manager (#687)
3 years ago
YuliangLiu0306 0ed7042f42
[pipeline] refactor pipeline (#679)
3 years ago
RichardoLuo ad1e7ab2b2
'[NFC] polish <colossalai/engine/_base_engine.py> code style' (#631)
3 years ago
doubleHU f2da21a827
fix format (#586)
3 years ago
fanjinfucool ffad81e1d1
fix format (#585)
3 years ago
Maruyama_Aya d2dc6049b5
fix format (#580)
3 years ago
yuxuan-lou cfb41297ff
'fix/format' (#573)
3 years ago
YuliangLiu0306 ade05a5d83
[refactor] pipeline, put runtime schedule into engine. (#627)
3 years ago
Jiarui Fang e956d93ac2
[refactor] memory utils (#577)
3 years ago
HELSON e6d50ec107
[zero] adapt zero for unsharded parameters (#561)
3 years ago
Jiarui Fang 7675366fce
[polish] rename col_attr -> colo_attr (#558)
3 years ago
ver217 014bac0c49
[zero] hijack p.grad in sharded model (#554)
3 years ago
Jiarui Fang f552b11294
[zero] label state for param fp16 and grad (#551)
3 years ago
Jiarui Fang 214da761d4
[zero] add stateful tensor (#549)
3 years ago
Liang Bowen ec5086c49c
Refactored docstring to google style
3 years ago
Jie Zhu 73d36618a6
[profiler] add MemProfiler (#356)
3 years ago
HELSON a30e2b4c24
[zero] adapt for no-leaf module in zero (#535)
3 years ago
Jiarui Fang 705f56107c
[zero] refactor model data tracing (#537)
3 years ago
Jiarui Fang 4d322b79da
[refactor] remove old zero code (#517)
3 years ago
Jiarui Fang 920c5889a7
[zero] add colo move inline (#521)
3 years ago
Jiarui Fang a445e118cf
[polish] polish singleton and global context (#500)
3 years ago
Jiarui Fang b334822163
[zero] polish sharded param name (#484)
3 years ago
Jiarui Fang 65c0f380c2
[format] polish name format for MOE (#481)
3 years ago
ver217 8d3250d74b
[zero] ZeRO supports pipeline parallel (#477)
3 years ago
HELSON aff9d354f7
[MOE] polish moe_env (#467)
3 years ago
HELSON 84fd7c1d4d
add moe context, moe utilities and refactor gradient handler (#455)
3 years ago
ver217 a241f61b34
[zero] Update initialize for ZeRO (#458)
3 years ago
ver217 9506a8beb2
use double buffer to handle grad
3 years ago
Jiarui Fang 56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER (#417)
3 years ago
Jiarui Fang 21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system (#395)
3 years ago
ver217 88804aee49
add bucket tensor shard strategy
3 years ago
Xu Kai 54ee8d1254
Fix/format colossalai/engine/paramhooks/(#350)
3 years ago
yuxuan-lou 3b88eb2259
Flake8 code restyle
3 years ago
Jiarui Fang 44e4891f57
[zero] able to place params on cpu after zero init context (#365)
3 years ago
Jiarui Fang 10e2826426
move async memory to an individual directory (#345)
3 years ago
Frank Lee 6a3188167c
set criterion as optional in colossalai initialize (#336)
3 years ago
Jie Zhu 3213554cc2
[profiler] add adaptive sampling to memory profiler (#330)
3 years ago
ver217 1388671699
[zero] Update sharded model v2 using sharded param v2 (#323)
3 years ago
Jiarui Fang 11bddb6e55
[zero] update zero context init with the updated test utils (#327)
3 years ago
ver217 36f9a74ab2
fix sharded param hook and unit test
3 years ago