Commit Graph

30 Commits (07cb21142fc1daaf1a402f827721d3fdeb56d075)

Author SHA1 Message Date
Jiarui Fang 8c66a1d0aa
[polish] remove useless file _mem_tracer_hook.py (#1963)
2 years ago
LuGY 730f88f8e1
[NFC] polish _checkpoint_hook.py code style (#1722)
2 years ago
Boyuan Yao 20e466527b
[NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576)
2 years ago
Jiarui Fang bcab249565
fix issue #1080 (#1071)
3 years ago
Frank Lee 1c34382678
[doc] improved assertion messages in trainer (#873)
3 years ago
Jiarui Fang 61c20b44bc
[log] local throughput metrics (#811)
3 years ago
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago
LuGY 80e37eec42
fix the ckpt bugs when using DDP (#769)
3 years ago
HELSON 340e59f968
[utils] add synchronized cuda memory monitor (#740)
3 years ago
YuliangLiu0306 0ed7042f42
[pipeline] refactor pipeline (#679)
3 years ago
YuliangLiu0306 ade05a5d83
[refactor] pipeline, put runtime schedule into engine. (#627)
3 years ago
アマデウス 28b515d610
[model checkpoint] updated checkpoint hook (#598)
3 years ago
Liang Bowen 2c45efc398
html refactor (#555)
3 years ago
Liang Bowen ec5086c49c
Refactored docstring to google style
3 years ago
Jie Zhu 73d36618a6
[profiler] add MemProfiler (#356)
3 years ago
1SAA 73bff11288
Added profiler communication operations
3 years ago
アマデウス 9ee197d0e9
moved env variables to global variables; (#215)
3 years ago
Jiarui Fang 569357fea0
add pytorch hooks (#179)
3 years ago
HELSON 0f8c7f9804
Fixed docstring in colossalai (#171)
3 years ago
BoxiangW 4a3d3446b0
Update layer integration documentations (#108)
3 years ago
Jiarui Fang 2c0c85d3d3
fix a bug in timer (#114)
3 years ago
ver217 7904baf6e1
fix layers/schedule for hybrid parallelization (#111) (#112)
3 years ago
ver217 96780e6ee4
Optimize pipeline schedule (#94)
3 years ago
アマデウス 01a80cd86d
Hotfix/Colossalai layers (#92)
3 years ago
アマデウス 0fedef4f3c
Layer integration (#83)
3 years ago
Frank Lee cd9c28e055
added CI for unit testing (#69)
3 years ago
Frank Lee 9a0466534c
update markdown docs (english) (#60)
3 years ago
Frank Lee da01c234e1
Develop/experiments (#59)
3 years ago
Frank Lee 3defa32aee
Support TP-compatible Torch AMP and Update trainer API (#27)
3 years ago
zbian 404ecbdcc6
Migrated project
3 years ago