30 Commits (7b13f7db18999c611db21a17b7388709af75eda1)

Author SHA1 Message Date
Jiarui Fang 8c66a1d0aa [polish] remove useless file _mem_tracer_hook.py (#1963) 2 years ago
LuGY 730f88f8e1 [NFC] polish _checkpoint_hook.py code style (#1722) 2 years ago
Boyuan Yao 20e466527b [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) 2 years ago
Jiarui Fang bcab249565 fix issue #1080 (#1071) 2 years ago
Frank Lee 1c34382678 [doc] improved assertion messages in trainer (#873) 3 years ago
Jiarui Fang 61c20b44bc [log] local throughput metrics (#811) 3 years ago
Jiarui Fang 4d9332b4c5 [refactor] moving memtracer to gemini (#801) 3 years ago
LuGY 80e37eec42 fix the ckpt bugs when using DDP (#769) 3 years ago
HELSON 340e59f968 [utils] add synchronized cuda memory monitor (#740) 3 years ago
YuliangLiu0306 0ed7042f42 [pipeline] refactor pipeline (#679) 3 years ago
YuliangLiu0306 ade05a5d83 [refactor] pipeline, put runtime schedule into engine. (#627) 3 years ago
アマデウス 28b515d610 [model checkpoint] updated checkpoint hook (#598) 3 years ago
Liang Bowen 2c45efc398 html refactor (#555) 3 years ago
Liang Bowen ec5086c49c Refactored docstring to google style 3 years ago
Jie Zhu 73d36618a6 [profiler] add MemProfiler (#356) 3 years ago
1SAA 73bff11288 Added profiler communication operations 3 years ago
アマデウス 9ee197d0e9 moved env variables to global variables; (#215) 3 years ago
Jiarui Fang 569357fea0 add pytorch hooks (#179) 3 years ago
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 3 years ago
BoxiangW 4a3d3446b0 Update layer integration documentations (#108) 3 years ago
Jiarui Fang 2c0c85d3d3 fix a bug in timer (#114) 3 years ago
ver217 7904baf6e1 fix layers/schedule for hybrid parallelization (#111) (#112) 3 years ago
ver217 96780e6ee4 Optimize pipeline schedule (#94) 3 years ago
アマデウス 01a80cd86d Hotfix/Colossalai layers (#92) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago
Frank Lee cd9c28e055 added CI for unit testing (#69) 3 years ago
Frank Lee 9a0466534c update markdown docs (english) (#60) 3 years ago
Frank Lee da01c234e1 Develop/experiments (#59) 3 years ago
Frank Lee 3defa32aee Support TP-compatible Torch AMP and Update trainer API (#27) 3 years ago
zbian 404ecbdcc6 Migrated project 3 years ago