50 Commits (8241c0c054b38a109ed3ce7be1052a1e600b8471)

Author SHA1 Message Date
flybird11111 0c10afd372 [FP8] rebase main (#5963) 4 months ago
Kai Lv 0adca5b688 [launch] Support IPv4 host initialization in launch (#5822) 5 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Hongxin Liu b5f9e37c70 [legacy] clean up legacy code (#4743) 1 year ago
Hongxin Liu ac178ca5c1 [legacy] move builder and registry to legacy (#4603) 1 year ago
Hongxin Liu 8accecd55b [legacy] move engine to legacy (#4560) 1 year ago
digger yu de0d7df33f [nfc] fix typo colossalai/zero (#3923) 1 year ago
ver217 26b7aac0be [zero] reorganize zero/gemini folder structure (#3424) 2 years ago
Haofan Wang 9358262992 Fix False warning in initialize.py (#2456) 2 years ago
Jiarui Fang 4165eabb1e [hotfix] remove potiential circle import (#1307) 2 years ago
Frank Lee 91a5999825 [ddp] supported customized torch ddp configuration (#1123) 2 years ago
Frank Lee 3d10be33bd [cudnn] set False to cudnn benchmark by default (#1063) 3 years ago
Frank Lee 7a64fae33a [doc] improved error messages in initialize (#872) 3 years ago
LuGY c1e8d2001e modefied the pp build for ckpt adaptation (#803) 3 years ago
Jiarui Fang 227d1cd4b3 [gemini] APIs to set cpu memory capacity (#809) 3 years ago
Jiarui Fang 681addb512 [refactor] moving grad acc logic to engine (#804) 3 years ago
ver217 097772546e fix initialize about zero 3 years ago
Frank Lee 04ff5ea546 [utils] support detection of number of processes on current node (#723) 3 years ago
YuliangLiu0306 0ed7042f42 [pipeline] refactor pipeline (#679) 3 years ago
YuliangLiu0306 ade05a5d83 [refactor] pipeline, put runtime schedule into engine. (#627) 3 years ago
Liang Bowen ec5086c49c Refactored docstring to google style 3 years ago
Jiarui Fang a445e118cf [polish] polish singleton and global context (#500) 3 years ago
HELSON 7544347145 [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 3 years ago
ver217 a241f61b34 [zero] Update initialize for ZeRO (#458) 3 years ago
ver217 642846d6f9 update sharded optim and fix zero init ctx (#457) 3 years ago
Jiarui Fang e2e9f82588 Revert "[zero] update sharded optim and fix zero init ctx" (#456) 3 years ago
ver217 57567ee768 update sharded optim and fix zero init ctx 3 years ago
Jiarui Fang 496cbb0760 [hotfix] fix initialize bug with zero (#442) 3 years ago
Jiarui Fang 640a6cd304 [refactory] refactory the initialize method for new zero design (#431) 3 years ago
Frank Lee e79ea44247 [fp16] refactored fp16 optimizer (#392) 3 years ago
Frank Lee 6a3188167c set criterion as optional in colossalai initialize (#336) 3 years ago
Frank Lee e17e54e32a added buffer sync to naive amp model wrapper (#291) 3 years ago
Jie Zhu f867365aba bug fix: pass hook_list to engine (#273) 3 years ago
Jiarui Fang 5a560a060a Feature/zero (#279) 3 years ago
Frank Lee 765db512b5 fixed ddp bug on torch 1.8 (#194) 3 years ago
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 3 years ago
Frank Lee e2089c5c15 adapted for sequence parallel (#163) 3 years ago
HELSON dceae85195 Added MoE parallel (#127) 3 years ago
ver217 293fb40c42 add scatter/gather optim for pipeline (#123) 3 years ago
ver217 96780e6ee4 Optimize pipeline schedule (#94) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago
ver217 8f02a88db2 add interleaved pipeline, fix naive amp and update pipeline model initializer (#80) 3 years ago
Frank Lee 35813ed3c4 update examples and sphnix docs for the new api (#63) 3 years ago
ver217 7d3711058f fix zero3 fp16 and add zero3 model context (#62) 3 years ago
Frank Lee da01c234e1 Develop/experiments (#59) 3 years ago
Frank Lee 3defa32aee Support TP-compatible Torch AMP and Update trainer API (#27) 3 years ago
zbian 404ecbdcc6 Migrated project 3 years ago