Commit Graph

78 Commits (0ccb8c614114bb7497ab4a815ab24cbdc5836f23)

Author  SHA1  Message  Date
ver217  9492a561c3  [tensor] ColoTensor supports ZeRo (#1015)  3 years ago
ver217  cefc29ff06  [tensor] impl ColoDDP for ColoTensor (#1009)  3 years ago
Ziheng Qin  571f12eff3  [NFC] polish colossalai/nn/layer/utils/common.py code style (#983)  3 years ago
shenggan  18542b47fc  [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976)  3 years ago
Zirui Zhu  598cde4a0f  [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972)  3 years ago
LuGY  fb5bc6cb28  [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966)  3 years ago
ver217  58580b50fe  Revert "[NFC] Hotfix/format (#984)" (#986)  3 years ago
binmakeswell  0772828fba  [NFC] Hotfix/format (#984)  3 years ago
HELSON  e5ea3fdeef  [gemini] add GeminiMemoryManger (#832)  3 years ago
Ziyue Jiang  4b01da24cd  [TP] change the check assert in split batch 2d (#772)  3 years ago
アマデウス  b8899e0905  [TP] allow layernorm without bias (#750)  3 years ago
Frank Lee  eda30a058e  [compatibility] fixed tensor parallel compatibility with torch 1.9 (#700)  3 years ago
HELSON  a9b8300d54  [zero] improve adaptability for not-shard parameters (#708)  3 years ago
アマデウス  3fc8a204dc  []Corrected 3d vocab parallel embedding (#707)  3 years ago
HELSON  b31daed4cf  fix bugs in CPU adam (#633)  3 years ago
Liang Bowen  828e465622  [hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622)  3 years ago
アマデウス  77ad24bf94  [model checkpoint] updated saving/loading for 3d layers (#597)  3 years ago
アマデウス  93089ed708  [model checkpoint] updated saving/loading for 2.5d layers (#596)  3 years ago
アマデウス  c50bfb807b  [model checkpoint] updated saving/loading for 1d layers (#594)  3 years ago
アマデウス  7636d518e1  [model checkpoint] updated saving/loading for 2d layers (#595)  3 years ago
アマデウス  cd13b63832  [model checkpoint] reworked unified layers for ease of save/load states (#593)  3 years ago
Ziyue Jiang  1c40ee8749  [TP] add assert for tp1d (#621)  3 years ago
ver217  e619a651fb  polish optimizer docstring (#619)  3 years ago
ver217  8432dc7080  polish moe docsrting (#618)  3 years ago
ver217  104cbbb313  [hotfix] add hybrid adam to __init__ (#584)  3 years ago
HELSON  e6d50ec107  [zero] adapt zero for unsharded parameters (#561)  3 years ago
Wesley  46c9ba33da  update code format  3 years ago
Wesley  666cfd094a  fix parallel_input flag for Linear1D_Col gather_output  3 years ago
Liang Bowen  2c45efc398  html refactor (#555)  3 years ago
LuGY  c44d797072  [docs] updatad docs of hybrid adam and cpu adam (#552)  3 years ago
Ziyue Jiang  763dc325f1  [TP] Add gather_out arg to Linear (#541)  3 years ago
HELSON  8c90d4df54  [zero] add zero context manager to change config during initialization (#546)  3 years ago
Liang Bowen  ec5086c49c  Refactored docstring to google style  3 years ago
LuGY  105c5301c3  [zero]added hybrid adam, removed loss scale in adam (#527)  3 years ago
LuGY  6a3f9fda83  [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497)  3 years ago
Jiarui Fang  a445e118cf  [polish] polish singleton and global context (#500)  3 years ago
ver217  9ec1ce6ab1  [zero] sharded model support the reuse of fp16 shard (#495)  3 years ago
HELSON  c9023d4078  [MOE] support PR-MOE (#488)  3 years ago
ver217  62b0a8d644  [zero] sharded optim support hybrid cpu adam (#486)  3 years ago
HELSON  d7ea63992b  [MOE] add FP32LinearGate for MOE in NaiveAMP context (#480)  3 years ago
Jiarui Fang  65c0f380c2  [format] polish name format for MOE (#481)  3 years ago
HELSON  7544347145  [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469)  3 years ago
HELSON  aff9d354f7  [MOE] polish moe_env (#467)  3 years ago
HELSON  bccbc15861  [MOE] changed parallelmode to dist process group (#460)  3 years ago
Jiarui Fang  0fcfb1e00d  [test] make zero engine test really work (#447)  3 years ago
Jiarui Fang  237d08e7ee  [zero] hybrid cpu adam (#445)  3 years ago
HELSON  dbdc9a7783  added Multiply Jitter and capacity factor eval for MOE (#434)  3 years ago
HELSON  3f70a2b12f  removed noisy function during evaluation of MoE router (#419)  3 years ago
Jiang Zhuo  5a4a3b77d9  fix format (#376)  3 years ago
LuGY  de46450461  Added activation offload (#331)  3 years ago