Ziyue Jiang
0f02b8c6e6
add avg partition ( #2483 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
アマデウス
99d9713b02
Revert "Update parallel_context.py ( #2408 )"
This reverts commit 7d5640b9db.
2 years ago
oahzxl
ecccc91f21
[autochunk] support autochunk on evoformer ( #2497 )
2 years ago
oahzxl
5db3a5bf42
[fx] allow control of ckpt_codegen init ( #2498 )
* [fx] allow control of ckpt_codegen init
Currently, ColoGraphModule sets ActivationCheckpointCodeGen automatically in __init__, which prevents any other codegen from being set.
This adds an argument that controls whether ActivationCheckpointCodeGen is set in __init__.
* code style
2 years ago
HELSON
d565a24849
[zero] add unit testings for hybrid parallelism ( #2486 )
2 years ago
oahzxl
4953b4ace1
[autochunk] support evoformer tracer ( #2485 )
Support the full evoformer tracer; evoformer is a main module of AlphaFold. Previously only a simplified version of it was supported.
1. support some evoformer ops in fx
2. support evoformer test
3. add repos for test code
2 years ago
YuliangLiu0306
67e1912b59
[autoparallel] support origin activation ckpt on autoparallel system ( #2468 )
2 years ago
Ziyue Jiang
fef5c949c3
polish pp middleware ( #2476 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
HELSON
a5dc4253c6
[zero] polish low level optimizer ( #2473 )
2 years ago
Frank Lee
8b7495dd54
[example] integrate seq-parallel tutorial with CI ( #2463 )
2 years ago
Jiarui Fang
867c8c2d3a
[zero] low level optim supports ProcessGroup ( #2464 )
2 years ago
Frank Lee
14d9299360
[cli] fixed hostname mismatch error ( #2465 )
2 years ago
Haofan Wang
9358262992
Fix False warning in initialize.py ( #2456 )
* Update initialize.py
* pre-commit run check
2 years ago
YuliangLiu0306
8221fd7485
[autoparallel] update binary elementwise handler ( #2451 )
* [autoparallel] update binary elementwise handler
* polish
2 years ago
HELSON
2bfeb24308
[zero] add warning for ignored parameters ( #2446 )
2 years ago
Frank Lee
39163417a1
[example] updated the hybrid parallel tutorial ( #2444 )
* [example] updated the hybrid parallel tutorial
* polish code
2 years ago
HELSON
5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters ( #2443 )
* [ddp] add is_ddp_ignored
[ddp] rename to is_ddp_ignored
* [zero] fix state_dict and load_state_dict
* fix bugs
* [zero] update unit test for ZeroDDP
2 years ago
YuliangLiu0306
2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize ( #2393 )
* [autoparallel] integrate device mesh initialization into autoparallelize
* add megatron solution
* update gpt autoparallel examples with latest api
* adapt beta value to fit the current computation cost
2 years ago
Frank Lee
c72c827e95
[cli] provided more details if colossalai run fails ( #2442 )
2 years ago
Super Daniel
c41e59e5ad
[fx] allow native ckpt trace and codegen. ( #2438 )
2 years ago
YuliangLiu0306
41429b9b28
[autoparallel] add shard option ( #2423 )
2 years ago
HELSON
7829aa094e
[ddp] add is_ddp_ignored ( #2434 )
[ddp] rename to is_ddp_ignored
2 years ago
HELSON
bb4e9a311a
[zero] add inference mode and its unit test ( #2418 )
2 years ago
HELSON
dddacd2d2c
[hotfix] add norm clearing for the overflow step ( #2416 )
2 years ago
oahzxl
7ab2db206f
adapt new fx
2 years ago
Haofan Wang
7d5640b9db
Update parallel_context.py ( #2408 )
2 years ago
oahzxl
fd818cf144
change imports
2 years ago
oahzxl
a591d45b29
add available
2 years ago
oahzxl
615e7e68d9
update doc
2 years ago
oahzxl
7d4abaa525
add doc
2 years ago
oahzxl
1be0ac3cbf
add doc for trace indice
2 years ago
oahzxl
0b6af554df
remove useless function
2 years ago
oahzxl
d914a21d64
rename
2 years ago
oahzxl
865f2e0196
rename
2 years ago
HELSON
ea13a201bb
[polish] polish code for get_static_torch_model ( #2405 )
* [gemini] polish code
* [testing] remove code
* [gemini] make more robust
2 years ago
oahzxl
a4ed5b0d0d
rename in doc
2 years ago
oahzxl
1bb1f2ad89
rename
2 years ago
oahzxl
cb9817f75d
rename function from index to indice
2 years ago
oahzxl
0ea903b94e
rename trace_index to trace_indice
2 years ago
Frank Lee
551cafec14
[doc] updated kernel-related optimisers' docstring ( #2385 )
* [doc] updated kernel-related optimisers' docstring
* polish doc
2 years ago
oahzxl
065f0b4c27
add doc for search
2 years ago
oahzxl
a68d240ed5
add doc for search chunk
2 years ago
oahzxl
1951f7fa87
code style
2 years ago
oahzxl
212b5b1b5f
add comments
2 years ago
oahzxl
19cc64b1d3
remove autochunk_available
2 years ago
eric8607242
9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP ( #2363 )
* Fix state_dict output for ZeroDDP duplicated parameters
* Rewrite state_dict based on get_static_torch_model
* Modify get_static_torch_model to be compatible with the lower version (ZeroDDP)
2 years ago
oahzxl
4d223e18a2
fix typo
2 years ago
Frank Lee
ce08661eb1
[cli] updated installation check cli for aot/jit build ( #2395 )
2 years ago
jiaruifang
69d9180c4b
[hotfix] issue #2388
2 years ago
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago