YuliangLiu0306
1dc003c169
[autoparallel] distinguish different parallel strategies ( #2699 )
2 years ago
YuliangLiu0306
21d6a48f4d
[autoparallel] add shard option ( #2696 )
...
* [autoparallel] add shard option
* polish
2 years ago
YuliangLiu0306
cb2c6a2415
[autoparallel] refactor runtime pass ( #2644 )
...
* [autoparallel] refactor runtime pass
* add unit test
* polish
2 years ago
YuliangLiu0306
0b2a738393
[autoparallel] remove deprecated codes ( #2664 )
2 years ago
YuliangLiu0306
7fa6be49d2
[autoparallel] test compatibility for gemini and auto parallel ( #2700 )
2 years ago
Boyuan Yao
40c916b192
[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` ( #2674 )
...
* [autoparallel] softmax metainfo
* [autoparallel] softmax metainfo
2 years ago
HELSON
8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode ( #2671 )
2 years ago
Boyuan Yao
0385b26ebf
[autoparallel] Patch meta information of `torch.nn.LayerNorm` ( #2647 )
...
* [autoparallel] layernorm metainfo patch
* [autoparallel] polish test
2 years ago
YuliangLiu0306
37df666f38
[autoparallel] refactor handlers which reshape input tensors ( #2615 )
...
* [autoparallel] refactor handlers which reshape input tensors
* polish
2 years ago
YuliangLiu0306
cb3d1bef62
[autoparallel] adapt autoparallel tests with latest api ( #2626 )
2 years ago
Boyuan Yao
90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` ( #2584 )
...
* [autoparallel] matmul metainfo
* [auto_parallel] remove unused print
* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2 years ago
oahzxl
6ba8364881
[autochunk] support diffusion for autochunk ( #2621 )
...
* add alphafold benchmark
* renae alphafold test
* rename tests
* rename diffuser
* renme
* rename
* update transformer
* update benchmark
* update benchmark
* update bench memory
* update transformer benchmark
* rename
* support diffuser
* support unet metainfo prop
* fix bug and simplify code
* update linear and support some op
* optimize max region search, support conv
* update unet test
* support some op
* support groupnorm and interpolate
* update flow search
* add fix dim in node flow
* fix utils
* rename
* support diffusion
* update diffuser
* update chunk search
* optimize imports
* import
* finish autochunk
2 years ago
oahzxl
c4b15661d7
[autochunk] add benchmark for transformer and alphafold ( #2543 )
2 years ago
oahzxl
05671fcb42
[autochunk] support multi outputs chunk search ( #2538 )
...
Support multi outputs chunk search. Previously we only support single output chunk search. It is more flexible and improve performance by a large margin. For transformer, we reduce memory by 40% than previous search strategy.
1. rewrite search strategy to support multi outputs chunk search
2. fix many, many bugs
3. update tests
2 years ago
oahzxl
63199c6687
[autochunk] support transformer ( #2526 )
2 years ago
Frank Lee
b55deb0662
[workflow] only report coverage for changed files ( #2524 )
...
* [workflow] only report coverage for changed files
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
2 years ago
HELSON
b528eea0f0
[zero] add zero wrappers ( #2523 )
...
* [zero] add zero wrappers
* change names
* add wrapper functions to init
2 years ago
HELSON
077a5cdde4
[zero] fix gradient clipping in hybrid parallelism ( #2521 )
...
* [zero] fix gradient clipping in hybrid parallelism
* [testing] change model name to avoid pytest warning
* [hotfix] fix unit testing
2 years ago
HELSON
707b11d4a0
[gemini] update ddp strict mode ( #2518 )
...
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
2 years ago
HELSON
2d1a7dfe5f
[zero] add strict ddp mode ( #2508 )
...
* [zero] add strict ddp mode
* [polish] add comments for strict ddp mode
* [zero] fix test error
2 years ago
oahzxl
c04f183237
[autochunk] support parsing blocks ( #2506 )
2 years ago
oahzxl
72341e65f4
[auto-chunk] support extramsa ( #3 ) ( #2504 )
2 years ago
oahzxl
ecccc91f21
[autochunk] support autochunk on evoformer ( #2497 )
2 years ago
HELSON
d565a24849
[zero] add unit testings for hybrid parallelism ( #2486 )
2 years ago
oahzxl
4953b4ace1
[autochunk] support evoformer tracer ( #2485 )
...
support full evoformer tracer, which is a main module of alphafold. previously we just support a simplifed version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2 years ago
YuliangLiu0306
67e1912b59
[autoparallel] support origin activation ckpt on autoprallel system ( #2468 )
2 years ago
HELSON
21c88220ce
[zero] add unit test for low-level zero init ( #2474 )
2 years ago
HELSON
a5dc4253c6
[zero] polish low level optimizer ( #2473 )
2 years ago
Jiarui Fang
867c8c2d3a
[zero] low level optim supports ProcessGroup ( #2464 )
2 years ago
YuliangLiu0306
8221fd7485
[autoparallel] update binary elementwise handler ( #2451 )
...
* [autoparallel] update binary elementwise handler
* polish
2 years ago
HELSON
5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters ( #2443 )
...
* [ddp] add is_ddp_ignored
[ddp] rename to is_ddp_ignored
* [zero] fix state_dict and load_state_dict
* fix bugs
* [zero] update unit test for ZeroDDP
2 years ago
YuliangLiu0306
41429b9b28
[autoparallel] add shard option ( #2423 )
2 years ago
HELSON
bb4e9a311a
[zero] add inference mode and its unit test ( #2418 )
2 years ago
oahzxl
61fdd3464a
update doc
2 years ago
oahzxl
36ab2cb783
change import
2 years ago
oahzxl
7ab2db206f
adapt new fx
2 years ago
oahzxl
e532679c95
Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk
2 years ago
oahzxl
c1492e5013
add test in import
2 years ago
HELSON
ea13a201bb
[polish] polish code for get_static_torch_model ( #2405 )
...
* [gemini] polish code
* [testing] remove code
* [gemini] make more robust
2 years ago
oahzxl
212b5b1b5f
add comments
2 years ago
oahzxl
aafc3516a5
add available
2 years ago
oahzxl
d5c4f0bf95
code style
2 years ago
oahzxl
d106b271f8
add chunk search test
2 years ago
oahzxl
a005965d2d
update codegen test
2 years ago
oahzxl
3abbaf8bc6
update codegen test
2 years ago
oahzxl
74b81395a2
update codegen test
2 years ago
oahzxl
18a51c87fe
rename test
2 years ago
oahzxl
cb68ee864a
set benchmark
2 years ago
Jiarui Fang
4e96039649
[device] find best logical mesh
2 years ago
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
...
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago