Commit Graph

2214 Commits (94c24d94447405bdd5e1f3ded997137f38147329)

Author SHA1 Message Date
Super Daniel c198c7c0b0
[hotfix] meta tensor default device. (#2510) 2023-01-29 16:28:10 +08:00
HELSON 077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism

* [testing] change model name to avoid pytest warning

* [hotfix] fix unit testing
2023-01-29 15:09:57 +08:00
Jiarui Fang fd8d19a6e7
[example] update lightning dependency for stable diffusion (#2522) 2023-01-29 13:52:15 +08:00
YuliangLiu0306 aa0f6686f9
[autoparallel] accelerate gpt2 training (#2495) 2023-01-29 11:13:15 +08:00
binmakeswell a360b9bc44
[doc] update example link (#2520)
* [doc] update example link

* [doc] update example link
2023-01-29 10:53:57 +08:00
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
Frank Lee 0af793836c
[workflow] fixed changed file detection (#2515) 2023-01-26 16:34:19 +08:00
binmakeswell a6a10616ec
[doc] update opt and tutorial links (#2509) 2023-01-20 17:29:13 +08:00
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
oahzxl c04f183237
[autochunk] support parsing blocks (#2506) 2023-01-20 11:18:17 +08:00
Super Daniel 35c0c0006e
[utils] lazy init. (#2148)
* [utils] lazy init.

* [utils] remove description.

* [utils] complete.

* [utils] finalize.

* [utils] fix names.
2023-01-20 10:49:00 +08:00
oahzxl 72341e65f4
[auto-chunk] support extramsa (#3) (#2504) 2023-01-20 10:13:03 +08:00
Ziyue Jiang 0f02b8c6e6
add avg partition (#2483)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-19 13:54:50 +08:00
アマデウス 99d9713b02 Revert "Update parallel_context.py (#2408)"
This reverts commit 7d5640b9db.
2023-01-19 12:27:48 +08:00
oahzxl ecccc91f21
[autochunk] support autochunk on evoformer (#2497) 2023-01-19 11:41:00 +08:00
Fazzie-Maqianli 304f1ba124
Merge pull request #2499 from feifeibear/dev0116_10
[example] check dreambooth example gradient accmulation must be 1
2023-01-19 09:58:21 +08:00
jiaruifang 32390cbe8f add test_ci.sh to dreambooth 2023-01-19 09:46:28 +08:00
jiaruifang 7f822a5c45 Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into dev0116 2023-01-18 18:43:11 +08:00
jiaruifang 025b482dc1 [example] dreambooth example 2023-01-18 18:42:56 +08:00
oahzxl 5db3a5bf42
[fx] allow control of ckpt_codegen init (#2498)
* [fx] allow control of ckpt_codegen init

Currently, ColoGraphModule sets ActivationCheckpointCodeGen automatically in __init__, which prevents any other codegen from being set.
This change adds an arg to control whether ActivationCheckpointCodeGen is set in __init__.

* code style
2023-01-18 17:02:46 +08:00
Jiarui Fang e327e95144
[hotfix] gpt example titans bug #2493 (#2494) 2023-01-18 12:04:18 +08:00
jiaruifang e58cc441e2 polish code and fix dataloader bugs 2023-01-18 12:00:08 +08:00
jiaruifang a4b75b78a0 [hotfix] gpt example titans bug #2493 2023-01-18 11:37:16 +08:00
jiaruifang 8208fd023a Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into dev0116 2023-01-18 11:32:29 +08:00
HELSON d565a24849
[zero] add unit testings for hybrid parallelism (#2486) 2023-01-18 10:36:10 +08:00
binmakeswell fcc6d61d92
[example] fix requirements (#2488) 2023-01-17 13:07:25 +08:00
oahzxl 4953b4ace1
[autochunk] support evoformer tracer (#2485)
Support the full evoformer tracer, which is a main module of alphafold. Previously we supported only a simplified version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2023-01-16 19:25:05 +08:00
YuliangLiu0306 67e1912b59
[autoparallel] support origin activation ckpt on autoprallel system (#2468) 2023-01-16 16:25:13 +08:00
Jiarui Fang 3a21485ead
[example] titans for gpt (#2484) 2023-01-16 15:55:41 +08:00
jiaruifang 438ea608f3 update readme 2023-01-16 15:54:36 +08:00
jiaruifang 38424db6ff polish code 2023-01-16 15:21:22 +08:00
jiaruifang 92f65fbbe3 remove license 2023-01-16 15:18:49 +08:00
jiaruifang 315e1433ce polish readme 2023-01-16 15:17:27 +08:00
jiaruifang 37baea20cb [example] titans for gpt 2023-01-16 14:59:25 +08:00
jiaruifang 236b4195ff Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into dev0116 2023-01-16 14:45:14 +08:00
jiaruifang e64a05b38b polish code 2023-01-16 14:45:06 +08:00
Jiarui Fang 7c31706227
[CI] add test_ci.sh for palm, opt and gpt (#2475) 2023-01-16 14:44:29 +08:00
Jiarui Fang e4c38ba367
[example] stable diffusion add roadmap (#2482) 2023-01-16 12:14:49 +08:00
jiaruifang 9cba38b492 add dummy test_ci.sh 2023-01-16 12:03:48 +08:00
jiaruifang f78bad21ed [example] stable diffusion add roadmap 2023-01-16 11:34:26 +08:00
Frank Lee 579dba572f
[workflow] fixed the skip condition of example weekly check workflow (#2481) 2023-01-16 10:05:41 +08:00
HELSON 21c88220ce
[zero] add unit test for low-level zero init (#2474) 2023-01-15 10:42:01 +08:00
ver217 f525d1f528
[example] update gpt gemini example ci test (#2477) 2023-01-13 22:37:31 +08:00
Ziyue Jiang fef5c949c3
polish pp middleware (#2476)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-13 16:56:01 +08:00
HELSON a5dc4253c6
[zero] polish low level optimizer (#2473) 2023-01-13 14:56:17 +08:00
Frank Lee 8b7495dd54
[example] integrate seq-parallel tutorial with CI (#2463) 2023-01-13 14:40:05 +08:00
ver217 8e85d2440a
[example] update vit ci script (#2469)
* [example] update vit ci script

* [example] update requirements

* [example] update requirements
2023-01-13 13:31:27 +08:00
Jiarui Fang 867c8c2d3a
[zero] low level optim supports ProcessGroup (#2464) 2023-01-13 10:05:58 +08:00
Frank Lee e6943e2d11
[example] integrate autoparallel demo with CI (#2466)
* [example] integrate autoparallel demo with CI

* polish code

* polish code

* polish code

* polish code
2023-01-12 16:26:42 +08:00
Frank Lee 14d9299360
[cli] fixed hostname mismatch error (#2465) 2023-01-12 14:52:09 +08:00