Commit Graph

2394 Commits (41fb7236aa32c307e83b0b9cc50ce2a6da279343)

Author SHA1 Message Date
Frank Lee ba47517342
[workflow] fixed example check workflow (#2554)
* [workflow] fixed example check workflow

* polish yaml
2023-02-06 13:46:52 +08:00
Frank Lee fb1a4c0d96
[doc] fixed issue link in pr template (#2577) 2023-02-06 10:29:24 +08:00
binmakeswell 039b0c487b
[tutorial] polish README (#2568) 2023-02-04 17:49:52 +08:00
Frank Lee 2eb4268b47
[workflow] fixed typos in the leaderboard workflow (#2567) 2023-02-03 17:25:56 +08:00
Frank Lee 7b4ad6e0fc
[workflow] added contributor and user-engagement report (#2564)
* [workflow] added contributor and user-engagement report

* polish code

* polish code
2023-02-03 17:12:35 +08:00
oahzxl 4f5ef73a43
[tutorial] update fastfold tutorial (#2565)
* update readme

* update

* update
2023-02-03 16:54:28 +08:00
Fazzie-Maqianli 79079a9d0c
Merge pull request #2561 from Fazziekey/v2
bug/fix diffusion ckpt problem
2023-02-03 15:42:49 +08:00
Fazzie cad1f50512 fix ckpt 2023-02-03 15:39:59 +08:00
HELSON 552183bb74
[polish] polish ColoTensor and its submodules (#2537) 2023-02-03 11:44:10 +08:00
github-actions[bot] 51d4d6e718
Automated submodule synchronization (#2492)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-03 10:48:15 +08:00
Frank Lee 4af31d263d
[doc] updated the CHANGE_LOG.md for github release page (#2552) 2023-02-03 10:47:27 +08:00
Frank Lee 578374d0de
[doc] fixed the typo in pr template (#2556) 2023-02-03 10:47:00 +08:00
Frank Lee dd14783f75
[kernel] fixed repeated loading of kernels (#2549)
* [kernel] fixed repeated loading of kernels

* polish code

* polish code
2023-02-03 09:47:13 +08:00
Frank Lee 8438c35a5f
[doc] added pull request template (#2550)
* [doc] added pull request template

* polish code

* polish code
2023-02-02 18:16:03 +08:00
ver217 5b1854309a
[hotfix] fix zero ddp warmup check (#2545) 2023-02-02 16:42:38 +08:00
oahzxl fa3d66feb9
support unet metainfo prop (#2544) 2023-02-02 16:19:26 +08:00
oahzxl c4b15661d7
[autochunk] add benchmark for transformer and alphafold (#2543) 2023-02-02 15:06:43 +08:00
binmakeswell 9885ec2b2e
[git] remove invalid submodule (#2540) 2023-02-01 17:54:03 +08:00
oahzxl 05671fcb42
[autochunk] support multi outputs chunk search (#2538)
Support multi-output chunk search. Previously we only supported single-output chunk search. The new strategy is more flexible and improves performance by a large margin. For the transformer, it reduces memory by 40% compared with the previous search strategy.

1. rewrite search strategy to support multi outputs chunk search
2. fix many, many bugs
3. update tests
2023-02-01 13:18:51 +08:00
YuliangLiu0306 f477a14f4a
[hotfix] fix autoparallel demo (#2533) 2023-01-31 17:42:45 +08:00
oahzxl 63199c6687
[autochunk] support transformer (#2526) 2023-01-31 16:00:06 +08:00
HELSON 6e0faa70e0
[gemini] add profiler in the demo (#2534) 2023-01-31 14:21:22 +08:00
Fazzie-Maqianli df437ca039
Merge pull request #2532 from Fazziekey/fix
fix README
2023-01-31 10:56:35 +08:00
Fazzie f35326881c fix README 2023-01-31 10:51:13 +08:00
HELSON a4ed9125ac
[hotfix] fix lightning error (#2529) 2023-01-31 10:40:39 +08:00
Frank Lee b55deb0662
[workflow] only report coverage for changed files (#2524)
* [workflow] only report coverage for changed files

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file
2023-01-30 21:28:27 +08:00
HELSON 66dfcf5281
[gemini] update the gpt example (#2527) 2023-01-30 17:58:05 +08:00
LuGY ecbad93b65
[example] Add fastfold tutorial (#2528)
* add fastfold example

* pre-commit polish

* pre-commit polish readme and add empty test ci

* Add test_ci and reduce the default sequence length
2023-01-30 17:08:18 +08:00
Frank Lee af151032f2
[workflow] fixed the precommit CI (#2525)
* [workflow] fixed the precommit CI

* polish file

* polish file
2023-01-30 10:02:13 +08:00
HELSON b528eea0f0
[zero] add zero wrappers (#2523)
* [zero] add zero wrappers

* change names

* add wrapper functions to init
2023-01-29 17:52:58 +08:00
Super Daniel c198c7c0b0
[hotfix] meta tensor default device. (#2510) 2023-01-29 16:28:10 +08:00
HELSON 077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism

* [testing] change model name to avoid pytest warning

* [hotfix] fix unit testing
2023-01-29 15:09:57 +08:00
Jiarui Fang fd8d19a6e7
[example] update lightning dependency for stable diffusion (#2522) 2023-01-29 13:52:15 +08:00
YuliangLiu0306 aa0f6686f9
[autoparallel] accelerate gpt2 training (#2495) 2023-01-29 11:13:15 +08:00
binmakeswell a360b9bc44
[doc] update example link (#2520)
* [doc] update example link

* [doc] update example link
2023-01-29 10:53:57 +08:00
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
Frank Lee 0af793836c
[workflow] fixed changed file detection (#2515) 2023-01-26 16:34:19 +08:00
binmakeswell a6a10616ec
[doc] update opt and tutorial links (#2509) 2023-01-20 17:29:13 +08:00
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
oahzxl c04f183237
[autochunk] support parsing blocks (#2506) 2023-01-20 11:18:17 +08:00
Super Daniel 35c0c0006e
[utils] lazy init. (#2148)
* [utils] lazy init.

* [utils] remove description.

* [utils] complete.

* [utils] finalize.

* [utils] fix names.
2023-01-20 10:49:00 +08:00
oahzxl 72341e65f4
[auto-chunk] support extramsa (#3) (#2504) 2023-01-20 10:13:03 +08:00
Ziyue Jiang 0f02b8c6e6
add avg partition (#2483)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-19 13:54:50 +08:00
アマデウス 99d9713b02 Revert "Update parallel_context.py (#2408)"
This reverts commit 7d5640b9db.
2023-01-19 12:27:48 +08:00
oahzxl ecccc91f21
[autochunk] support autochunk on evoformer (#2497) 2023-01-19 11:41:00 +08:00
Fazzie-Maqianli 304f1ba124
Merge pull request #2499 from feifeibear/dev0116_10
[example] check dreambooth example gradient accumulation must be 1
2023-01-19 09:58:21 +08:00
jiaruifang 32390cbe8f add test_ci.sh to dreambooth 2023-01-19 09:46:28 +08:00
jiaruifang 7f822a5c45 Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into dev0116 2023-01-18 18:43:11 +08:00
jiaruifang 025b482dc1 [example] dreambooth example 2023-01-18 18:42:56 +08:00
oahzxl 5db3a5bf42
[fx] allow control of ckpt_codegen init (#2498)
* [fx] allow control of ckpt_codegen init

Currently in ColoGraphModule, ActivationCheckpointCodeGen is set automatically in __init__, which prevents any other codegen from being set.
So I add an arg to control whether ActivationCheckpointCodeGen is set in __init__.

* code style
2023-01-18 17:02:46 +08:00
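
The ckpt_codegen commit above describes a simple pattern: the constructor installs a default codegen unless the caller opts out, leaving room to attach a different one afterwards. Below is a minimal, self-contained sketch of that pattern using invented toy names (ToyGraphModule, ToyCheckpointCodeGen); the actual ColoGraphModule signature and the parameter name in ColossalAI are assumptions inferred from the commit message, not a verified API.

```python
# Toy illustration only -- these are not ColossalAI's real classes.
class ToyCodeGen:
    name = "default"


class ToyCheckpointCodeGen(ToyCodeGen):
    name = "activation_checkpoint"


class ToyGraph:
    def __init__(self):
        self.codegen = ToyCodeGen()

    def set_codegen(self, codegen):
        self.codegen = codegen


class ToyGraphModule:
    def __init__(self, graph, ckpt_codegen=True):
        # Mirrors the behaviour described in the commit: only install the
        # checkpoint codegen automatically when ckpt_codegen is True.
        if ckpt_codegen:
            graph.set_codegen(ToyCheckpointCodeGen())
        self.graph = graph


graph = ToyGraph()
ToyGraphModule(graph, ckpt_codegen=False)  # opt out of the automatic codegen
graph.set_codegen(ToyCodeGen())            # attach a different codegen instead
print(graph.codegen.name)                  # -> "default"
```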