Commit Graph

471 Commits (pre-commit-ci-update-config)

Author SHA1 Message Date
Xu Kai 946ab56c48
[feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import
2023-09-22 11:02:50 +08:00
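The squashed commits above wire GPTQ weight-only quantization into the inference path: dedicated CUDA/Triton kernels, tensor parallelism, kv-cache integration, and a shard-config switch. As a minimal pure-PyTorch sketch of the group-wise 4-bit weight layout such kernels dequantize on the fly (function names here are illustrative, not ColossalAI's API):

```python
import torch

def quantize_groupwise_4bit(weight: torch.Tensor, group_size: int = 128):
    """Asymmetric group-wise 4-bit quantization of a [out, in] weight matrix.

    Returns integer codes plus per-group scale/zero-point -- the layout a
    GPTQ CUDA/Triton kernel dequantizes on the fly inside the matmul.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    w = weight.reshape(out_features, in_features // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0      # 4 bits -> 16 levels
    zero = torch.clamp(torch.round(-w_min / scale), 0, 15)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)

w = torch.randn(256, 512)
q, s, z = quantize_groupwise_4bit(w)
print("max abs error:", (w - dequantize(q, s, z, w.shape)).abs().max().item())
```

In the real kernels the codes stay bit-packed and the dequantize-matmul is fused; this sketch only shows the storage format.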
Baizhou Zhang df66741f77
[bug] fix get_default_parser in examples (#4764) 2023-09-21 10:42:25 +08:00
Wenhao Chen 7b9b86441f
[chat]: update rm, add wandb and fix bugs (#4471)
* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>
2023-09-20 15:53:58 +08:00
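Among the changes above is wandb instrumentation for the SFT and reward-model trainers. A minimal sketch of that logging pattern, assuming a pairwise ranking loss and illustrative project/run names (not Coati's exact trainer code):

```python
import torch
import torch.nn.functional as F
import wandb

def train_rm_epoch(model, dataloader, optimizer, use_wandb: bool = True):
    if use_wandb:
        wandb.init(project="coati-rm", name="rm-demo")  # hypothetical project/run names
    model.train()
    for step, (chosen, rejected) in enumerate(dataloader):
        # Assumed: the reward model maps a tokenized sequence to a scalar reward.
        chosen_reward = model(chosen)
        rejected_reward = model(rejected)
        # Pairwise ranking loss typical of reward-model training.
        loss = -F.logsigmoid(chosen_reward - rejected_reward).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if use_wandb:
            wandb.log({"train/loss": loss.item()}, step=step)
```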
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
github-actions[bot] 3c6b831c26
[format] applied code formatting on changed files in pull request 4743 (#4750)
Co-authored-by: github-actions <github-actions@github.com>
2023-09-18 16:52:42 +08:00
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci
2023-09-18 16:31:06 +08:00
flybird11111 4c4482f3ad
[example] llama2 add fine-tune example (#4673)
* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example
2023-09-15 18:45:44 +08:00
Bin Jia 608cffaed3
[example] add gpt2 HybridParallelPlugin example (#4653)
* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file
2023-09-15 17:12:46 +08:00
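The example drives GPT-2 through HybridParallelPlugin, which combines tensor, pipeline, and data parallelism behind the booster API. A minimal wiring sketch, assuming this period's constructor arguments (exact names may differ between versions):

```python
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import GPT2LMHeadModel

colossalai.launch_from_torch(config={})

# 2-way tensor parallel x 2-way pipeline parallel; any remaining ranks form
# the data-parallel group.
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, num_microbatches=4, precision="fp16")
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model, optimizer, *_ = booster.boost(model, optimizer)
```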
binmakeswell ce97790ed7
[doc] fix llama2 code link (#4726)
* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link
2023-09-14 23:19:25 +08:00
Baizhou Zhang 068372a738
[doc] add potential solution for OOM in llama2 example (#4699) 2023-09-13 10:43:30 +08:00
Cuiqing Li bce0f16702
[Feature] The first PR to add the TP inference engine, kv-cache manager and related kernels for our inference system (#4577)
* [infer] Infer/llama demo (#4503)

* add

* add infer example

* finish

* finish

* stash

* fix

* [Kernels] add inference token attention kernel (#4505)

* add token forward

* fix tests

* fix comments

* add try import triton

* add adapted license

* add tests check

* [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (#4485)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change comments and fix import

* add

* added

* combine codes (#4509)

* [feature] add KV cache manager for llama & bloom inference (#4495)

* add kv cache memory manager

* add stateinfo during inference

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* file dir change

* [Bug FIx] import llama context ops fix (#4524)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change comments and fix import

* add

* added

* fix

* add ops into init.py

* add

* [Infer] Add TPInferEngine and fix file path (#4532)

* add engine for TP inference

* move file path

* update path

* fix TPInferEngine

* remove unused file

* add engine test demo

* revise TPInferEngine

* fix TPInferEngine, add test

* fix

* Add Inference test for llama (#4508)

* add kv cache memory manager

* add stateinfo during inference

* add

* add infer example

* finish

* finish

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* add inference test for llama

* fix conflict

* feature: add some new features for llama engine

* adapt colossalai triton interface

* Change the parent class of llama policy

* add nvtx

* move llama inference code to tensor_parallel

* fix __init__.py

* rm tensor_parallel

* fix: fix bugs in auto_policy.py

* fix: rm some unused codes

* mv colossalai/tpinference to colossalai/inference/tensor_parallel

* change __init__.py

* save change

* fix engine

* Bug fix: Fix hang

* remove llama_infer_engine.py

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>

* [infer] Add Bloom inference policy and replaced methods (#4512)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* trivial fix

* Revert "[infer] Add Bloom inference policy and replaced methods (#4512)" (#4552)

This reverts commit 17cfa57140.

* [Doc] Add colossal inference doc (#4549)

* create readme

* add readme.md

* fix typos

* [infer] Add Bloom inference policy and replaced methods (#4553)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* trivial fix

* trivial

* Fix Bugs In Llama Model Forward (#4550)

* add kv cache memory manager

* add stateinfo during inference

* add

* add infer example

* finish

* finish

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* add inference test for llama

* fix conflict

* feature: add some new features for llama engine

* adapt colossalai triton interface

* Change the parent class of llama policy

* add nvtx

* move llama inference code to tensor_parallel

* fix __init__.py

* rm tensor_parallel

* fix: fix bugs in auto_policy.py

* fix: rm some unused codes

* mv colossalai/tpinference to colossalai/inference/tensor_parallel

* change __init__.py

* save change

* fix engine

* Bug fix: Fix hang

* remove llama_infer_engine.py

* bug fix: fix bugs about infer_state.is_context_stage

* remove policies

* fix: delete unused code

* fix: delete unused code

* remove unused code

* fix conflict

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>

* [doc] add colossal inference fig (#4554)

* create readme

* add readme.md

* fix typos

* upload fig

* [NFC] fix docstring for colossal inference (#4555)

Fix docstring and comments in kv cache manager and bloom modeling

* fix docstring in llama modeling (#4557)

* [Infer] check import vllm (#4559)

* change import vllm

* import apply_rotary_pos_emb

* change import location

* [DOC] add installation req (#4561)

* add installation req

* fix

* slight change

* remove empty

* [Feature] rms-norm transfer into inference llama.py  (#4563)

* add installation req

* fix

* slight change

* remove empty

* add rmsnorm policy

* add

* clean codes

* [infer] Fix tp inference engine (#4564)

* fix engine prepare data

* add engine test

* use bloom for testing

* revise on test

* revise on test

* reset shardformer llama (#4569)

* [infer] Fix engine - tensors on different devices (#4570)


* fix diff device in engine

* [codefactor] Feature/colossal inference (#4579)

* code factors

* remove

* change coding (#4581)

* [doc] complete README of colossal inference (#4585)

* complete fig

* Update README.md

* [doc]update readme (#4586)

* update readme

* Update README.md

* bug fix: fix bugs in llama and bloom (#4588)

* [BUG FIX]Fix test engine in CI and non-vllm kernels llama forward  (#4592)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* [Kernel]Rmsnorm fix (#4598)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* add triton rmsnorm

* delete vllm kernel flag

* [Bug Fix]Fix bugs in llama (#4601)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* bug fix: remove rotary_positions_ids

---------

Co-authored-by: cuiqing.li <lixx3527@gmail.com>

* [kernel] Add triton layer norm & replace norm for bloom (#4609)

* add layernorm for inference

* add test for layernorm kernel

* add bloom layernorm replacement policy

* trivial: path

* [Infer] Bug fix rotary embedding in llama (#4608)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* [bench] Add bloom inference benchmark (#4621)

* add bloom benchmark

* readme - update benchmark res

* trivial - uncomment for testing (#4622)

* [Infer] add check triton and cuda version for tests (#4627)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* add check triton and cuda

* Update sharder.py (#4629)

* [Inference] Hot fix some bugs and typos (#4632)

* fix

* fix test

* fix conflicts

* [typo]Comments fix (#4633)

* fallback

* fix comments

* bug fix: fix some bugs in test_llama and test_bloom (#4635)

* [Infer] delete benchmark in tests and fix bug for llama and bloom (#4636)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* add check triton and cuda

* delete benchmark and fix infer bugs

* delete benchmark for tests

* delete useless code

* delete benchmark function in utils

* [Fix] Revise TPInferEngine, inference tests and benchmarks (#4642)

* [Fix] revise TPInferEngine methods and inference tests

* fix llama/bloom infer benchmarks

* fix infer tests

* trivial fix: benchmarks

* trivial

* trivial: rm print

* modify utils filename for infer ops test (#4657)

* [Infer] Fix TPInferEngine init & inference tests, benchmarks (#4670)

* fix engine funcs

* TPInferEngine: receive shard config in init

* benchmarks: revise TPInferEngine init

* benchmarks: remove pytest decorator

* trivial fix

* use small model for tests

* [NFC] use args for infer benchmarks (#4674)

* revise infer default (#4683)

* [Fix] optimize/shard model in TPInferEngine init (#4684)

* remove using orig model in engine

* revise inference tests

* trivial: rename

---------

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2023-09-12 01:22:56 +08:00
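Pieced together from the commit messages above, the inference flow is: build a shard config, let TPInferEngine shard the model and allocate the kv-cache from the batch/length limits, then generate. A minimal sketch; the signatures below are assumptions inferred from the commits, not a verified API:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from colossalai.inference.tensor_parallel import TPInferEngine  # path per the commits above
from colossalai.shardformer import ShardConfig

model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b", torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")

# The engine shards the model via shardformer and sizes the kv-cache from the
# batch/length limits; keyword names are assumptions.
shard_config = ShardConfig(enable_tensor_parallelism=True)
engine = TPInferEngine(model, shard_config,
                       max_batch_size=4, max_input_len=32, max_output_len=128)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = engine.generate(inputs, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```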
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
* [legacy] move communication to legacy (#4640)

* [legacy] refactor logger and clean up legacy codes (#4654)

* [legacy] make logger independent to gpc

* [legacy] make optim independent to registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy (#4656)

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useless rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybrid parallel example

* [devops] test doc check

* [devops] test doc check
2023-09-11 16:24:28 +08:00
flybird11111 7486ed7d3a
[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)
* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix
2023-09-09 22:45:36 +08:00
Baizhou Zhang 295b38fecf
[example] update vit example for hybrid parallel plugin (#4641)
* update vit example for hybrid plugin

* reset tp/pp size

* fix dataloader iteration bug

* update optimizer passing in evaluation/add grad_accum

* change criterion

* wrap tqdm

* change grad_accum to grad_checkpoint

* fix pbar
2023-09-07 17:38:45 +08:00
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630)
* set optimizer to optional in execute_pipeline

* arrange device and mixed precision in booster init

* fix execute_pipeline in booster.py
2023-09-07 10:42:59 +08:00
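With the optimizer argument now optional, execute_pipeline can run a forward-only pass, which is what makes pipeline-parallel evaluation possible. A minimal sketch, assuming the booster interface of this period:

```python
import torch

def evaluate(booster, model, criterion, data_iter):
    # Forward-only pipeline pass: with optimizer=None there is no backward or
    # parameter update, which is what this PR makes possible.
    model.eval()
    with torch.no_grad():
        outputs = booster.execute_pipeline(
            data_iter, model, criterion,
            optimizer=None,
            return_loss=True,
        )
    return outputs["loss"]
```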
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer 2023-09-05 21:54:08 +08:00
Hongxin Liu ac178ca5c1 [legacy] move builder and registry to legacy (#4603) 2023-09-05 21:53:10 +08:00
Hongxin Liu 8accecd55b [legacy] move engine to legacy (#4560)
* [legacy] move engine to legacy

* [example] fix seq parallel example

* [example] fix seq parallel example

* [test] test gemini plugin hang

* [test] test gemini plugin hang

* [test] test gemini plugin hang

* [test] test gemini plugin hang

* [test] test gemini plugin hang

* [example] update seq parallel requirements
2023-09-05 21:53:10 +08:00
Hongxin Liu 89fe027787 [legacy] move trainer to legacy (#4545)
* [legacy] move trainer to legacy

* [doc] update docs related to trainer

* [test] ignore legacy test
2023-09-05 21:53:10 +08:00
flybird11111 ec0866804c
[shardformer] update shardformer readme (#4617)
[shardformer] update shardformer readme

[shardformer] update shardformer readme
2023-09-05 13:14:41 +08:00
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer 2023-09-04 23:43:13 +08:00
flybird11111 0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin (#4584)
* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] fix epoch change

* [shardformer] broadcast add pp group

* [shardformer] fix opt test hanging

* fix

* test

* test

* [shardformer] zero1+pp and the corresponding tests (#4517)

* pause

* finish pp+zero1

* Update test_shard_vit.py

* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)

* fix overlap bug and support bert, add overlap as an option in shardconfig

* support overlap for chatglm and bloom

* [shardformer] fix emerged bugs after updating transformers (#4526)

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] Add overlap support for gpt2 (#4535)

* add overlap support for gpt2

* remove unused code

* remove unused code

* [shardformer] support pp+tp+zero1 tests (#4531)

* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] fix submodule replacement bug when enabling pp (#4544)

* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)

* implement sharded optimizer saving

* add more param info

* finish implementation of sharded optimizer saving

* fix bugs in optimizer sharded saving

* add pp+zero test

* param group loading

* greedy loading of optimizer

* fix bug when loading

* implement optimizer sharded saving

* add optimizer test & arrange checkpointIO utils

* fix gemini sharding state_dict

* add verbose option

* add loading of master params

* fix typehint

* fix master/working mapping in fp16 amp

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] fix epoch change

* [shardformer] broadcast add pp group

* rebase feature/shardformer

* update pipeline

* [shardformer] fix

* [shardformer] fix

* [shardformer] bert finetune fix

* [shardformer] add all_reduce operation to loss

add all_reduce operation to loss

* [shardformer] make compatible with pytree.

make compatible with pytree.

* [shardformer] disable tp

disable tp

* [shardformer] add 3d plugin to ci test

* [shardformer] update num_microbatches to None

* [shardformer] update microbatch size

* [shardformer] update assert

* update scheduler

* update scheduler

---------

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
2023-09-04 21:46:29 +08:00
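One concrete change buried in the list above is "[shardformer] add all_reduce operation to loss": averaging the local loss over the data-parallel group so every rank reports the same value. A minimal sketch of that pattern in plain torch.distributed:

```python
import torch
import torch.distributed as dist

def global_mean_loss(loss: torch.Tensor, dp_group=None) -> torch.Tensor:
    """Average the local loss across the data-parallel group."""
    reduced = loss.detach().clone()
    dist.all_reduce(reduced, op=dist.ReduceOp.SUM, group=dp_group)
    reduced /= dist.get_world_size(group=dp_group)
    return reduced
```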
binmakeswell 8d7b02290f
[doc] add llama2 benchmark (#4604)
* [doc] add llama2 benchmark

* [doc] add llama2 benchmark
2023-09-04 13:49:33 +08:00
Tian Siyuan f1ae8c9104
[example] change accelerate version (#4431)
Co-authored-by: Siyuan Tian <siyuant@vmware.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
2023-08-30 22:56:13 +08:00
ChengDaqi2023 8e2e1992b8
[example] update streamlit 0.73.1 to 1.11.1 (#4386) 2023-08-30 22:54:45 +08:00
Hongxin Liu 0b00def881
[example] add llama2 example (#4527)
* [example] transfer llama-1 example

* [example] fit llama-2

* [example] refactor scripts folder

* [example] fit new gemini plugin

* [cli] fix multinode runner

* [example] fit gemini optim checkpoint

* [example] refactor scripts

* [example] update requirements

* [example] update requirements

* [example] rename llama to llama2

* [example] update readme and pretrain script

* [example] refactor scripts
2023-08-28 17:59:11 +08:00
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
* [gemini] remove distributed-related part from colotensor (#4379)

* [gemini] remove process group dependency

* [gemini] remove tp part from colo tensor

* [gemini] patch inplace op

* [gemini] fix param op hook and update tests

* [test] remove useless tests

* [test] remove useless tests

* [misc] fix requirements

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [misc] update requirements

* [gemini] refactor gemini optimizer and gemini ddp (#4398)

* [gemini] update optimizer interface

* [gemini] renaming gemini optimizer

* [gemini] refactor gemini ddp class

* [example] update gemini related example

* [example] update gemini related example

* [plugin] fix gemini plugin args

* [test] update gemini ckpt tests

* [gemini] fix checkpoint io

* [example] fix opt example requirements

* [example] fix opt example

* [example] fix opt example

* [example] fix opt example

* [gemini] add static placement policy (#4443)

* [gemini] add static placement policy

* [gemini] fix param offload

* [test] update gemini tests

* [plugin] update gemini plugin

* [plugin] update gemini plugin docstr

* [misc] fix flash attn requirement

* [test] fix gemini checkpoint io test

* [example] update resnet example result (#4457)

* [example] update bert example result (#4458)

* [doc] update gemini doc (#4468)

* [example] update gemini related examples (#4473)

* [example] update gpt example

* [example] update dreambooth example

* [example] update vit

* [example] update opt

* [example] update palm

* [example] update vit and opt benchmark

* [hotfix] fix bert in model zoo (#4480)

* [hotfix] fix bert in model zoo

* [test] remove chatglm gemini test

* [test] remove sam gemini test

* [test] remove vit gemini test

* [hotfix] fix opt tutorial example (#4497)

* [hotfix] fix opt tutorial example

* [hotfix] fix opt tutorial example
2023-08-24 09:29:25 +08:00
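The headline change is the static placement policy: instead of migrating chunks between GPU and CPU under runtime memory pressure (the auto policy), placement is fixed up front by offload fractions. A minimal sketch, with knob names assumed from this era's GeminiPlugin:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# "auto" migrates chunks between GPU and CPU under runtime memory pressure;
# "static" fixes placement up front. The fraction knobs are assumed names.
plugin = GeminiPlugin(
    placement_policy="static",
    offload_param_frac=0.5,   # keep half of the parameters on CPU
    offload_optim_frac=1.0,   # keep all optimizer states on CPU
)
booster = Booster(plugin=plugin)
```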
Tian Siyuan ff836790ae
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
Co-authored-by: Siyuan Tian <siyuant@vmware.com>
2023-08-15 00:22:57 +08:00
binmakeswell 089c365fa0
[doc] add Series A Funding and NeurIPS news (#4377)
* [doc] add Series A Funding and NeurIPS news

* [kernel] fix mha kernel

* [CI] skip moe

* [CI] fix requirements
2023-08-04 17:42:07 +08:00
caption 16c0acc01b
[hotfix] update gradio 3.11 to 3.34.0 (#4329) 2023-08-01 16:25:25 +08:00
binmakeswell ef4b99ebcd add llama example CI 2023-07-26 14:12:57 +08:00
binmakeswell 7ff11b5537
[example] add llama pretraining (#4257) 2023-07-17 21:07:44 +08:00
github-actions[bot] 4e9b09c222
Automated submodule synchronization (#4217)
Co-authored-by: github-actions <github-actions@github.com>
2023-07-12 17:35:58 +08:00
digger yu 2d40759a53
fix #3852 path error (#4058) 2023-06-28 15:29:44 +08:00
Jianghai 31dc302017
[examples] copy resnet example to image (#4090)
* copy resnet example

* add pytest package

* skip test_ci

* skip test_ci

* skip test_ci
2023-06-27 16:40:46 +08:00
Baizhou Zhang 4da324cd60
[hotfix]fix argument naming in docs and examples (#4083) 2023-06-26 23:50:04 +08:00
LuGY 160c64c645
[example] fix bucket size in example of gpt gemini (#4028) 2023-06-19 11:22:42 +08:00
Baizhou Zhang b3ab7fbabf
[example] update ViT example using booster api (#3940) 2023-06-12 15:02:27 +08:00
Liu Ziming e277534a18
Merge pull request #3905 from MaruyamaAya/dreambooth
[example] Adding an example of training dreambooth with the new booster API
2023-06-09 08:44:18 +08:00
digger yu 33eef714db
fix typo examples and docs (#3932) 2023-06-08 16:09:32 +08:00
Maruyama_Aya 9b5e7ce21f modify shell for check 2023-06-08 14:56:56 +08:00
digger yu 407aa48461
fix typo examples/community/roberta (#3925) 2023-06-08 14:28:34 +08:00
Maruyama_Aya 730a092ba2 modify shell for check 2023-06-08 13:38:18 +08:00
Maruyama_Aya 49567d56d1 modify shell for check 2023-06-08 13:36:05 +08:00
Maruyama_Aya 039854b391 modify shell for check 2023-06-08 13:17:58 +08:00
Baizhou Zhang e417dd004e
[example] update opt example using booster api (#3918) 2023-06-08 11:27:05 +08:00
Maruyama_Aya cf4792c975 modify shell for check 2023-06-08 11:15:10 +08:00
Maruyama_Aya c94a33579b modify shell for check 2023-06-07 17:23:01 +08:00
Liu Ziming b306cecf28
[example] Modify palm example with the new booster API (#3913)
* Modify torch version requirement to adapt torch 2.0

* modify palm example using new booster API

* roll back

* fix port

* polish

* polish
2023-06-07 16:05:00 +08:00
wukong1992 a55fb00c18
[booster] update bert example, using booster api (#3885) 2023-06-07 15:51:00 +08:00
Maruyama_Aya 4fc8bc68ac modify file path 2023-06-07 11:02:19 +08:00
Maruyama_Aya b4437e88c3 fixed port 2023-06-06 16:21:38 +08:00
Maruyama_Aya 79c9f776a9 fixed port 2023-06-06 16:20:45 +08:00
Maruyama_Aya d3379f0be7 fixed model saving bugs 2023-06-06 16:07:34 +08:00
Maruyama_Aya b29e1f0722 change directory 2023-06-06 15:50:03 +08:00
Maruyama_Aya 1c1f71cbd2 fixing insecure hash function 2023-06-06 14:51:11 +08:00
Maruyama_Aya b56c7f4283 update shell file 2023-06-06 14:09:27 +08:00
Maruyama_Aya 176010f289 update performance evaluation 2023-06-06 14:08:22 +08:00
Maruyama_Aya 25447d4407 modify path 2023-06-05 11:47:07 +08:00
Maruyama_Aya 60ec33bb18 Add a new example of Dreambooth training using the booster API 2023-06-02 16:50:51 +08:00
jiangmingyan 5f79008c4a
[example] update gemini examples (#3868)
* [example]update gemini examples

* [example]update gemini examples
2023-05-30 18:41:41 +08:00
digger yu 518b31c059
[docs] change placememt_policy to placement_policy (#3829)
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.

* fix typo colossalai/auto_parallel autochunk fx/passes  etc.

* fix typo docs/

* change placememt_policy to placement_policy in docs/ and examples/
2023-05-24 14:51:49 +08:00
github-actions[bot] 62c7e67f9f
[format] applied code formatting on changed files in pull request 3786 (#3787)
Co-authored-by: github-actions <github-actions@github.com>
2023-05-22 14:42:09 +08:00
binmakeswell ad2cf58f50
[chat] add performance and tutorial (#3786) 2023-05-19 18:03:56 +08:00
binmakeswell 15024e40d9
[auto] fix install cmd (#3772) 2023-05-18 13:33:01 +08:00
digger-yu b7141c36dd
[CI] fix some spelling errors (#3707)
* fix spelling errors in examples/community/

* fix spelling errors in tests/

* fix some spelling errors in tests/, colossalai/, etc.
2023-05-10 17:12:03 +08:00
Hongxin Liu 3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
* [booster] add prepare dataloader method for plug

* [booster] update examples and docstr
2023-05-08 15:44:03 +08:00
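prepare_dataloader gives each plugin a way to build a dataloader whose sampler matches its own process groups, so examples no longer construct a DistributedSampler by hand. A minimal sketch; keyword names are assumptions:

```python
import torch
from torch.utils.data import TensorDataset
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
plugin = TorchDDPPlugin()
booster = Booster(plugin=plugin)

# The plugin injects a sampler matched to its process groups, replacing the
# hand-built DistributedSampler in the examples.
dataloader = plugin.prepare_dataloader(dataset, batch_size=64, shuffle=True, drop_last=True)
```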
Hongxin Liu f83ea813f5
[example] add train resnet/vit with booster example (#3694)
* [example] add train vit with booster example

* [example] update readme

* [example] add train resnet with booster example

* [example] enable ci

* [example] enable ci

* [example] add requirements

* [hotfix] fix analyzer init

* [example] update requirements
2023-05-08 10:42:30 +08:00
Hongxin Liu d556648885
[example] add finetune bert with booster example (#3693) 2023-05-06 11:53:13 +08:00
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
* Fixed several spelling errors under colossalai

* Fix the spelling error in colossalai and docs directory

* Cautiously changed the spelling errors under the example folder

* Update runtime_preparation_pass.py

revert autograft to autograd

* Update search_chunk.py

utile to until

* Update check_installation.py

change misteach to mismatch in line 91

* Update 1D_tensor_parallel.md

revert to perceptron

* Update 2D_tensor_parallel.md

revert to perceptron in line 73

* Update 2p5D_tensor_parallel.md

revert to perceptron in line 71

* Update 3D_tensor_parallel.md

revert to perceptron in line 80

* Update README.md

revert to resnet in line 42

* Update reorder_graph.py

revert to indice in line 7

* Update p2p.py

revert to megatron in line 94

* Update initialize.py

revert to torchrun in line 198

* Update routers.py

change to detailed in line 63

* Update routers.py

change to detailed in line 146

* Update README.md

revert random number in line 402
2023-04-26 11:38:43 +08:00
github-actions[bot] d544ed4345
[bot] Automated submodule synchronization (#3596)
Co-authored-by: github-actions <github-actions@github.com>
2023-04-19 10:38:12 +08:00
digger-yu d0fbd4b86f
[example] fix community doc (#3586)
Adjusted the style of Community Examples to be consistent with other titles
2023-04-18 10:37:34 +08:00
binmakeswell f1b3d60cae
[example] reorganize for community examples (#3557) 2023-04-14 16:27:48 +08:00
natalie_cao de84c0311a Polish Code 2023-04-12 18:19:46 +08:00
binmakeswell 0c0455700f
[doc] add requirement and highlight application (#3516)
* [doc] add requirement and highlight application

* [doc] link example and application
2023-04-10 17:37:16 +08:00
mandoxzhang 8f2c55f9c9
[example] remove redundant texts & update roberta (#3493)
* update roberta example

* update roberta example

* modify conflict & update roberta
2023-04-07 11:33:32 +08:00
mandoxzhang ab5fd127e3
[example] update roberta with newer ColossalAI (#3472)
* update roberta example

* update roberta example
2023-04-07 10:34:51 +08:00
NatalieC323 fb8fae6f29
Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) 2023-04-06 20:22:52 +08:00
NatalieC323 c701b77b11
[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)
* Update requirements.txt

* Update environment.yaml

* Update README.md

* Update environment.yaml

* Update README.md

* Update README.md

* Delete requirements_colossalai.txt

* Update requirements.txt

* Update README.md
2023-04-06 17:50:52 +08:00
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-04-06 14:51:35 +08:00
Frank Lee 7d8d825681
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442) 2023-04-06 09:43:51 +08:00
ver217 573af84184
[example] update examples related to zero/gemini (#3431)
* [zero] update legacy import

* [zero] update examples

* [example] fix opt tutorial

* [example] fix opt tutorial

* [example] fix opt tutorial

* [example] fix opt tutorial

* [example] fix import
2023-04-04 17:32:51 +08:00
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
* [zero] refactor low-level zero folder structure

* [zero] fix legacy zero import path

* [zero] fix legacy zero import path

* [zero] remove useless import

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] fix test import path

* [zero] fix test

* [zero] fix circular import

* [zero] update import
2023-04-04 13:48:16 +08:00
Jan Roudaut dd367ce795
[doc] polish diffusion example (#3386)
* [examples/images/diffusion]: README.md: typo fixes

* Update README.md

* Grammar fixes

* Reformulated "Step 3" (xformers) introduction

to the cost => at the cost + reworded pip availability.
2023-04-01 23:09:40 +08:00
Jan Roudaut 51cd2fec57
Typofix: malformed `xformers` version (#3384)
s/0.12.0/0.0.12/
2023-03-31 23:32:44 +08:00
YuliangLiu0306 fd6add575d
[examples] polish AutoParallel readme (#3270) 2023-03-28 10:40:07 +08:00
Frank Lee 73d3e4d309
[booster] implemented the torch ddp + resnet example (#3232)
* [booster] implemented the torch ddp + resnet example

* polish code
2023-03-27 10:24:14 +08:00
NatalieC323 280fcdc485
polish code (#3194)
Co-authored-by: YuliangLiu0306 <72588413+YuliangLiu0306@users.noreply.github.com>
2023-03-24 18:44:43 +08:00
Yan Fang 189347963a
[auto] fix requirements typo for issue #3125 (#3209) 2023-03-23 10:22:08 +08:00
NatalieC323 e5f668f280
[dreambooth] fixing the incompatibity in requirements.txt (#3190)
* Update requirements.txt

* Update environment.yaml

* Update README.md

* Update environment.yaml

* Update README.md

* Update README.md

* Delete requirements_colossalai.txt

* Update requirements.txt

* Update README.md
2023-03-21 16:01:13 +08:00
Zihao 18dbe76cae
[auto-parallel] add auto-offload feature (#3154)
* add auto-offload feature

* polish code

* fix sync offload runtime pass bug

* add offload example

* fix offload testing bug

* fix example testing bug
2023-03-21 14:17:41 +08:00
NatalieC323 4e921cfbd6
[examples] Solving the diffusion issue of incompatibility issue#3169 (#3170)
* Update requirements.txt

* Update environment.yaml

* Update README.md

* Update environment.yaml
2023-03-20 14:19:05 +08:00
binmakeswell 3c01280a56
[doc] add community contribution guide (#3153)
* [doc] update contribution guide

* [doc] update contribution guide

* [doc] add community contribution guide
2023-03-17 11:07:24 +08:00
github-actions[bot] 0aa92c0409
Automated submodule synchronization (#3105)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-13 08:58:06 +08:00
binmakeswell 018936a3f3
[tutorial] update notes for TransformerEngine (#3098) 2023-03-10 16:30:52 +08:00
Kirthi Shankar Sivamani 65a4dbda6c
[NVIDIA] Add FP8 example using TE (#3080)
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
2023-03-10 16:24:08 +08:00
Fazzie-Maqianli 5d5f475d75
[diffusers] fix ci and docker (#3085) 2023-03-10 10:35:15 +08:00
Camille Zhong e58a3c804c
Fix the version of lightning and colossalai in Stable Diffusion environment requirement (#3073)
1. Modify the README of stable diffusion
2. Pin the pytorch-lightning/lightning and colossalai versions so that the code runs successfully.
2023-03-10 09:55:58 +08:00
binmakeswell 360674283d
[example] fix redundant note (#3065) 2023-03-09 10:59:28 +08:00
Tomek af3888481d
[example] fixed opt model downloading from huggingface 2023-03-09 10:47:41 +08:00
ramos 2ef855c798
support shardinit option to avoid OOM when initializing OPT (#3037)
Co-authored-by: poe <poe@nemoramo>
2023-03-08 13:45:15 +08:00
Ziyue Jiang 400f63012e
[pipeline] Add Simplified Alpa DP Partition (#2507)
* add alpa dp split

* add alpa dp split

* use fwd+bwd instead of fwd only

---------

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-03-07 10:34:31 +08:00
binmakeswell 52a5078988
[doc] add ISC tutorial (#2997)
* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial
2023-03-06 10:36:38 +08:00
github-actions[bot] 827a0af8cc
Automated submodule synchronization (#2982)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-03 10:55:45 +08:00
github-actions[bot] da056285f2
[format] applied code formatting on changed files in pull request 2922 (#2923)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-27 19:29:06 +08:00
binmakeswell 12bafe057f
[doc] update installation for GPT (#2922) 2023-02-27 18:28:34 +08:00
binmakeswell 0afb55fc5b
[doc] add os scope, update tutorial install and tips (#2914) 2023-02-27 14:59:27 +08:00
Alex_996 a4fc125c34
Fix typos (#2863)
Fix typos, `6.7 -> 6.7b`
2023-02-22 10:59:48 +08:00
dawei-wang 55424a16a5
[doc] fix GPT tutorial (#2860)
Fix hpcaitech/ColossalAI#2851
2023-02-22 10:58:52 +08:00
Zheng Zeng 597914317b
[doc] fix typo in opt inference tutorial (#2849) 2023-02-21 17:16:13 +08:00
github-actions[bot] a5721229d9
Automated submodule synchronization (#2740)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-20 17:35:46 +08:00
Haofan Wang 47ecb22387
[example] add LoRA support (#2821)
* add lora

* format
2023-02-20 16:23:12 +08:00
Jiarui Fang bf0204604f
[example] add bert and albert (#2824) 2023-02-20 10:35:55 +08:00
Fazzie-Maqianli ba84cd80b2
fix pip install colossal (#2764) 2023-02-17 09:54:21 +08:00
cloudhuang 43dffdaba5
[doc] fixed a typo in GPT readme (#2736) 2023-02-15 22:24:45 +08:00
Fazzie-Maqianli d03f4429c1
add ci (#2641) 2023-02-15 09:55:53 +08:00
github-actions[bot] d701ef81b1
Automated submodule synchronization (#2707)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-15 09:39:44 +08:00
github-actions[bot] 88416019e7
Automated submodule synchronization (#2648)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-13 18:10:54 +08:00
binmakeswell 9ab14b20b5
[doc] add CVPR tutorial (#2666) 2023-02-10 20:43:34 +08:00
Jiatong (Julius) Han a255a38f7f
[example] Polish README.md (#2658)
* [tutorial] polish readme.md

* [example] Update README.md
2023-02-09 20:43:55 +08:00
Fazzie-Maqianli 292c81ed7c
fix/transformer-verison (#2581) 2023-02-08 13:50:27 +08:00
Frank Lee 4ae02c4b1c
[tutorial] added energonai to opt inference requirements (#2625) 2023-02-07 16:58:06 +08:00
binmakeswell 0556f5d468
[tutorial] add video link (#2619) 2023-02-07 15:14:51 +08:00
github-actions[bot] ae86be1fd2
Automated submodule synchronization (#2607)
Co-authored-by: github-actions <github-actions@github.com>
2023-02-07 09:33:27 +08:00
binmakeswell 039b0c487b
[tutorial] polish README (#2568) 2023-02-04 17:49:52 +08:00
oahzxl 4f5ef73a43
[tutorial] update fastfold tutorial (#2565)
* update readme

* update

* update
2023-02-03 16:54:28 +08:00
Fazzie-Maqianli 79079a9d0c
Merge pull request #2561 from Fazziekey/v2
bug/fix diffusion ckpt problem
2023-02-03 15:42:49 +08:00
Fazzie cad1f50512 fix ckpt 2023-02-03 15:39:59 +08:00
YuliangLiu0306 f477a14f4a
[hotfix] fix autoparallel demo (#2533) 2023-01-31 17:42:45 +08:00
HELSON 6e0faa70e0
[gemini] add profiler in the demo (#2534) 2023-01-31 14:21:22 +08:00
Fazzie f35326881c fix README 2023-01-31 10:51:13 +08:00
HELSON 66dfcf5281
[gemini] update the gpt example (#2527) 2023-01-30 17:58:05 +08:00
LuGY ecbad93b65
[example] Add fastfold tutorial (#2528)
* add fastfold example

* pre-commit polish

* pre-commit polish readme and add empty test ci

* Add test_ci and reduce the default sequence length
2023-01-30 17:08:18 +08:00
Jiarui Fang fd8d19a6e7
[example] update lightning dependency for stable diffusion (#2522) 2023-01-29 13:52:15 +08:00
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
jiaruifang 32390cbe8f add test_ci.sh to dreambooth 2023-01-19 09:46:28 +08:00
jiaruifang 025b482dc1 [example] dreambooth example 2023-01-18 18:42:56 +08:00
jiaruifang e58cc441e2 polish code and fix dataloader bugs 2023-01-18 12:00:08 +08:00
jiaruifang a4b75b78a0 [hotfix] gpt example titans bug #2493 2023-01-18 11:37:16 +08:00
binmakeswell fcc6d61d92
[example] fix requirements (#2488) 2023-01-17 13:07:25 +08:00
Jiarui Fang 3a21485ead
[example] titans for gpt (#2484) 2023-01-16 15:55:41 +08:00
Jiarui Fang 7c31706227
[CI] add test_ci.sh for palm, opt and gpt (#2475) 2023-01-16 14:44:29 +08:00
Jiarui Fang e4c38ba367
[example] stable diffusion add roadmap (#2482) 2023-01-16 12:14:49 +08:00
ver217 f525d1f528
[example] update gpt gemini example ci test (#2477) 2023-01-13 22:37:31 +08:00
Ziyue Jiang fef5c949c3
polish pp middleware (#2476)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-13 16:56:01 +08:00
Frank Lee 8b7495dd54
[example] integrate seq-parallel tutorial with CI (#2463) 2023-01-13 14:40:05 +08:00
ver217 8e85d2440a
[example] update vit ci script (#2469)
* [example] update vit ci script

* [example] update requirements

* [example] update requirements
2023-01-13 13:31:27 +08:00
Jiarui Fang 867c8c2d3a
[zero] low level optim supports ProcessGroup (#2464) 2023-01-13 10:05:58 +08:00
Frank Lee e6943e2d11
[example] integrate autoparallel demo with CI (#2466)
* [example] integrate autoparallel demo with CI

* polish code

* polish code

* polish code

* polish code
2023-01-12 16:26:42 +08:00
YuliangLiu0306 c20529fe78
[examples] update autoparallel tutorial demo (#2449)
* [examples] update autoparallel tutorial demo

* add test_ci.sh

* polish

* add conda yaml
2023-01-12 14:30:58 +08:00
Haofan Wang cfd1d5ee49
[example] fixed seed error in train_dreambooth_colossalai.py (#2445) 2023-01-11 16:56:15 +08:00
Frank Lee ac18a445fa
[example] updated large-batch optimizer tutorial (#2448)
* [example] updated large-batch optimizer tutorial

* polish code

* polish code
2023-01-11 16:27:31 +08:00
Frank Lee 39163417a1
[example] updated the hybrid parallel tutorial (#2444)
* [example] updated the hybrid parallel tutorial

* polish code
2023-01-11 15:17:17 +08:00
YuliangLiu0306 2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize (#2393)
* [autoparallel] integrate device mesh initialization into autoparallelize

* add megatron solution

* update gpt autoparallel examples with latest api

* adapt beta value to fit the current computation cost
2023-01-11 14:03:49 +08:00
Frank Lee a3e5496156
[example] improved the clarity of the example readme (#2427)
* [example] improved the clarity of the example readme

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow
2023-01-11 10:46:32 +08:00
Frank Lee 63be79d505
[example] removed duplicated stable diffusion example (#2424) 2023-01-11 10:07:18 +08:00
ZijianYY fe0f7970a2
[examples] adding tflops to PaLM (#2365) 2023-01-10 16:18:56 +08:00
HELSON d84e747975
[hotfix] add DISTPAN argument for benchmark (#2412)
* change the benchmark config file

* change config

* revert config file

* rename distpan to distplan
2023-01-10 11:39:25 +08:00
Frank Lee 8327932d2c
[workflow] refactored the example check workflow (#2411)
* [workflow] refactored the example check workflow

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-10 11:26:19 +08:00
HELSON 498b5ca993
[hotfix] fix gpt gemini example (#2404)
* [hotfix] fix gpt gemini example

* [example] add new assertions
2023-01-09 15:52:17 +08:00
jiaruifang b2e0d502b8 [doc] hotfix #2377 2023-01-07 19:44:50 +08:00
Jiarui Fang 8f72b6f8fb
[hotfix] fix implementation error in diffusers 2023-01-07 07:56:39 +08:00
1SAA 33f3023e19 [hotfix] fix implementation error in diffusers 2023-01-06 18:37:18 +08:00
Jiarui Fang 12c8bf38d7
[Pipeline] Refine GPT PP Example 2023-01-06 18:03:45 +08:00
Ziyue Jiang ad00894f7f polish 2023-01-06 16:03:16 +08:00
Jiarui Fang 1aaeb596c6
[example] gpt, shard init on all processes (#2366) 2023-01-06 15:44:50 +08:00
Ziyue Jiang 3a15b20421 Move GPT PP Example 2023-01-06 14:48:58 +08:00
HELSON 48d33b1b17
[gemini] add get static torch model (#2356) 2023-01-06 13:41:19 +08:00
Fazzie-Maqianli 7a332b1734
Merge pull request #2338 from haofanwang/patch-1
Fix a typo in train_dreambooth_colossalai.py
2023-01-06 11:50:18 +08:00
YuliangLiu0306 8b1e0dfd80
[example] upload auto parallel gpt2 demo (#2354) 2023-01-06 11:38:38 +08:00
Jiarui Fang 00a9c781fd
[example] add google doc for benchmark results of GPT (#2355) 2023-01-06 11:38:15 +08:00
Jiarui Fang 509a87f3ff
[example] make gpt example directory more clear (#2353) 2023-01-06 11:11:26 +08:00
Ikko Eltociear Ashimine 5e4bced0a3
[NFC] Update roberta/README.md (#2350) 2023-01-06 10:09:14 +08:00
Jiarui Fang 35e22be2f6
[example] simplify opt example (#2344) 2023-01-06 10:08:41 +08:00
ziyuhuang123 7080a8edb0
[workflow] New version: Create workflow files for examples' auto check (#2298)
* [workflows]bug_repair

* [workflow]new_pr_fixing_bugs

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
binmakeswell d7352bef2c
[example] add example requirement (#2345) 2023-01-06 09:03:29 +08:00
Haofan Wang 7ce965c7cc
Update requirement_colossalai.txt (#2348) 2023-01-05 21:16:42 +08:00
ZijianYY f7fd592bf4
[examples] adding tp to PaLM (#2319) 2023-01-05 17:57:50 +08:00
Haofan Wang 9edd0aa75e
Update train_dreambooth_colossalai.py
accelerator.num_processes -> gpc.get_world_size(ParallelMode.DATA)
2023-01-05 15:49:57 +08:00
Fazzie-Maqianli 89f26331e9
[example] diffusion update diffusion, Dreambooth (#2329) 2023-01-05 11:23:26 +08:00
binmakeswell e512ca9c24
[doc] update stable diffusion link (#2322)
* [doc] update link
2023-01-04 19:38:06 +08:00
Fazzie-Maqianli a9b27b9265
[example] fix dreambooth format (#2315) 2023-01-04 16:20:00 +08:00
Jiarui Fang 32253315b4
[example] update diffusion readme with official lightning (#2304) 2023-01-04 13:13:38 +08:00
HELSON e00cedd181
[example] update gemini benchmark bash (#2306) 2023-01-04 11:59:26 +08:00
binmakeswell c8144223b8
[doc] update diffusion doc (#2296) 2023-01-03 21:27:44 +08:00
ZijianYY df1d6dc553
[examples] using args and combining two versions for PaLM (#2284) 2023-01-03 17:49:00 +08:00
Ziyue Jiang ac863a01d6
[example] add benchmark (#2276)
* add benchmark

* merge common func

* add total and avg tflops

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 17:20:59 +08:00
BlueRum 1405b4381e
[example] fix save_load bug for dreambooth (#2280) 2023-01-03 17:13:29 +08:00
Jiarui Fang 879df8b943
[example] GPT polish readme (#2274) 2023-01-03 15:46:52 +08:00
Ziyue Jiang 9654df0e9a
Add GPT PP Example (#2272)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 15:17:26 +08:00
YuliangLiu0306 4b29112ab2
[autoparallel] gpt2 autoparallel examples (#2267)
* [autoparallel] gpt2 autoparallel examples

* polish code

* polish code
2023-01-03 14:23:33 +08:00
HELSON 09c0102fe6
[example] fix gpt example with 0.1.10 (#2265) 2023-01-03 13:38:14 +08:00
Fazzie-Maqianli 89f048a88a
[example] clear diffuser image (#2262) 2023-01-03 10:57:02 +08:00
Frank Lee 89542ceb44
[doc] updated the stable diffusion on docker usage (#2244)
* [doc] updated the stable diffusion on docker usage

* polish doc
2022-12-30 18:00:20 +08:00
Jiarui Fang 50cdf5430e
[example] diffusion install from docker (#2239)
* [builder] builder for scaled_upper_triang_masked_softmax

* add missing files

* fix a bug

* polish code

* [example] diffusion install from docker
2022-12-30 16:25:24 +08:00
Jiarui Fang db4cbdc7fb
[builder] builder for scaled_upper_triang_masked_softmax (#2234) 2022-12-30 09:58:00 +08:00
HELSON 31fe84237b
[example] fix benchmark.sh for gpt example (#2229) 2022-12-29 23:00:14 +08:00
Jiarui Fang 2cdecc9f38
[example] make palm + GeminiDDP work (#2227) 2022-12-29 14:28:31 +08:00
ZijianYY 63cc77173b
[example] Palm adding gemini, still has bugs (#2221) 2022-12-29 14:01:09 +08:00