Commit Graph

322 Commits (ckpt)

Author SHA1 Message Date
YeAnbang 84eab13078 update sft trainning script 2024-06-11 02:44:20 +00:00
YeAnbang 2abdede1d7 fix readme 2024-06-10 01:08:42 +00:00
YeAnbang 77db21610a replace the customized dataloader setup with the build-in one 2024-06-07 09:44:25 +00:00
YeAnbang 0d7ff10ea5 replace the customized dataloader setup with the build-in one 2024-06-07 09:43:42 +00:00
YeAnbang 790e1362a6 merge 2024-06-07 07:01:32 +00:00
YeAnbang ac1520cb8f remove baichuan from template test due to transformer version conflict 2024-06-07 07:01:32 +00:00
YeAnbang e16ccc272a update ci 2024-06-07 07:01:32 +00:00
YeAnbang 45195ac53d remove local data path 2024-06-07 07:01:31 +00:00
YeAnbang bf57b13dda remove models that require huggingface auth from ci 2024-06-07 07:01:31 +00:00
YeAnbang 0bbac158ed fix datasets version 2024-06-07 07:01:31 +00:00
YeAnbang 62eb28b929 remove duplicated test 2024-06-07 07:01:31 +00:00
YeAnbang b8b5cacf38 fix transformers version 2024-06-07 07:01:31 +00:00
pre-commit-ci[bot] 1b880ce095 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-06-07 07:01:31 +00:00
YeAnbang 7ae87b3159 fix training script 2024-06-07 07:01:31 +00:00
YeAnbang 0b4a33548c moupdate ci tests, st ci test cases passed, tp failed in generation for ppo, sp is buggy 2024-06-07 07:01:31 +00:00
YeAnbang 7e65b71815 run pre-commit 2024-06-07 07:01:30 +00:00
YeAnbang 929e1e3da4 upgrade ppo dpo rm script 2024-06-07 07:01:30 +00:00
YeAnbang 7a7e86987d upgrade colossal-chat support tp_group>1, add sp for sft 2024-06-07 07:01:30 +00:00
Tong Li 913c920ecc
[Colossal-LLaMA] Fix sft issue for llama2 (#5719)
* fix minor issue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-15 10:52:11 +08:00
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
* [misc] remove config arg from initialize

* [misc] remove old tensor contrusctor

* [plugin] add npu support for ddp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [devops] fix doc test ci

* [test] fix test launch

* [doc] update launch doc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
linsj20 91fa553775 [Feature] qlora support (#5586)
* [feature] qlora support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* qlora follow commit

* migrate qutization folder to colossalai/

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-28 10:51:27 +08:00
Tong Li 862fbaaa62
[Feature] Support LLaMA-3 CPT and ST (#5619)
* support LLaMA-3

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Run pre-commit

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-23 13:54:05 +08:00
Camille Zhong 89049b0d89
[doc] fix ColossalMoE readme (#5599)
* fix readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-15 18:06:18 +08:00
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-08 15:09:40 +08:00
digger yu a799ca343b
[fix] fix typo s/muiti-node /multi-node etc. (#5448) 2024-04-07 18:42:15 +08:00
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`

* feat: apply `GradientCheckpointConfig` to policy and llama_forward

* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager

* fix: add optional args for `distribute_layer` and `get_stage_index`

* fix: fix changed API calls

* test: update llama tests

* style: polish `GradientCheckpointConfig`

* fix: fix pipeline utils tests
2024-04-01 11:34:58 +08:00
YeAnbang df5e9c53cf
[ColossalChat] Update RLHF V2 (#5286)
* Add dpo. Fix sft, ppo, lora. Refactor all

* fix and tested ppo

* 2 nd round refactor

* add ci tests

* fix ci

* fix ci

* fix readme, style

* fix readme style

* fix style, fix benchmark

* reproduce benchmark result, remove useless files

* rename to ColossalChat

* use new image

* fix ci workflow

* fix ci

* use local model/tokenizer for ci tests

* fix ci

* fix ci

* fix ci

* fix ci timeout

* fix rm progress bar. fix ci timeout

* fix ci

* fix ci typo

* remove 3d plugin from ci temporary

* test environment

* cannot save optimizer

* support chat template

* fix readme

* fix path

* test ci locally

* restore build_or_pr

* fix ci data path

* fix benchmark

* fix ci, move ci tests to 3080, disable fast tokenizer

* move ci to 85

* support flash attention 2

* add all-in-one data preparation script. Fix colossal-llama2-chat chat template

* add hardware requirements

* move ci test data

* fix save_model, add unwrap

* fix missing bos

* fix missing bos; support grad accumulation with gemini

* fix ci

* fix ci

* fix ci

* fix llama2 chat template config

* debug sft

* debug sft

* fix colossalai version requirement

* fix ci

* add sanity check to prevent NaN loss

* fix requirements

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* update readme

* update readme

* update readme and ignore

* fix logger bug

* support parallel_output

* modify data preparation logic

* fix tokenization

* update lr

* fix inference

* run pre-commit

---------

Co-authored-by: Tong Li <tong.li352711588@gmail.com>
2024-03-29 14:12:29 +08:00
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution

* Change static methods for t5 layer distribution to member functions

* Change static methods for whisper layer distribution to member functions

* Replace whisper policy usage with self one

* Fix test case to use non-static layer distribution methods

* fix: fix typo

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-03-27 13:57:00 +08:00
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404)
* fix: simplify merge_batch

* fix: use return_outputs=False to eliminate extra memory consumption

* feat: add return_outputs warning

* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
binmakeswell d158fc0e64
[doc] update open-sora demo (#5479)
* [doc] update open-sora demo

* [doc] update open-sora demo

* [doc] update open-sora demo
2024-03-20 16:08:41 +08:00
digger yu 385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. (#5429) 2024-03-12 11:25:16 +08:00
Camille Zhong da885ed540
fix tensor data update for gemini loss caluculation (#5442) 2024-03-11 13:49:58 +08:00
Camille Zhong 743e7fad2f
[colossal-llama2] add stream chat examlple for chat version model (#5428)
* add stream chat for chat version

* remove os.system clear

* modify function name
2024-03-07 14:58:56 +08:00
hugo-syn c8003d463b
[doc] Fix typo s/infered/inferred/ (#5288)
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2024-03-05 22:02:08 +08:00
Dongruixuan Li a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) 2024-03-05 21:48:55 +08:00
binmakeswell 822241a99c
[doc] sora release (#5425)
* [doc] sora release

* [doc] sora release

* [doc] sora release

* [doc] sora release
2024-03-05 12:08:58 +08:00
Camille Zhong 4b8312c08e
fix sft single turn inference example (#5416) 2024-03-01 17:27:50 +08:00
Tong Li a28c971516
update requirements (#5407) 2024-02-28 17:46:27 +08:00
CZYCW b833153fd5
[hotfix] fix variable type for top_p (#5313)
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-02-19 18:25:44 +08:00
Hongxin Liu 7303801854
[llama] fix training and inference scripts (#5384)
* [llama] refactor inference example to fit sft

* [llama] fix training script to fit gemini

* [llama] fix inference script
2024-02-19 16:41:04 +08:00
Frank Lee efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral 2024-02-08 16:30:05 +08:00
Hongxin Liu 65e5d6baa5 [moe] fix mixtral optim checkpoint (#5344) 2024-02-07 19:21:02 +08:00
Hongxin Liu 956b561b54 [moe] fix mixtral forward default value (#5329) 2024-02-07 19:21:02 +08:00
Hongxin Liu b60be18dcc [moe] fix mixtral checkpoint io (#5314) 2024-02-07 19:21:02 +08:00
Hongxin Liu da39d21b71 [moe] support mixtral (#5309)
* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [meo] support save sharded mixtral

* [meo] support load sharded mixtral

* [meo] support save sharded optim

* [meo] integrate moe manager into plug

* [meo] fix optimizer load

* [meo] fix mixtral layer
2024-02-07 19:21:02 +08:00
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253)
* [moe] top2 allow uneven input

* [moe] update capacity computing

* [moe] remove debug info

* [moe] update capacity computing

* [moe] update capacity computing
2024-02-07 19:21:02 +08:00
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 2024-02-07 19:21:02 +08:00
Hongxin Liu 084c91246c
[llama] fix memory issue (#5371)
* [llama] fix memory issue

* [llama] add comment
2024-02-06 19:02:37 +08:00
Hongxin Liu eb4f2d90f9
[llama] polish training script and fix optim ckpt (#5368) 2024-02-06 11:52:17 +08:00
Camille Zhong a5756a8720
[eval] update llama npu eval (#5366) 2024-02-06 10:53:03 +08:00
Camille Zhong 44ca61a22b
[llama] fix neftune & pbar with start_step (#5364) 2024-02-05 18:04:23 +08:00
Hongxin Liu a4cec1715b
[llama] add flash attn patch for npu (#5362) 2024-02-05 16:48:34 +08:00
Hongxin Liu 73f9f23fc6
[llama] update training script (#5360)
* [llama] update training script

* [doc] polish docstr
2024-02-05 16:33:18 +08:00
Hongxin Liu 6c0fa7b9a8
[llama] fix dataloader for hybrid parallel (#5358)
* [plugin] refactor prepare dataloader

* [plugin] update train script
2024-02-05 15:14:56 +08:00
YeAnbang c5239840e6
[Chat] fix sft loss nan (#5345)
* fix script

* fix script

* fix chat nan

* fix chat nan
2024-02-01 14:25:16 +08:00
Frank Lee 8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
Feature/npu
2024-01-29 13:49:39 +08:00
李文军 ec912b1ba9
[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) 2024-01-25 13:14:48 +08:00
Desperado-Jia ddf879e2db
fix bug for mefture (#5299) 2024-01-22 22:17:54 +08:00
Michelle 32cb74493a
fix auto loading gpt2 tokenizer (#5279) 2024-01-18 14:08:29 +08:00
ver217 148469348a Merge branch 'main' into sync/npu 2024-01-18 12:05:21 +08:00
digger yu 756c400ad2
fix typo in applications/ColossalEval/README.md (#5250) 2024-01-11 17:58:38 +08:00
digger yu 41e52c1c6e
[doc] fix typo in Colossal-LLaMA-2/README.md (#5247) 2024-01-10 19:24:56 +08:00
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
* update accelerator

* fix timer

* fix amp

* update

* fix

* update bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* udpate

* update

* update

* update

* [npu] polish accelerator code

---------

Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
binmakeswell 7bc6969ce6
[doc] SwiftInfer release (#5236)
* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release
2024-01-08 09:55:12 +08:00
github-actions[bot] 4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 (#5235)
Co-authored-by: github-actions <github-actions@github.com>
2024-01-07 20:55:34 +08:00
binmakeswell b9b32b15e6
[doc] add Colossal-LLaMA-2-13B (#5234)
* [doc] add Colossal-LLaMA-2-13B

* [doc] add Colossal-LLaMA-2-13B

* [doc] add Colossal-LLaMA-2-13B
2024-01-07 20:53:12 +08:00
Camille Zhong 915b4652f3
[doc] Update README.md of Colossal-LLAMA2 (#5233)
* Update README.md

* Update README.md
2024-01-06 17:06:41 +08:00
Tong Li d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224)
* update readme

* update readme

* update link

* update

* update readme

* update

* update

* update

* update title

* update example

* update example

* fix content

* add conclusion

* add license

* update

* update

* update version

* fix minor
2024-01-05 17:24:26 +08:00
Yuanchen eae01b6740
Improve logic for selecting metrics (#5196)
Co-authored-by: Xu <yuanchen.xu00@gmail.com>
2023-12-22 14:52:50 +08:00
BlueRum af952673f7
polish readme in application/chat (#5194) 2023-12-20 11:28:39 +08:00
Yuanchen 3ff60d13b0
Fix ColossalEval (#5186)
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-12-15 15:06:06 +08:00
Yuanchen cefdc32615
[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169)
* Support GSM, Data Leakage Evaluation and Tensor Parallel

* remove redundant code and update inference.py in examples/gpt_evaluation

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-12-12 14:47:35 +08:00
Michelle b07a6f4e27
[colossalqa] fix pangu api (#5170)
* fix pangu api

* add comment
2023-12-11 14:08:11 +08:00
Yuanchen b397104438
[Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878)
* Add finetuning Colossal-Llama-2 example

* Add finetuning Colossal-Llama-2 example 2

* Add finetuning Colossal-Llama-2 example and support NEFTuning

* Add inference example and refine neftune

* Modify readme file

* update the imports

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
2023-12-07 14:02:03 +08:00
Michelle 368b5e3d64
[doc] fix colossalqa document (#5146)
* fix doc

* modify doc
2023-12-01 21:39:53 +08:00
Michelle c7fd9a5213
[ColossalQA] refactor server and webui & add new feature (#5138)
* refactor server and webui & add new feature

* add requirements

* modify readme and ui
2023-11-30 22:55:52 +08:00
github-actions[bot] f6731db67c
[format] applied code formatting on changed files in pull request 5115 (#5118)
Co-authored-by: github-actions <github-actions@github.com>
2023-11-29 13:39:14 +08:00
digger yu 9110406a47
fix typo change JOSNL TO JSONL etc. (#5116) 2023-11-29 11:08:32 +08:00
Zian(Andy) Zheng 7b789f4dd2 [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095)
* add safetybench and cvalues(responsibility) eval dataset

* Modify code according to review suggestions

---------

Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>
2023-11-28 11:15:04 +08:00
digger yu d5661f0f25
[nfc] fix typo change directoty to directory (#5111) 2023-11-27 18:25:53 +08:00
YeAnbang e53e729d8e
[Feature] Add document retrieval QA (#5020)
* add langchain

* add langchain

* Add files via upload

* add langchain

* fix style

* fix style: remove extra space

* add pytest; modified retriever

* add pytest; modified retriever

* add tests to build_on_pr.yml

* fix build_on_pr.yml

* fix build on pr; fix environ vars

* seperate unit tests for colossalqa from build from pr

* fix container setting; fix environ vars

* commented dev code

* add incremental update

* remove stale code

* fix style

* change to sha3 224

* fix retriever; fix style; add unit test for document loader

* fix ci workflow config

* fix ci workflow config

* add set cuda visible device script in ci

* fix doc string

* fix style; update readme; refactored

* add force log info

* change build on pr, ignore colossalqa

* fix docstring, captitalize all initial letters

* fix indexing; fix text-splitter

* remove debug code, update reference

* reset previous commit

* update LICENSE update README add key-value mode, fix bugs

* add files back

* revert force push

* remove junk file

* add test files

* fix retriever bug, add intent classification

* change conversation chain design

* rewrite prompt and conversation chain

* add ui v1

* ui v1

* fix atavar

* add header

* Refactor the RAG Code and support Pangu

* Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo.

* resolved conversation. tested scripts under examples. web demo still buggy

* fix ci tests

* Some modifications to add ChatGPT api

* modify llm.py and remove unnecessary files

* Delete applications/ColossalQA/examples/ui/test_frontend_input.json

* Remove OpenAI api key

* add colossalqa

* move files

* move files

* move files

* move files

* fix style

* Add Readme and fix some bugs.

* Add something to readme and modify some code

* modify a directory name for clarity

* remove redundant directory

* Correct a type in llm.py

* fix AI prefix

* fix test_memory.py

* fix conversation

* fix some erros and typos

* Fix a missing import in RAG_ChatBot.py

* add colossalcloud LLM wrapper, correct issues in code review

---------

Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu>
Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>
2023-11-23 10:33:48 +08:00
Orion-Zheng 43ad0d9ef0 fix wrong EOS token in ColossalChat 2023-11-14 10:49:49 +08:00
Yuanchen 239cd92eff
Support mtbench (#5025)
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-11-09 13:41:50 +08:00
Yuanchen abe071b663
fix ColossalEval (#4992)
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-10-31 10:30:03 +08:00
github-actions[bot] a41cf88e9b
[format] applied code formatting on changed files in pull request 4908 (#4918)
Co-authored-by: github-actions <github-actions@github.com>
2023-10-17 10:48:24 +08:00
Zian(Andy) Zheng 7768afbad0 Update flash_attention_patch.py
To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer.
https://github.com/huggingface/transformers/pull/25598
2023-10-16 14:00:45 +08:00
Camille Zhong 652adc2215 Update README.md 2023-10-10 23:19:34 +08:00
Camille Zhong afe10a85fd Update README.md 2023-10-10 23:19:34 +08:00
Camille Zhong 3043d5d676 Update modelscope link in README.md
add modelscope link
2023-10-10 23:19:34 +08:00
Tong Li ed06731e00
update Colossal (#4832) 2023-09-28 16:05:05 +08:00
binmakeswell 822051d888
[doc] update slack link (#4823) 2023-09-27 17:37:39 +08:00
Yuanchen 1fa8c5e09f
Update Qwen-7B results (#4821)
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
2023-09-27 17:33:54 +08:00
flybird11111 be400a0936
[chat] fix gemini strategy (#4698)
* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* g# This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py
2023-09-27 13:15:32 +08:00
Chandler-Bing b6cf0aca55
[hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)
change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing
2023-09-26 11:44:27 +08:00
Tong Li 8cbce6184d update 2023-09-26 11:36:53 +08:00
Tong Li bd014673b0 update readme 2023-09-26 10:58:05 +08:00
binmakeswell d512a4d38d
[doc] add llama2 domain-specific solution news (#4789)
* [doc] add llama2 domain-specific solution news
2023-09-25 10:44:15 +08:00
Yuanchen ce777853ae
[feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)
* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
2023-09-24 23:14:11 +08:00
Tong Li 74aa7d964a
initial commit: add colossal llama 2 (#4784) 2023-09-24 23:12:26 +08:00
Wenhao Chen 901ab1eedd
[chat]: add lora merge weights config (#4766)
* feat: modify lora merge weights fn

* feat: add lora merge weights config
2023-09-21 16:23:59 +08:00