ColossalAI

Commit Graph

Author	SHA1	Message	Date
flybird11111	451e9142b8	fix flash attn (#5209 )	2024-01-03 14:39:53 +08:00
flybird11111	365671be10	fix-test (#5210 ) fix-test fix-test	2024-01-03 14:26:13 +08:00
Hongxin Liu	7f3400b560	[devops] update torch versoin in ci (#5217 )	2024-01-03 11:46:33 +08:00
Wenhao Chen	d799a3088f	[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214 ) * fix: add fallback order option and update 1f1b * fix: fix deadlock comm in interleaved pp * test: modify p2p test	2024-01-03 11:34:49 +08:00
Wenhao Chen	3c0d82b19b	[pipeline]: support arbitrary batch size in forward_only mode (#5201 ) * fix: remove drop last in val & test dataloader * feat: add run_forward_only, support arbitrary bs * chore: modify ci script	2024-01-02 23:41:12 +08:00
flybird11111	02d2328a04	support linear accumulation fusion (#5199 ) support linear accumulation fusion support linear accumulation fusion fix	2023-12-29 18:22:42 +08:00
Zhongkai Zhao	64519eb830	[doc] Update required third-party library list for testing and torch comptibility checking (#5207 ) * doc/update requirements-test.txt * update torch-cuda compatibility check	2023-12-27 18:03:45 +08:00
Yuanchen	eae01b6740	Improve logic for selecting metrics (#5196 ) Co-authored-by: Xu <yuanchen.xu00@gmail.com>	2023-12-22 14:52:50 +08:00
Wenhao Chen	4fa689fca1	[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134 ) * test: add more p2p tests * fix: remove send_forward_recv_forward as p2p op list need to use the same group * fix: make send and receive atomic * feat: update P2PComm fn * feat: add metadata cache in 1f1b * feat: add metadata cache in interleaved pp * feat: modify is_xx_stage fn * revert: add _broadcast_object_list * feat: add interleaved pp in llama policy * feat: set NCCL_BUFFSIZE in HybridParallelPlugin	2023-12-22 10:44:00 +08:00
BlueRum	af952673f7	polish readme in application/chat (#5194 )	2023-12-20 11:28:39 +08:00
flybird11111	681d9b12ef	[doc] update pytorch version in documents. (#5177 ) * fix aaa fix fix fix * fix * fix * test ci * fix ci fix * update pytorch version in documents	2023-12-15 18:16:48 +08:00
Yuanchen	3ff60d13b0	Fix ColossalEval (#5186 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-15 15:06:06 +08:00
flybird11111	79718fae04	[shardformer] llama support DistCrossEntropy (#5176 ) * fix aaa fix fix fix * fix * fix * test ci * fix ci fix * llama support dist-cross fix fix fix fix fix fix fix fix * fix * fix * fix fix * test ci * test ci * fix * [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) * Add finetuning Colossal-Llama-2 example * Add finetuning Colossal-Llama-2 example 2 * Add finetuning Colossal-Llama-2 example and support NEFTuning * Add inference example and refine neftune * Modify readme file * update the imports --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com> * llama support dist-cross fix fix fix fix fix fix fix fix * fix * fix * fix fix * test ci * test ci * fix * fix ci * fix ci --------- Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com> Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>	2023-12-13 01:39:14 +08:00
Yuanchen	cefdc32615	[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169 ) * Support GSM, Data Leakage Evaluation and Tensor Parallel * remove redundant code and update inference.py in examples/gpt_evaluation --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-12 14:47:35 +08:00
Michelle	b07a6f4e27	[colossalqa] fix pangu api (#5170 ) * fix pangu api * add comment	2023-12-11 14:08:11 +08:00
flybird11111	21aa5de00b	[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150 ) * fix aaa fix fix fix * fix * fix * test ci * fix ci fix	2023-12-08 11:10:51 +08:00
Yuanchen	b397104438	[Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878 ) * Add finetuning Colossal-Llama-2 example * Add finetuning Colossal-Llama-2 example 2 * Add finetuning Colossal-Llama-2 example and support NEFTuning * Add inference example and refine neftune * Modify readme file * update the imports --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>	2023-12-07 14:02:03 +08:00
flybird11111	3dbbf83f1c	fix (#5158 ) fix	2023-12-05 14:28:36 +08:00
Michelle	368b5e3d64	[doc] fix colossalqa document (#5146 ) * fix doc * modify doc	2023-12-01 21:39:53 +08:00
Michelle	c7fd9a5213	[ColossalQA] refactor server and webui & add new feature (#5138 ) * refactor server and webui & add new feature * add requirements * modify readme and ui	2023-11-30 22:55:52 +08:00
flybird11111	2a2ec49aa7	[plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135 ) * fix 3d checkpoint load when booster boost without optimizer fix 3d checkpoint load when booster boost without optimizer * test ci * revert ci * fix fix	2023-11-30 18:37:47 +08:00
github-actions[bot]	f6731db67c	[format] applied code formatting on changed files in pull request 5115 (#5118 ) Co-authored-by: github-actions <github-actions@github.com>	2023-11-29 13:39:14 +08:00
github-actions[bot]	9b36640f28	[format] applied code formatting on changed files in pull request 5124 (#5125 ) Co-authored-by: github-actions <github-actions@github.com>	2023-11-29 13:39:02 +08:00
github-actions[bot]	d10ee42f68	[format] applied code formatting on changed files in pull request 5088 (#5127 ) Co-authored-by: github-actions <github-actions@github.com>	2023-11-29 13:38:37 +08:00
digger yu	9110406a47	fix typo change JOSNL TO JSONL etc. (#5116 )	2023-11-29 11:08:32 +08:00
Frank Lee	2899cfdabf	[doc] updated paper citation (#5131 )	2023-11-29 10:47:51 +08:00
binmakeswell	177c79f2d1	[doc] add moe news (#5128 ) * [doc] add moe news * [doc] add moe news * [doc] add moe news	2023-11-28 17:44:06 +08:00
Wenhao Chen	7172459e74	[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088 ) * [shardformer] implement policy for all GPT-J models and test * [shardformer] support interleaved pipeline parallel for bert finetune * [shardformer] shardformer support falcon (#4883) * [shardformer]: fix interleaved pipeline for bert model (#5048) * [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093) * Add Mistral support for Shardformer (#5103) * [shardformer] add tests to mistral (#5105) --------- Co-authored-by: Pengtai Xu <henryxu880@gmail.com> Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com> Co-authored-by: flybird11111 <1829166702@qq.com> Co-authored-by: eric8607242 <e0928021388@gmail.com>	2023-11-28 16:54:42 +08:00
アマデウス	126cf180bc	[hotfix] fixed memory usage of shardformer module replacement (#5122 )	2023-11-28 15:38:26 +08:00
Zian(Andy) Zheng	7b789f4dd2	[FEATURE] Add Safety Eval Datasets to ColossalEval (#5095 ) * add safetybench and cvalues(responsibility) eval dataset * Modify code according to review suggestions --------- Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>	2023-11-28 11:15:04 +08:00
digger yu	d5661f0f25	[nfc] fix typo change directoty to directory (#5111 )	2023-11-27 18:25:53 +08:00
digger yu	2bdf76f1f2	fix typo change lazy_iniy to lazy_init (#5099 )	2023-11-24 19:15:59 +08:00
Xuanlei Zhao	68fcaa2225	remove duplicate import (#5100 )	2023-11-23 15:15:01 +08:00
YeAnbang	e53e729d8e	[Feature] Add document retrieval QA (#5020 ) * add langchain * add langchain * Add files via upload * add langchain * fix style * fix style: remove extra space * add pytest; modified retriever * add pytest; modified retriever * add tests to build_on_pr.yml * fix build_on_pr.yml * fix build on pr; fix environ vars * seperate unit tests for colossalqa from build from pr * fix container setting; fix environ vars * commented dev code * add incremental update * remove stale code * fix style * change to sha3 224 * fix retriever; fix style; add unit test for document loader * fix ci workflow config * fix ci workflow config * add set cuda visible device script in ci * fix doc string * fix style; update readme; refactored * add force log info * change build on pr, ignore colossalqa * fix docstring, captitalize all initial letters * fix indexing; fix text-splitter * remove debug code, update reference * reset previous commit * update LICENSE update README add key-value mode, fix bugs * add files back * revert force push * remove junk file * add test files * fix retriever bug, add intent classification * change conversation chain design * rewrite prompt and conversation chain * add ui v1 * ui v1 * fix atavar * add header * Refactor the RAG Code and support Pangu * Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo. * resolved conversation. tested scripts under examples. web demo still buggy * fix ci tests * Some modifications to add ChatGPT api * modify llm.py and remove unnecessary files * Delete applications/ColossalQA/examples/ui/test_frontend_input.json * Remove OpenAI api key * add colossalqa * move files * move files * move files * move files * fix style * Add Readme and fix some bugs. * Add something to readme and modify some code * modify a directory name for clarity * remove redundant directory * Correct a type in llm.py * fix AI prefix * fix test_memory.py * fix conversation * fix some erros and typos * Fix a missing import in RAG_ChatBot.py * add colossalcloud LLM wrapper, correct issues in code review --------- Co-authored-by: YeAnbang <anbangy2@outlook.com> Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu> Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com> Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>	2023-11-23 10:33:48 +08:00
Xuanlei Zhao	3acbf6d496	[npu] add npu support for hybrid plugin and llama (#5090 ) * llama 3d * update * fix autocast	2023-11-22 19:23:21 +08:00
flybird11111	aae496631c	[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084 ) * fix flash attn * fix fix	2023-11-22 16:00:07 +08:00
Zhongkai Zhao	75af66cd81	[Hotfix] Fix model policy matching strategy in ShardFormer (#5064 ) * hotfix/Fix get model policy strategy in ShardFormer * fix bug in auto policy	2023-11-22 11:19:39 +08:00
flybird11111	4ccb9ded7d	[gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085 )	2023-11-22 11:14:25 +08:00
digger yu	0d482302a1	[nfc] fix typo and author name (#5089 )	2023-11-22 10:39:01 +08:00
digger yu	fd3567e089	[nfc] fix typo in docs/ (#4972 )	2023-11-21 22:06:20 +08:00
Jun Gao	dce05da535	fix thrust-transform-reduce error (#5078 )	2023-11-21 15:09:35 +08:00
Hongxin Liu	1cd7efc520	[inference] refactor examples and fix schedule (#5077 ) * [setup] refactor infer setup * [hotfix] fix infenrece behavior on 1 1 gpu * [exmaple] refactor inference examples	2023-11-21 10:46:03 +08:00
Bin Jia	4e3959d316	[hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074 ) * fix init model with random parameters * fix example	2023-11-20 20:15:25 +08:00
github-actions[bot]	8921a73c90	[format] applied code formatting on changed files in pull request 5067 (#5072 ) Co-authored-by: github-actions <github-actions@github.com>	2023-11-20 19:46:43 +08:00
Xu Kai	fb103cfd6e	[inference] update examples and engine (#5073 ) * update examples and engine * fix choices * update example	2023-11-20 19:44:52 +08:00
Bin Jia	0c7d8bebd5	[hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069 )	2023-11-20 17:15:37 +08:00
Hongxin Liu	e5ce4c8ea6	[npu] add npu support for gemini and zero (#5067 ) * [npu] setup device utils (#5047) * [npu] add npu device support * [npu] support low level zero * [test] update npu zero plugin test * [hotfix] fix import * [test] recover tests * [npu] gemini support npu (#5052) * [npu] refactor device utils * [gemini] support npu * [example] llama2+gemini support npu * [kernel] add arm cpu adam kernel (#5065) * [kernel] add arm cpu adam * [optim] update adam optimizer * [kernel] arm cpu adam remove bf16 support	2023-11-20 16:12:41 +08:00
Hongxin Liu	8d56c9c389	[misc] remove outdated submodule (#5070 )	2023-11-20 15:27:44 +08:00
Cuiqing Li (李崔卿)	bce919708f	[Kernels]added flash-decoidng of triton (#5063 ) * added flash-decoidng of triton based on lightllm kernel * add req * clean * clean * delete build.sh --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	2023-11-20 13:58:29 +08:00
Xu Kai	fd6482ad8c	[inference] Refactor inference architecture (#5057 ) * [inference] support only TP (#4998) * support only tp * enable tp * add support for bloom (#5008) * [refactor] refactor gptq and smoothquant llama (#5012) * refactor gptq and smoothquant llama * fix import error * fix linear import torch-int * fix smoothquant llama import error * fix import accelerate error * fix bug * fix import smooth cuda * fix smoothcuda * [Inference Refactor] Merge chatglm2 with pp and tp (#5023) merge chatglm with pp and tp * [Refactor] remove useless inference code (#5022) * remove useless code * fix quant model * fix test import bug * mv original inference legacy * fix chatglm2 * [Refactor] refactor policy search and quant type controlling in inference (#5035) * [Refactor] refactor policy search and quant type controling in inference * [inference] update readme (#5051) * update readme * update readme * fix architecture * fix table * fix table * [inference] udpate example (#5053) * udpate example * fix run.sh * fix rebase bug * fix some errors * update readme * add some features * update interface * update readme * update benchmark * add requirements-infer --------- Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>	2023-11-19 21:05:05 +08:00

2928 Commits (451e9142b8b8b77ed3138fb03ad54494c3c57126)