Commit Graph

3461 Commits (fbf33ecd019ce0e075b76b628e6e8a319cfc43e3)

Author SHA1 Message Date
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693)
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg

* support ignore EOS token

* change variable's name

* fix annotation
2024-05-08 19:59:10 +08:00
yuehuayingxueluo 9c2fe7935f
[Inference]Adapt temperature processing logic (#5689)
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg
2024-05-08 17:58:29 +08:00
Yuanheng Zhao 12e7c28d5e
[hotfix] fix OpenMOE example import path (#5697) 2024-05-08 15:48:47 +08:00
Wang Binluo 22297789ab
Merge pull request #5684 from wangbluo/parallel_output
[Shardformer] Add Parallel output for shardformer models
2024-05-07 22:59:42 -05:00
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements

* modify example inference struct

* add test ci scripts

* mark test_infer as submodule

* rm deprecated cls & deps

* import of HAS_FLASH_ATTN

* prune inference tests to be run

* prune triton kernel tests

* increment pytest timeout mins

* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao f9afe0addd
[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695)
- Fix key value number assignment in KVCacheManager, as well as method of accessing
2024-05-07 23:13:14 +08:00
wangbluo 4e50cce26b fix the mistral model 2024-05-07 09:17:56 +00:00
wangbluo a8408b4d31 remove comment code 2024-05-07 07:08:56 +00:00
pre-commit-ci[bot] ca56b93d83 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-07 07:07:09 +00:00
wangbluo 108ddfb795 add parallel_output for the opt model 2024-05-07 07:05:53 +00:00
pre-commit-ci[bot] 88f057ce7c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-07 07:03:47 +00:00
Edenzzzz 58954b2986
[misc] Add an existing issue checkbox in bug report (#5691)
Co-authored-by: Wenxuan(Eden) Tan <wtan45@wisc.edu>
2024-05-07 12:18:50 +08:00
flybird11111 77ec773388
[zero]remove registered gradients hooks (#5687)
* remove registered hooks

fix

fix

fix zero

fix

fix

fix

fix

fix zero

fix zero

fix

fix

fix

* fix

fix

fix
2024-05-07 12:01:38 +08:00
Edenzzzz c25f83c85f
fix missing pad token (#5690)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-06 18:17:26 +08:00
傅剑寒 1ace1065e6
[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) 2024-05-06 15:35:13 +08:00
Yuanheng Zhao db7b3051f4
[Sync] Update from main to feature/colossal-infer (Merge pull request #5685)
[Sync] Update from main to feature/colossal-infer

- Merge pull request #5685 from yuanheng-zhao/inference/merge/main
2024-05-06 14:43:38 +08:00
Steve Luo 725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) 2024-05-06 10:55:34 +08:00
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 2024-05-05 16:28:56 +00:00
Yuanheng Zhao 56ed09aba5 [sync] resolve conflicts of merging main 2024-05-05 05:14:00 +00:00
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
* kvmemcpy triton for new kcache layout

* revise tests for new kcache layout

* naive triton flash decoding - new kcache layout

* rotary triton kernel - new kcache layout

* remove redundancy - triton decoding

* remove redundancy - triton kvcache copy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-03 17:20:45 +08:00
wangbluo 2632916329 remove useless code 2024-05-01 09:23:43 +00:00
傅剑寒 9df016fc45
[Inference] Fix quant bits order (#5681) 2024-04-30 19:38:00 +08:00
yuehuayingxueluo f79963199c
[inference]Add alibi to flash attn function (#5678)
* add alibi to flash attn function

* rm redundant modifications
2024-04-30 19:35:05 +08:00
傅剑寒 ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) 2024-04-30 18:33:53 +08:00
wangbluo 9efc79ef24 add parallel output for mistral model 2024-04-30 08:10:20 +00:00
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator

* refactor decode_kv_cache_memcpy

* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
yuehuayingxueluo 5f00002e43
[Inference] Adapt Baichuan2-13B TP (#5659)
* adapt to baichuan2 13B

* add baichuan2 13B TP

* update baichuan tp logic

* rm unused code

* Fix TP logic

* fix alibi slopes tp logic

* rm nn.Module

* Polished the code.

* change BAICHUAN_MODEL_NAME_OR_PATH

* Modified the logic for loading Baichuan weights.

* fix typos
2024-04-30 15:47:07 +08:00
傅剑寒 808ee6e4ad
[Inference/Feat] Feat quant kvcache step2 (#5674) 2024-04-30 11:26:36 +08:00
Wang Binluo d3f34ee8cc
[Shardformer] add assert for num of attention heads divisible by tp_size (#5670)
* add assert for num of attention heads divisible by tp_size

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 18:47:47 +08:00
flybird11111 6af6d6fc9f
[shardformer] support bias_gelu_jit_fused for models (#5647)
* support gelu_bias_fused for gpt2

* support gelu_bias_fused for gpt2

fix

fix

fix

* fix

fix

* fix
2024-04-29 15:33:51 +08:00
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
* [misc] remove config arg from initialize

* [misc] remove old tensor contrusctor

* [plugin] add npu support for ddp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [devops] fix doc test ci

* [test] fix test launch

* [doc] update launch doc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
linsj20 91fa553775 [Feature] qlora support (#5586)
* [feature] qlora support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* qlora follow commit

* migrate qutization folder to colossalai/

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-28 10:51:27 +08:00
flybird11111 8954a0c2e2 [LowLevelZero] low level zero support lora (#5153)
* low level zero support lora

low level zero support lora

* add checkpoint test

* add checkpoint test

* fix

* fix

* fix

* fix

fix

fix

fix

* fix

* fix

fix

fix

fix

fix

fix

fix

* fix

* fix

fix

fix

fix

fix

fix

fix

* fix

* test ci

* git # This is a combination of 3 commits.

Update low_level_zero_plugin.py

Update low_level_zero_plugin.py

fix

fix

fix

* fix naming

fix naming

fix naming

fix
2024-04-28 10:51:27 +08:00
Baizhou Zhang 14b0d4c7e5 [lora] add lora APIs for booster, support lora for TorchDDP (#4981)
* add apis and peft requirement

* add liscense and implement apis

* add checkpointio apis

* add torchddp fwd_bwd test

* add support_lora methods

* add checkpointio test and debug

* delete unneeded codes

* remove peft from LICENSE

* add concrete methods for enable_lora

* simplify enable_lora api

* fix requirements
2024-04-28 10:51:27 +08:00
Hongxin Liu c1594e4bad
[devops] fix release docker ci (#5665) 2024-04-27 19:11:57 +08:00
Hongxin Liu 4cfbf30a5e
[release] update version (#5654) 2024-04-27 18:59:47 +08:00
Tong Li 68ec99e946
[hotfix] add soft link to support required files (#5661) 2024-04-26 21:12:04 +08:00
傅剑寒 8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656) 2024-04-26 19:40:37 +08:00
Yuanheng Zhao 5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)
* add context attn triton kernel - new kcache layout

* add benchmark triton

* tiny revise

* trivial - code style, comment
2024-04-26 17:51:49 +08:00
binmakeswell b8a711aa2d
[news] llama3 and open-sora v1.1 (#5655)
* [news] llama3 and open-sora v1.1

* [news] llama3 and open-sora v1.1
2024-04-26 15:36:37 +08:00
Hongxin Liu 2082852f3f
[lazyinit] skip whisper test (#5653) 2024-04-26 14:03:12 +08:00
flybird11111 8b7d535977
fix gptj (#5652) 2024-04-26 11:52:27 +08:00
yuehuayingxueluo 3c91e3f176
[Inference]Adapt to baichuan2 13B (#5614)
* adapt to baichuan2 13B

* adapt to baichuan2 13B

* change BAICHUAN_MODEL_NAME_OR_PATH

* fix test_decoding_attn.py

* Modifications based on review comments.

* change BAICHUAN_MODEL_NAME_OR_PATH

* mv attn mask processes to test flash decoding

* mv get_alibi_slopes baichuan modeling

* fix bugs in test_baichuan.py
2024-04-25 23:11:30 +08:00
Yuanheng Zhao f342a93871
[Fix] Remove obsolete files - inference (#5650) 2024-04-25 22:04:59 +08:00
Hongxin Liu 1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646)
* [shardformer] refactor pipeline grad ckpt config

* [shardformer] refactor pipeline grad ckpt config

* [pipeline] fix stage manager
2024-04-25 15:19:30 +08:00
Season 7ef91606e1
[Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625)
* implement thread-safety singleton

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor singleton implementation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 14:45:52 +08:00
Hongxin Liu bbb2c21f16
[shardformer] fix chatglm implementation (#5644)
* [shardformer] fix chatglm policy

* [shardformer] fix chatglm flash attn

* [shardformer] update readme

* [shardformer] fix chatglm init

* [shardformer] fix chatglm test

* [pipeline] fix chatglm merge batch
2024-04-25 14:41:17 +08:00
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
* optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x])

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 14:24:02 +08:00
flybird11111 5d88ef1aaf
[shardformer] remove useless code (#5645) 2024-04-25 13:46:39 +08:00
flybird11111 148506c828
[coloattention]modify coloattention (#5627)
* modify coloattention

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

* fix

fxi

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 10:47:14 +08:00