yuehuayingxueluo
d482922035
[Inference] Support the logic for ignoring the EOS token ( #5693 )
...
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
* support ignore EOS token
* change variable's name
* fix annotation
2024-05-08 19:59:10 +08:00
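The entry above adds support for ignoring the EOS token during decoding. A minimal sketch of how such a flag typically plugs into the stopping check; `GenerationConfig` and `ignore_eos` are illustrative names here, not necessarily the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    max_new_tokens: int = 64
    eos_token_id: int = 2
    ignore_eos: bool = False  # hypothetical flag: when True, EOS does not stop decoding

def is_finished(token_id: int, num_generated: int, cfg: GenerationConfig) -> bool:
    """Decide whether a sequence should stop decoding."""
    if num_generated >= cfg.max_new_tokens:
        return True  # the length limit always applies
    if cfg.ignore_eos:
        return False  # keep generating past EOS (useful for fixed-length benchmarking)
    return token_id == cfg.eos_token_id
```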
yuehuayingxueluo
9c2fe7935f
[Inference] Adapt temperature processing logic ( #5689 )
...
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
2024-05-08 17:58:29 +08:00
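A minimal sketch of the temperature scaling and the new range checks this entry describes; the function names and exact error messages are assumptions:

```python
import torch

def apply_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    # Scale logits before softmax; a higher temperature flattens the distribution.
    if temperature <= 0.0:
        raise ValueError(f"temperature must be positive, got {temperature}")
    return logits / temperature

def validate_top_p_top_k(top_p: float, top_k: int) -> None:
    # The kind of range checks the commit adds (message text is an assumption).
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")
    if top_k < 1:
        raise ValueError(f"top_k must be >= 1, got {top_k}")
```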
Yuanheng Zhao
12e7c28d5e
[hotfix] fix OpenMOE example import path ( #5697 )
2024-05-08 15:48:47 +08:00
Wang Binluo
22297789ab
Merge pull request #5684 from wangbluo/parallel_output
...
[Shardformer] Add Parallel output for shardformer models
2024-05-07 22:59:42 -05:00
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements ( #5688 )
...
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao
f9afe0addd
[hotfix] Fix KV Heads Number Assignment in KVCacheManager ( #5695 )
...
- Fix the KV head number assignment in KVCacheManager, as well as the method of accessing it
2024-05-07 23:13:14 +08:00
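The hotfix above concerns sizing the cache by per-rank KV heads rather than the global count under tensor parallelism. A minimal sketch of that derivation (names are illustrative):

```python
def kv_heads_per_partition(num_kv_heads: int, tp_size: int) -> int:
    """Each tensor-parallel rank manages only its shard of KV heads, so the
    cache must be sized with num_kv_heads // tp_size, not the global count."""
    if num_kv_heads % tp_size != 0:
        raise ValueError(
            f"num_kv_heads ({num_kv_heads}) must be divisible by tp_size ({tp_size})"
        )
    return num_kv_heads // tp_size
```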
wangbluo
4e50cce26b
fix the mistral model
2024-05-07 09:17:56 +00:00
wangbluo
a8408b4d31
remove commented-out code
2024-05-07 07:08:56 +00:00
pre-commit-ci[bot]
ca56b93d83
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2024-05-07 07:07:09 +00:00
wangbluo
108ddfb795
add parallel_output for the opt model
2024-05-07 07:05:53 +00:00
pre-commit-ci[bot]
88f057ce7c
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2024-05-07 07:03:47 +00:00
Edenzzzz
58954b2986
[misc] Add an existing issue checkbox in bug report ( #5691 )
...
Co-authored-by: Wenxuan(Eden) Tan <wtan45@wisc.edu>
2024-05-07 12:18:50 +08:00
flybird11111
77ec773388
[zero] remove registered gradient hooks ( #5687 )
...
* remove registered hooks
* assorted follow-up fixes (zero and general)
2024-05-07 12:01:38 +08:00
Edenzzzz
c25f83c85f
fix missing pad token ( #5690 )
...
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-06 18:17:26 +08:00
傅剑寒
1ace1065e6
[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy ( #5686 )
2024-05-06 15:35:13 +08:00
Yuanheng Zhao
db7b3051f4
[Sync] Update from main to feature/colossal-infer (Merge pull request #5685 )
...
- Merge pull request #5685 from yuanheng-zhao/inference/merge/main
2024-05-06 14:43:38 +08:00
Steve Luo
725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 ( #5679 )
2024-05-06 10:55:34 +08:00
Yuanheng Zhao
8754abae24
[Fix] Fix & Update Inference Tests (compatibility w/ main)
2024-05-05 16:28:56 +00:00
Yuanheng Zhao
56ed09aba5
[sync] resolve conflicts from merging main
2024-05-05 05:14:00 +00:00
Yuanheng Zhao
537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel ( #5677 )
...
* kvmemcpy triton for new kcache layout
* revise tests for new kcache layout
* naive triton flash decoding - new kcache layout
* rotary triton kernel - new kcache layout
* remove redundancy - triton decoding
* remove redundancy - triton kvcache copy
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-03 17:20:45 +08:00
wangbluo
2632916329
remove useless code
2024-05-01 09:23:43 +00:00
傅剑寒
9df016fc45
[Inference] Fix quant bits order ( #5681 )
2024-04-30 19:38:00 +08:00
yuehuayingxueluo
f79963199c
[inference] Add alibi to flash attn function ( #5678 )
...
* add alibi to flash attn function
* rm redundant modifications
2024-04-30 19:35:05 +08:00
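`get_alibi_slopes` (referenced in the Baichuan entries later in this log) typically computes the standard ALiBi slopes from Press et al.; a sketch under that assumption, which may differ in detail from the project's helper:

```python
import math
import torch

def get_alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBi slopes: a geometric sequence per head, with extra
    interleaved slopes when the head count is not a power of two."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2 ** (-8.0 / closest_pow2)
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        extra_base = 2 ** (-4.0 / closest_pow2)
        slopes += [extra_base ** (2 * i + 1) for i in range(num_heads - closest_pow2)]
    return torch.tensor(slopes)
```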
傅剑寒
ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy ( #5680 )
2024-04-30 18:33:53 +08:00
wangbluo
9efc79ef24
add parallel output for mistral model
2024-04-30 08:10:20 +00:00
Steve Luo
5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… ( #5663 )
...
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator
* refactor decode_kv_cache_memcpy
* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
yuehuayingxueluo
5f00002e43
[Inference] Adapt Baichuan2-13B TP ( #5659 )
...
* adapt to baichuan2 13B
* add baichuan2 13B TP
* update baichuan tp logic
* rm unused code
* Fix TP logic
* fix alibi slopes tp logic
* rm nn.Module
* Polished the code.
* change BAICHUAN_MODEL_NAME_OR_PATH
* Modified the logic for loading Baichuan weights.
* fix typos
2024-04-30 15:47:07 +08:00
傅剑寒
808ee6e4ad
[Inference/Feat] Quant kvcache step 2 ( #5674 )
2024-04-30 11:26:36 +08:00
Wang Binluo
d3f34ee8cc
[Shardformer] add assert for num of attention heads divisible by tp_size ( #5670 )
...
* add assert for num of attention heads divisible by tp_size
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 18:47:47 +08:00
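A one-line sketch of the guard this PR adds; the exact assertion message is an assumption:

```python
def check_tp_divisibility(num_attention_heads: int, tp_size: int) -> None:
    """Column-parallel layers split the head dimension evenly across
    tensor-parallel ranks, so the head count must divide cleanly."""
    assert num_attention_heads % tp_size == 0, (
        f"The number of attention heads ({num_attention_heads}) must be "
        f"divisible by the tensor parallel size ({tp_size})."
    )
```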
flybird11111
6af6d6fc9f
[shardformer] support bias_gelu_jit_fused for models ( #5647 )
...
* support gelu_bias_fused for gpt2
* assorted follow-up fixes
2024-04-29 15:33:51 +08:00
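The fused op behind this entry follows the Megatron-style jitted bias + tanh-GeLU pattern; a sketch under that assumption (the constant 0.79788456 is sqrt(2/pi)):

```python
import torch

@torch.jit.script
def bias_gelu_fused(bias: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Fuse the bias add with the tanh-approximated GeLU into one jitted kernel,
    avoiding a separate elementwise pass for the bias."""
    x = bias + y
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1 + 0.044715 * x * x)))
```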
Hongxin Liu
7f8b16635b
[misc] refactor launch API and tensor constructor ( #5666 )
...
* [misc] remove config arg from initialize
* [misc] remove old tensor constructor
* [plugin] add npu support for ddp
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [devops] fix doc test ci
* [test] fix test launch
* [doc] update launch doc
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
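The launch refactor above removes the config argument from initialization. A hedged sketch of the resulting call, assuming a torchrun-style environment; exact kwargs may differ:

```python
import colossalai

# After this refactor, launch no longer takes a config dict.
# Run under torchrun so RANK / WORLD_SIZE / MASTER_ADDR env vars are set.
colossalai.launch_from_torch()
```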
linsj20
91fa553775
[Feature] qlora support ( #5586 )
...
* [feature] qlora support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* qlora follow commit
* migrate quantization folder to colossalai/
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fixes
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-28 10:51:27 +08:00
flybird11111
8954a0c2e2
[LowLevelZero] low level zero support lora ( #5153 )
...
* low level zero support lora
* add checkpoint test
* assorted follow-up fixes
* test ci
* update low_level_zero_plugin.py (squash of 3 commits)
* fix naming
2024-04-28 10:51:27 +08:00
Baizhou Zhang
14b0d4c7e5
[lora] add lora APIs for booster, support lora for TorchDDP ( #4981 )
...
* add apis and peft requirement
* add license and implement apis
* add checkpointio apis
* add torchddp fwd_bwd test
* add support_lora methods
* add checkpointio test and debug
* delete unneeded codes
* remove peft from LICENSE
* add concrete methods for enable_lora
* simplify enable_lora api
* fix requirements
2024-04-28 10:51:27 +08:00
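A hedged usage sketch of the LoRA flow these commits describe; the `enable_lora` signature and the TorchDDP plugin pairing are inferred from the commit messages, not verified against the released API:

```python
import torch.nn as nn
from peft import LoraConfig
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# The enable_lora call below is an assumption pieced together from the
# commit messages above, not a verified signature.
model = nn.Sequential(nn.Linear(32, 32))  # placeholder model
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["0"])  # "0" = the Linear above
booster = Booster(plugin=TorchDDPPlugin())
model = booster.enable_lora(model, lora_config=lora_config)
```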
Hongxin Liu
c1594e4bad
[devops] fix release docker ci ( #5665 )
2024-04-27 19:11:57 +08:00
Hongxin Liu
4cfbf30a5e
[release] update version ( #5654 )
2024-04-27 18:59:47 +08:00
Tong Li
68ec99e946
[hotfix] add soft link to support required files ( #5661 )
2024-04-26 21:12:04 +08:00
傅剑寒
8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding ( #5656 )
2024-04-26 19:40:37 +08:00
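Several entries in this log add kv-cache quantization support. A toy sketch of the underlying idea, per-block quantize/dequantize with a stored scale; the real kernels use a custom float8 type (see the float8_ rename elsewhere in this log), so int8-with-scale here is only illustrative:

```python
import torch

def quantize_kv_block(block: torch.Tensor):
    """Per-block symmetric int8 quantization with one scale per block."""
    scale = block.abs().amax().clamp_min(1e-8) / 127.0
    q = torch.clamp((block / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv_block(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float16) * scale
```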
Yuanheng Zhao
5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel ( #5658 )
...
* add context attn triton kernel - new kcache layout
* add benchmark triton
* tiny revise
* trivial - code style, comment
2024-04-26 17:51:49 +08:00
binmakeswell
b8a711aa2d
[news] llama3 and open-sora v1.1 ( #5655 )
...
* [news] llama3 and open-sora v1.1
2024-04-26 15:36:37 +08:00
Hongxin Liu
2082852f3f
[lazyinit] skip whisper test ( #5653 )
2024-04-26 14:03:12 +08:00
flybird11111
8b7d535977
fix gptj ( #5652 )
2024-04-26 11:52:27 +08:00
yuehuayingxueluo
3c91e3f176
[Inference] Adapt to baichuan2 13B ( #5614 )
...
* adapt to baichuan2 13B
* change BAICHUAN_MODEL_NAME_OR_PATH
* fix test_decoding_attn.py
* Modifications based on review comments.
* change BAICHUAN_MODEL_NAME_OR_PATH
* mv attn mask processes to test flash decoding
* mv get_alibi_slopes baichuan modeling
* fix bugs in test_baichuan.py
2024-04-25 23:11:30 +08:00
Yuanheng Zhao
f342a93871
[Fix] Remove obsolete files - inference ( #5650 )
2024-04-25 22:04:59 +08:00
Hongxin Liu
1b387ca9fe
[shardformer] refactor pipeline grad ckpt config ( #5646 )
...
* [shardformer] refactor pipeline grad ckpt config
* [shardformer] refactor pipeline grad ckpt config
* [pipeline] fix stage manager
2024-04-25 15:19:30 +08:00
Season
7ef91606e1
[Fix]: implement thread-safe singleton to avoid deadlock in very large-scale training scenarios ( #5625 )
...
* implement thread-safe singleton
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor singleton implementation
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 14:45:52 +08:00
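The singleton fix above targets construction races. A standard double-checked-locking metaclass illustrating the technique (a sketch, not the project's exact implementation):

```python
import threading

class ThreadSafeSingletonMeta(type):
    """Double-checked locking: the lock is taken only on first construction,
    so steady-state lookups stay lock-free under heavy concurrency."""
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path, no lock
            with cls._lock:
                if cls not in cls._instances:  # re-check under the lock
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Registry(metaclass=ThreadSafeSingletonMeta):
    pass
```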
Hongxin Liu
bbb2c21f16
[shardformer] fix chatglm implementation ( #5644 )
...
* [shardformer] fix chatglm policy
* [shardformer] fix chatglm flash attn
* [shardformer] update readme
* [shardformer] fix chatglm init
* [shardformer] fix chatglm test
* [pipeline] fix chatglm merge batch
2024-04-25 14:41:17 +08:00
Steve Luo
a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout ( #5643 )
...
* optimize FlashDecodingAttention: refactor code with a different key cache layout (from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x])
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 14:24:02 +08:00
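The commit message above spells out the layout change. A small shape sketch of the two layouts and the round-trip between them; the choice of x = 8 assumes fp16 values packed into 16-byte vectorized loads:

```python
import torch

num_blocks, num_kv_heads, block_size, head_size = 128, 8, 16, 128
x = 8  # elements per vectorized load for fp16 (16 bytes / 2 bytes); an assumption

# Old layout, as given in the commit message:
k_cache_old = torch.empty(num_blocks, num_kv_heads, block_size, head_size)

# New layout: the head dimension is split into head_size/x chunks of x
# contiguous elements, so one thread can fetch x values in a single access.
k_cache_new = torch.empty(num_blocks, num_kv_heads, head_size // x, block_size, x)

# Round-trip from the old layout to the new one:
as_new = k_cache_old.view(num_blocks, num_kv_heads, block_size, head_size // x, x) \
                    .permute(0, 1, 3, 2, 4).contiguous()
assert as_new.shape == k_cache_new.shape
```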
flybird11111
5d88ef1aaf
[shardformer] remove useless code ( #5645 )
2024-04-25 13:46:39 +08:00
flybird11111
148506c828
[coloattention] modify coloattention ( #5627 )
...
* modify coloattention
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* assorted follow-up fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 10:47:14 +08:00