Commit Graph

453 Commits (ceba662d22b1f982bf1d56e9699034b4ac97b60e)

Author SHA1 Message Date
yuehuayingxueluo b45000f839
[Inference] Add Streaming LLM (#5745)
* Add Streaming LLM

* add some parameters to llama_generation.py

* verify streamingllm config

* add test_streamingllm.py

* modified according to the opinions of review

* add Citation

* change _block_tables tolist
2024-06-05 10:51:19 +08:00
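For context on the StreamingLLM commit above: the technique keeps a few initial "attention sink" tokens plus a sliding window of recent tokens in the KV cache and evicts everything in between. A minimal sketch of that eviction policy (hypothetical helper name and parameters, not the ColossalAI API):

```python
# Minimal sketch of StreamingLLM-style KV cache eviction:
# keep the first `num_sinks` positions ("attention sinks") plus the
# most recent `window` positions; evict the middle of the sequence.
def streaming_evict(cache_positions, num_sinks=4, window=8):
    if len(cache_positions) <= num_sinks + window:
        return list(cache_positions)
    return list(cache_positions[:num_sinks]) + list(cache_positions[-window:])

# Example: 20 cached token positions shrink to 4 sinks + 8 recent tokens.
kept = streaming_evict(list(range(20)), num_sinks=4, window=8)
```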
Yuanheng Zhao 677cbfacf8
[Fix/Example] Fix Llama Inference Loading Data Type (#5763)
* [fix/example] fix llama inference loading dtype

* revise loading dtype of benchmark llama3
2024-05-30 13:48:46 +08:00
hxwang 154720ba6e [chore] refactor profiler utils 2024-05-28 12:41:42 +00:00
genghaozhe 87665d7922 correct argument help message 2024-05-27 06:03:53 +00:00
genghaozhe b9269d962d add args.prefetch_num for benchmark 2024-05-25 14:55:50 +00:00
genghaozhe fba04e857b [bugs] fix args.profile=False DummyProfiler error 2024-05-25 14:55:09 +00:00
hxwang ca674549e0 [chore] remove unnecessary test & changes 2024-05-24 06:09:36 +00:00
hxwang ff507b755e Merge branch 'main' of github.com:hpcaitech/ColossalAI into prefetch 2024-05-24 04:05:07 +00:00
hxwang 63c057cd8e [example] add profile util for llama 2024-05-24 03:59:36 +00:00
botbw 2fc85abf43
[gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713)
* [gemini] async grad chunk reduce (all-reduce&reduce-scatter)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [gemini] add test

* [gemini] rename func

* [gemini] update llama benchmark

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [gemini] use tensor counter

* [gemini] change default config in GeminiPlugin and GeminiDDP

* [chore] typo

* [gemini] fix sync issue & add test cases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-24 10:31:16 +08:00
Jianghai 85946d4236
[Inference]Fix readme and example for API server (#5742)
* fix chatapi readme and example

* updating doc

* add an api and change the doc

* remove

* add credits and del 'API' heading

* readme

* readme
2024-05-24 10:03:05 +08:00
hxwang 15d21a077a Merge remote-tracking branch 'origin/main' into prefetch 2024-05-23 15:49:33 +00:00
Yuanheng Zhao 8633c15da9 [sync] Sync feature/colossal-infer with main 2024-05-20 15:50:53 +00:00
genghaozhe a280517dd9 remove unrelated file 2024-05-20 05:25:35 +00:00
genghaozhe df63db7e63 remove comments 2024-05-20 05:15:51 +00:00
Yuanheng Zhao 8bcfe360fd
[example] Update Inference Example (#5725)
* [example] update inference example
2024-05-17 11:28:53 +08:00
hxwang 2e68eebdfe [chore] refactor & sync 2024-05-16 07:22:10 +00:00
Jianghai f47f2fbb24
[Inference] Fix API server, test and example (#5712)
* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
2024-05-15 15:47:31 +08:00
Steve Luo 7806842f2d
add paged-attention v2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
CjhHa1 5d9a49483d [Inference] Add example test_ci script 2024-05-09 05:44:05 +00:00
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598)
* fix test bugs

* add do sample test

* del useless lines

* fix comments

* fix tests

* delete version tag

* delete version tag

* add

* del test sever

* fix test

* fix

* Revert "add"

This reverts commit b9305fb024.
2024-05-08 15:20:53 +00:00
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
* fix bugs

* fix bugs

* fix api server

* fix api server

* add chat api and test

* del request.n
2024-05-08 15:20:53 +00:00
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
* finish online test and add examples

* fix test_contionus_batching

* fix some bugs

* fix bash

* fix

* fix inference

* finish revision

* fix typos

* revision
2024-05-08 15:20:52 +00:00
Yuanheng Zhao 12e7c28d5e
[hotfix] fix OpenMOE example import path (#5697) 2024-05-08 15:48:47 +08:00
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements

* modify example inference struct

* add test ci scripts

* mark test_infer as submodule

* rm deprecated cls & deps

* import of HAS_FLASH_ATTN

* prune inference tests to be run

* prune triton kernel tests

* increment pytest timeout mins

* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Edenzzzz c25f83c85f
fix missing pad token (#5690)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-06 18:17:26 +08:00
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 2024-05-05 16:28:56 +00:00
Yuanheng Zhao 56ed09aba5 [sync] resolve conflicts of merging main 2024-05-05 05:14:00 +00:00
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
* kvmemcpy triton for new kcache layout

* revise tests for new kcache layout

* naive triton flash decoding - new kcache layout

* rotary triton kernel - new kcache layout

* remove redundancy - triton decoding

* remove redundancy - triton kvcache copy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-03 17:20:45 +08:00
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator

* refactor decode_kv_cache_memcpy

* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
* [misc] remove config arg from initialize

* [misc] remove old tensor constructor

* [plugin] add npu support for ddp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [devops] fix doc test ci

* [test] fix test launch

* [doc] update launch doc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
Tong Li 68ec99e946
[hotfix] add soft link to support required files (#5661) 2024-04-26 21:12:04 +08:00
Yuanheng Zhao 5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)
* add context attn triton kernel - new kcache layout

* add benchmark triton

* tiny revise

* trivial - code style, comment
2024-04-26 17:51:49 +08:00
Yuanheng Zhao f342a93871
[Fix] Remove obsolete files - inference (#5650) 2024-04-25 22:04:59 +08:00
Hongxin Liu 1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646)
* [shardformer] refactor pipeline grad ckpt config

* [shardformer] refactor pipeline grad ckpt config

* [pipeline] fix stage manager
2024-04-25 15:19:30 +08:00
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
* optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x])

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-25 14:24:02 +08:00
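The key-cache relayout described in the commit above (from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) groups x contiguous elements of each head so the kernel can issue vectorized loads. A pure-Python index-mapping sketch of the assumed semantics, illustrative only:

```python
# Sketch (assumed semantics) of the key-cache relayout:
#   old layout: [num_blocks, num_kv_heads, block_size, head_size]
#   new layout: [num_blocks, num_kv_heads, head_size // x, block_size, x]
# where x groups contiguous head elements for vectorized memory access.
def split_kcache_index(s, d, block_size, head_size, x):
    """Flat offsets of element (block slot s, head dim d) within one
    (block, head) tile, in the old and new layouts respectively."""
    old = s * head_size + d                            # row-major old tile
    new = (d // x) * (block_size * x) + s * x + (d % x)
    return old, new

# Example: relayout one tile with block_size=4, head_size=8, x=4.
old_tile = list(range(4 * 8))          # tile stored in old layout order
new_tile = [0] * (4 * 8)
for s in range(4):
    for d in range(8):
        o, n = split_kcache_index(s, d, 4, 8, 4)
        new_tile[n] = old_tile[o]
```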
yuehuayingxueluo 90cd5227a3
[Fix/Inference]Fix vllm benchmark (#5630)
* Fix bugs about OOM when running vllm-0.4.0

* rm used params

* change generation_config

* change benchmark log file name
2024-04-24 14:51:36 +08:00
傅剑寒 279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
* refactor compilation mechanism and unified multi hw

* fix file path bug

* add init.py to make pybind a module to avoid relative path error caused by softlink

* delete duplicated macros

* fix macros bug in gcc
2024-04-24 14:17:54 +08:00
Yuanheng Zhao 04863a9b14
[example] Update Llama Inference example (#5629)
* [example] add inference benchmark llama3

* revise inference config - arg

* remove unused args

* add llama generation demo script

* fix init rope in llama policy

* add benchmark-llama3 - cleanup
2024-04-23 22:23:07 +08:00
binmakeswell f4c5aafe29
[example] llama3 (#5631)
* release llama3

* [release] llama3

* [release] llama3

* [release] llama3

* [release] llama3
2024-04-23 18:48:07 +08:00
Hongxin Liu 4de4e31818
[example] update llama example (#5626)
* [plugin] support dp inside for hybriad parallel

* [example] update llama benchmark

* [example] update llama benchmark

* [example] update llama readme

* [example] update llama readme
2024-04-23 14:12:20 +08:00
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals 5120 (#5611) 2024-04-19 15:34:53 +08:00
Edenzzzz d83c633ca6
[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606)
* fix no pad token bug

* fixed some auto parallel codegen bug, but might not run on torch 2.1

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-04-18 18:15:50 +08:00
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
* feat flash decoding for paged attention

* refactor flashdecodingattention

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-18 16:45:07 +08:00
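The "sequence split within the same thread block" commit above refers to the flash-decoding idea: attention over a long KV sequence is computed per partition, and the partial results are merged exactly using each partition's running max and softmax denominator. A pure-Python sketch of that merge (illustrative only; the real kernel does this across GPU thread blocks, not in Python):

```python
import math

def attend(scores, values):
    """Reference softmax attention over the whole sequence (scalar values)."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

def attend_split(scores, values, num_parts):
    """Same result, computed over partitions then merged via running
    max / denominator rescaling (the split-KV reduction)."""
    n = len(scores)
    step = (n + num_parts - 1) // num_parts
    m_tot, d_tot, o_tot = float("-inf"), 0.0, 0.0
    for i in range(0, n, step):
        s, v = scores[i:i + step], values[i:i + step]
        m = max(s)
        w = [math.exp(x - m) for x in s]
        d = sum(w)                                   # partial denominator
        o = sum(wi * vi for wi, vi in zip(w, v))     # partial numerator
        m_new = max(m_tot, m)
        scale_old = math.exp(m_tot - m_new)          # exp(-inf) == 0.0 initially
        scale_new = math.exp(m - m_new)
        d_tot = d_tot * scale_old + d * scale_new
        o_tot = o_tot * scale_old + o * scale_new
        m_tot = m_new
    return o_tot / d_tot
```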
yuehuayingxueluo 56b222eff8
[inference/model] Adapted to the baichuan2-7B model (#5591)
* Adapted to the baichuan2-7B model

* modified according to the review comments.

* Modified the method of obtaining random weights.

* modified according to the review comments.

* change mlp layer 'NOTE'
2024-04-15 16:53:02 +08:00
Yuanheng ed5ebd1735 [Fix] resolve conflicts of merging main 2024-04-08 16:21:47 +08:00
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-08 15:09:40 +08:00
digger yu 341263df48
[hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) 2024-04-07 19:04:58 +08:00
digger yu a799ca343b
[fix] fix typo s/muiti-node /multi-node etc. (#5448) 2024-04-07 18:42:15 +08:00
Edenzzzz 15055f9a36
[hotfix] quick fixes to make legacy tutorials runnable (#5559)
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-04-07 12:06:27 +08:00