Commit Graph

468 Commits (19d1510ea26d10484a804eb62f6d03dbcc7c80a8)

Author SHA1 Message Date
hxwang 3e2b6132b7 [moe] clean legacy code
4 months ago
Runyu Lu bcf0181ecd [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895)
4 months ago
Edenzzzz 8cc8f645cd [Examples] Add lazy init to OPT and GPT examples (#5924)
4 months ago
Hongxin Liu e86127925a [plugin] support all-gather overlap for hybrid parallel (#5919)
4 months ago
Hongxin Liu c068ef0fa0 [zero] support all-gather overlap (#5898)
5 months ago
Runyu Lu 66abf1c6e8 [HotFix] CI,import,requirements-test for #5838 (#5892)
5 months ago
Runyu Lu cba20525a8 [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838)
5 months ago
Edenzzzz 8ec24b6a4d [Hoxfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap
5 months ago
pre-commit-ci[bot] 7c2f79fa98 [pre-commit.ci] pre-commit autoupdate (#5572)
5 months ago
Haze188 416580b314 [MoE/ZeRO] Moe refactor with zero refactor (#5821)
5 months ago
botbw 8e718a1421 [gemini] fixes for benchmarking (#5847)
5 months ago
Edenzzzz 2a25a2aff7 [Feature] optimize PP overlap (#5735)
5 months ago
binmakeswell 4ccaaaab63 [doc] add GPU cloud playground (#5851)
5 months ago
Yuanheng Zhao 7b249c76e5 [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837)
5 months ago
Edenzzzz 8795bb2e80 Support 4d parallel + flash attention (#5789)
5 months ago
yuehuayingxueluo b45000f839 [Inference]Add Streaming LLM (#5745)
6 months ago
Yuanheng Zhao 677cbfacf8 [Fix/Example] Fix Llama Inference Loading Data Type (#5763)
6 months ago
hxwang 154720ba6e [chore] refactor profiler utils
6 months ago
genghaozhe 87665d7922 correct argument help message
6 months ago
genghaozhe b9269d962d add args.prefetch_num for benchmark
6 months ago
genghaozhe fba04e857b [bugs] fix args.profile=False DummyProfiler errro
6 months ago
hxwang ca674549e0 [chore] remove unnecessary test & changes
6 months ago
hxwang ff507b755e Merge branch 'main' of github.com:hpcaitech/ColossalAI into prefetch
6 months ago
hxwang 63c057cd8e [example] add profile util for llama
6 months ago
botbw 2fc85abf43 [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713)
6 months ago
Jianghai 85946d4236 [Inference]Fix readme and example for API server (#5742)
6 months ago
hxwang 15d21a077a Merge remote-tracking branch 'origin/main' into prefetch
6 months ago
Yuanheng Zhao 8633c15da9 [sync] Sync feature/colossal-infer with main
6 months ago
genghaozhe a280517dd9 remove unrelated file
6 months ago
genghaozhe df63db7e63 remote comments
6 months ago
Yuanheng Zhao 8bcfe360fd [example] Update Inference Example (#5725)
7 months ago
hxwang 2e68eebdfe [chore] refactor & sync
7 months ago
Jianghai f47f2fbb24 [Inference] Fix API server, test and example (#5712)
7 months ago
Steve Luo 7806842f2d add paged-attetionv2: support seq length split across thread block (#5707)
7 months ago
CjhHa1 5d9a49483d [Inference] Add example test_ci script
7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598)
7 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Yuanheng Zhao 12e7c28d5e [hotfix] fix OpenMOE example import path (#5697)
7 months ago
Yuanheng Zhao 55cc7f3df7 [Fix] Fix Inference Example, Tests, and Requirements (#5688)
7 months ago
Edenzzzz c25f83c85f fix missing pad token (#5690)
7 months ago
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main)
7 months ago
Yuanheng Zhao 56ed09aba5 [sync] resolve conflicts of merging main
7 months ago
Yuanheng Zhao 537a3cbc4d [kernel] Support New KCache Layout - Triton Kernel (#5677)
7 months ago
Steve Luo 5cd75ce4c7 [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666)
7 months ago
Tong Li 68ec99e946 [hotfix] add soft link to support required files (#5661)
7 months ago
Yuanheng Zhao 5be590b99e [kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)
7 months ago
Yuanheng Zhao f342a93871 [Fix] Remove obsolete files - inference (#5650)
7 months ago
Hongxin Liu 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646)
7 months ago