470 Commits (main)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| duanjunwen | e0c68ab6d3 | [Zerobubble] merge main. (#6142) | 2 days ago |
| botbw | c54c4fcd15 | [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) | 2 months ago |
| Wenxuan Tan | 8fd25d6e09 | [Feature] Split cross-entropy computation in SP (#5959) | 2 months ago |
| Wang Binluo | eea37da6fa | [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) | 3 months ago |
| Edenzzzz | f5c84af0b0 | [Feature] Zigzag Ring attention (#5905) | 3 months ago |
| flybird11111 | 0a51319113 | [fp8] zero support fp8 linear. (#6006) | 3 months ago |
| Hongxin Liu | 8241c0c054 | [fp8] support gemini plugin (#5978) | 3 months ago |
| Hanks | b480eec738 | [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) | 4 months ago |
| flybird11111 | 0c10afd372 | [FP8] rebase main (#5963) | 4 months ago |
| hxwang | 3e2b6132b7 | [moe] clean legacy code | 4 months ago |
| Runyu Lu | bcf0181ecd | [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) | 4 months ago |
| Edenzzzz | 8cc8f645cd | [Examples] Add lazy init to OPT and GPT examples (#5924) | 4 months ago |
| Hongxin Liu | e86127925a | [plugin] support all-gather overlap for hybrid parallel (#5919) | 4 months ago |
| GuangyaoZhang | 6a20f07b80 | remove all to all | 4 months ago |
| GuangyaoZhang | 5a310b9ee1 | fix rebase | 4 months ago |
| GuangyaoZhang | 457a0de79f | shardformer fp8 | 4 months ago |
| BurkeHulk | 66018749f3 | add fp8_communication flag in the script | 4 months ago |
| Hongxin Liu | c068ef0fa0 | [zero] support all-gather overlap (#5898) | 4 months ago |
| Runyu Lu | 66abf1c6e8 | [HotFix] CI,import,requirements-test for #5838 (#5892) | 5 months ago |
| Runyu Lu | cba20525a8 | [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) | 5 months ago |
| Edenzzzz | 8ec24b6a4d | [Hoxfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap | 5 months ago |
| pre-commit-ci[bot] | 7c2f79fa98 | [pre-commit.ci] pre-commit autoupdate (#5572) | 5 months ago |
| Haze188 | 416580b314 | [MoE/ZeRO] Moe refactor with zero refactor (#5821) | 5 months ago |
| botbw | 8e718a1421 | [gemini] fixes for benchmarking (#5847) | 5 months ago |
| Edenzzzz | 2a25a2aff7 | [Feature] optimize PP overlap (#5735) | 5 months ago |
| binmakeswell | 4ccaaaab63 | [doc] add GPU cloud playground (#5851) | 5 months ago |
| Yuanheng Zhao | 7b249c76e5 | [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) | 5 months ago |
| Edenzzzz | 8795bb2e80 | Support 4d parallel + flash attention (#5789) | 5 months ago |
| yuehuayingxueluo | b45000f839 | [Inference]Add Streaming LLM (#5745) | 6 months ago |
| Yuanheng Zhao | 677cbfacf8 | [Fix/Example] Fix Llama Inference Loading Data Type (#5763) | 6 months ago |
| hxwang | 154720ba6e | [chore] refactor profiler utils | 6 months ago |
| genghaozhe | 87665d7922 | correct argument help message | 6 months ago |
| Haze188 | 4d097def96 | [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) | 6 months ago |
| genghaozhe | b9269d962d | add args.prefetch_num for benchmark | 6 months ago |
| genghaozhe | fba04e857b | [bugs] fix args.profile=False DummyProfiler errro | 6 months ago |
| hxwang | ca674549e0 | [chore] remove unnecessary test & changes | 6 months ago |
| hxwang | 63c057cd8e | [example] add profile util for llama | 6 months ago |
| botbw | 2fc85abf43 | [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) | 6 months ago |
| Jianghai | 85946d4236 | [Inference]Fix readme and example for API server (#5742) | 6 months ago |
| genghaozhe | a280517dd9 | remove unrelated file | 6 months ago |
| genghaozhe | 1ec92d29af | remove perf log, unrelated file and so on | 6 months ago |
| genghaozhe | 5c6c5d6be3 | remove comments | 6 months ago |
| genghaozhe | df63db7e63 | remote comments | 6 months ago |
| Yuanheng Zhao | 8bcfe360fd | [example] Update Inference Example (#5725) | 6 months ago |
| hxwang | 2e68eebdfe | [chore] refactor & sync | 6 months ago |
| Jianghai | f47f2fbb24 | [Inference] Fix API server, test and example (#5712) | 6 months ago |
| Steve Luo | 7806842f2d | add paged-attetionv2: support seq length split across thread block (#5707) | 6 months ago |
| CjhHa1 | 5d9a49483d | [Inference] Add example test_ci script | 7 months ago |
| Jianghai | 61a1b2e798 | [Inference] Fix bugs and docs for feat/online-server (#5598) | 7 months ago |
| Jianghai | c064032865 | [Online Server] Chat Api for streaming and not streaming response (#5470) | 7 months ago |