Commit Graph

3307 Commits (58ad76d4665032bbe548d066116d1c572ce98979)

Author SHA1 Message Date
hxwang 58ad76d466 [refactor] remove legacy async reduce scatter code
6 months ago
hxwang fee35678e5 [fix] fix missing reduce variable
6 months ago
hxwang b5ae587d50 [gemini] optimize reduce scatter d2h copy
6 months ago
Yuanheng Zhao b96c6390f4 [inference] Fix running time of test_continuous_batching (#5750)
6 months ago
Edenzzzz 5f8c0a0ac3 [Feature] auto-cast optimizers to distributed version (#5746)
6 months ago
botbw 2fc85abf43 [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713)
6 months ago
Jianghai 85946d4236 [Inference]Fix readme and example for API server (#5742)
6 months ago
binmakeswell 4647ec28c8 [inference] release (#5747)
6 months ago
Yuanheng Zhao df6747603f [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer
6 months ago
Yuanheng Zhao 498f42c45b [NFC] fix requirements (#5744)
6 months ago
Yuanheng Zhao bd38fe6b91 [NFC] Fix code factors on inference triton kernels (#5743)
6 months ago
Yuanheng Zhao c2c8c9cf17 [ci] Temporary fix for build on pr (#5741)
6 months ago
Yuanheng Zhao c06208e72c Merge pull request #5737 from yuanheng-zhao/inference/sync/main
6 months ago
Haze188 22ce873c3f [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702)
6 months ago
Yuanheng Zhao 8633c15da9 [sync] Sync feature/colossal-infer with main
6 months ago
Yuanheng Zhao d8b1ea4ac9 [doc] Update Inference Readme (#5736)
6 months ago
Yuanheng Zhao bdf9a001d6 [Fix/Inference] Add unsupported auto-policy error message (#5730)
6 months ago
Yuanheng Zhao 283c407a19 [Inference] Fix Inference Generation Config and Sampling (#5710)
6 months ago
flybird11111 9d83c6d715 [lazy] fix lazy cls init (#5720)
7 months ago
Yuanheng Zhao 8bcfe360fd [example] Update Inference Example (#5725)
7 months ago
binmakeswell 2011b1356a [misc] Update PyTorch version in docs (#5724)
7 months ago
傅剑寒 a8d459f99a 【Inference] Delete duplicated package (#5723)
7 months ago
Jianghai f47f2fbb24 [Inference] Fix API server, test and example (#5712)
7 months ago
Tong Li 913c920ecc [Colossal-LLaMA] Fix sft issue for llama2 (#5719)
7 months ago
Runyu Lu 74c47921fa [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717)
7 months ago
Yuanheng Zhao 5bbab1533a [ci] Fix example tests (#5714)
7 months ago
傅剑寒 121d7ad629 [Inference] Delete duplicated copy_vector (#5716)
7 months ago
Edenzzzz 43995ee436 [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)
7 months ago
Steve Luo 7806842f2d add paged-attetionv2: support seq length split across thread block (#5707)
7 months ago
Runyu Lu 18d67d0e8e [Feat]Inference RPC Server Support (#5705)
7 months ago
hugo-syn 393c8f5b7f [hotfix] fix inference typo (#5438)
7 months ago
Edenzzzz 785cd9a9c9 [misc] Update PyTorch version in docs (#5711)
7 months ago
yuehuayingxueluo de4bf3dedf [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
7 months ago
傅剑寒 50104ab340 [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
7 months ago
Wang Binluo 537f6a3855 [Shardformer]fix the num_heads assert for llama model and qwen model (#5704)
7 months ago
Wang Binluo a3cc68ca93 [Shardformer] Support the Qwen2 model (#5699)
7 months ago
傅剑寒 bfad39357b [Inference/Feat] Add quant kvcache interface (#5700)
7 months ago
Jianghai 492520dbdb Merge pull request #5588 from hpcaitech/feat/online-serving
7 months ago
CjhHa1 5d9a49483d [Inference] Add example test_ci script
7 months ago
flybird11111 d4c5ef441e [gemini]remove registered gradients hooks (#5696)
7 months ago
CjhHa1 bc9063adf1 resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598)
7 months ago
CjhHa1 7bbb28e48b [Inference] resolve rebase conflicts
7 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
yuehuayingxueluo d482922035 [Inference] Support the logic related to ignoring EOS token (#5693)
7 months ago
yuehuayingxueluo 9c2fe7935f [Inference]Adapt temperature processing logic (#5689)
7 months ago
Yuanheng Zhao 12e7c28d5e [hotfix] fix OpenMOE example import path (#5697)
7 months ago
Wang Binluo 22297789ab Merge pull request #5684 from wangbluo/parallel_output
7 months ago