1943 Commits (5f8c0a0ac3b52a71b664c3e36dd1a8cef40f428d)

Author SHA1 Message Date
Edenzzzz 5f8c0a0ac3 [Feature] auto-cast optimizers to distributed version (#5746) 6 months ago
botbw 2fc85abf43 [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) 6 months ago
Jianghai 85946d4236 [Inference]Fix readme and example for API server (#5742) 6 months ago
binmakeswell 4647ec28c8 [inference] release (#5747) 6 months ago
Yuanheng Zhao bd38fe6b91 [NFC] Fix code factors on inference triton kernels (#5743) 6 months ago
Haze188 22ce873c3f [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) 6 months ago
Yuanheng Zhao d8b1ea4ac9 [doc] Update Inference Readme (#5736) 6 months ago
Yuanheng Zhao bdf9a001d6 [Fix/Inference] Add unsupported auto-policy error message (#5730) 6 months ago
Yuanheng Zhao 283c407a19 [Inference] Fix Inference Generation Config and Sampling (#5710) 6 months ago
flybird11111 9d83c6d715 [lazy] fix lazy cls init (#5720) 6 months ago
Yuanheng Zhao 8bcfe360fd [example] Update Inference Example (#5725) 6 months ago
Jianghai f47f2fbb24 [Inference] Fix API server, test and example (#5712) 6 months ago
Runyu Lu 74c47921fa [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) 6 months ago
Edenzzzz 43995ee436 [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 6 months ago
Steve Luo 7806842f2d add paged-attetionv2: support seq length split across thread block (#5707) 6 months ago
Runyu Lu 18d67d0e8e [Feat]Inference RPC Server Support (#5705) 6 months ago
hugo-syn 393c8f5b7f [hotfix] fix inference typo (#5438) 6 months ago
yuehuayingxueluo de4bf3dedf [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) 7 months ago
Wang Binluo 537f6a3855 [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) 7 months ago
Wang Binluo a3cc68ca93 [Shardformer] Support the Qwen2 model (#5699) 7 months ago
傅剑寒 bfad39357b [Inference/Feat] Add quant kvcache interface (#5700) 7 months ago
flybird11111 d4c5ef441e [gemini]remove registered gradients hooks (#5696) 7 months ago
CjhHa1 bc9063adf1 resolve rebase conflicts on Branch feat/online-serving 7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) 7 months ago
CjhHa1 7bbb28e48b [Inference] resolve rebase conflicts 7 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470) 7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) 7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396) 7 months ago
yuehuayingxueluo d482922035 [Inference] Support the logic related to ignoring EOS token (#5693) 7 months ago
yuehuayingxueluo 9c2fe7935f [Inference]Adapt temperature processing logic (#5689) 7 months ago
Yuanheng Zhao 55cc7f3df7 [Fix] Fix Inference Example, Tests, and Requirements (#5688) 7 months ago
Yuanheng Zhao f9afe0addd [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) 7 months ago
wangbluo 4e50cce26b fix the mistral model 7 months ago
wangbluo a8408b4d31 remove comment code 7 months ago
pre-commit-ci[bot] ca56b93d83 [pre-commit.ci] auto fixes from pre-commit.com hooks 7 months ago
wangbluo 108ddfb795 add parallel_output for the opt model 7 months ago
pre-commit-ci[bot] 88f057ce7c [pre-commit.ci] auto fixes from pre-commit.com hooks 7 months ago
flybird11111 77ec773388 [zero]remove registered gradients hooks (#5687) 7 months ago
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 7 months ago
Yuanheng Zhao 537a3cbc4d [kernel] Support New KCache Layout - Triton Kernel (#5677) 7 months ago
wangbluo 2632916329 remove useless code 7 months ago
yuehuayingxueluo f79963199c [inference]Add alibi to flash attn function (#5678) 7 months ago
wangbluo 9efc79ef24 add parallel output for mistral model 7 months ago
Steve Luo 5cd75ce4c7 [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) 7 months ago
yuehuayingxueluo 5f00002e43 [Inference] Adapt Baichuan2-13B TP (#5659) 7 months ago
Wang Binluo d3f34ee8cc [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) 7 months ago
flybird11111 6af6d6fc9f [shardformer] support bias_gelu_jit_fused for models (#5647) 7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
linsj20 91fa553775 [Feature] qlora support (#5586) 7 months ago
flybird11111 8954a0c2e2 [LowLevelZero] low level zero support lora (#5153) 7 months ago