docs: add SGLang (#821)

pull/822/head
Yineng Zhang 2025-01-15 17:25:04 +08:00 committed by GitHub
parent f9a1f26b9f
commit 1759c4b9b4
1 changed files with 42 additions and 0 deletions

@@ -254,6 +254,39 @@ curl http://localhost:23333/v1/chat/completions \
Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/)
#### SGLang inference
##### Installation
```bash
pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```
##### OpenAI Compatible Server
```bash
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
##### OpenAI client
```python3
import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)
print(response)
```
#### Ollama inference
TODO
@ -401,6 +434,15 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
print(response)
```
#### SGLang inference
##### Installation
```bash
pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```
For offline engine API usage, refer to the [Offline Engine API](https://docs.sglang.ai/backend/offline_engine_api.html) documentation.
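As a rough illustration of what offline usage might look like, here is a minimal sketch based on the `sgl.Engine` interface described in the linked documentation. The helper `run_offline` and the constant `SAMPLING_PARAMS` are hypothetical names, and passing `trust_remote_code=True` to `Engine` is an assumption that mirrors the `--trust-remote-code` server flag. Note that raw prompts passed this way are not wrapped in a chat template.

```python
# Sampling settings mirroring the server example (greedy decoding, short reply).
SAMPLING_PARAMS = {"temperature": 0, "max_new_tokens": 64}


def run_offline(prompts, model_path="internlm/internlm3-8b-instruct"):
    """Generate completions in-process, without launching an HTTP server."""
    # Imported lazily so this module can be loaded on machines without SGLang.
    import sglang as sgl

    # Assumption: Engine accepts the same options as launch_server
    # as keyword arguments (here, trust_remote_code).
    llm = sgl.Engine(model_path=model_path, trust_remote_code=True)
    try:
        outputs = llm.generate(prompts, SAMPLING_PARAMS)
        # Each output is expected to be a dict carrying the generated text.
        return [output["text"] for output in outputs]
    finally:
        llm.shutdown()  # release GPU memory and worker processes
```

Because the engine starts worker processes, it is safest to call `run_offline([...])` from under an `if __name__ == "__main__":` guard in a script.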
#### Ollama inference
TODO