docs: add SGLang (#821)

2025-01-15 17:25:04 +08:00 · 2025-01-15 17:25:04 +08:00 · 1759c4b9b4
parent f9a1f26b9f
commit 1759c4b9b4
1 changed file with 42 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -254,6 +254,39 @@ curl http://localhost:23333/v1/chat/completions \

 Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/)

+#### SGLang inference
+
+##### Installation
+```bash
+pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
+```
+
+##### OpenAI Compatible Server
+
+```bash
+python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
+```
+
+##### OpenAI client
+
+```python3
+import openai
+client = openai.Client(
+    base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
+
+# Chat completion
+response = client.chat.completions.create(
+    model="default",
+    messages=[
+        {"role": "system", "content": "You are a helpful AI assistant"},
+        {"role": "user", "content": "List 3 countries and their capitals."},
+    ],
+    temperature=0,
+    max_tokens=64,
+)
+print(response)
+```
+
 #### Ollama inference

 TODO
@@ -401,6 +434,15 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```

+#### SGLang inference
+
+##### Installation
+```bash
+pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
+```
+
+For offline engine API usage, refer to the [Offline Engine API](https://docs.sglang.ai/backend/offline_engine_api.html) documentation.
+
 #### Ollama inference

 TODO
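
Note on the SGLang client snippet added in the first hunk: it prints the whole response object. For readers who want only the reply text, or who want to try the endpoint without installing the `openai` package, the following is a minimal sketch using only the Python standard library. It assumes the server started by `sglang.launch_server` listens on `127.0.0.1:30000` (the port used in the README snippet) and returns the usual OpenAI-style chat completion JSON. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical and are not part of SGLang or the README.

```python
import json
import urllib.request

# Assumption: same host and port as the README's OpenAI client example.
BASE_URL = "http://127.0.0.1:30000/v1"


def build_chat_request(user_prompt,
                       system_prompt="You are a helpful AI assistant",
                       model="default", temperature=0, max_tokens=64):
    """Build the JSON body for POST /v1/chat/completions (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def chat(user_prompt, base_url=BASE_URL):
    """Send one chat request and return the reply text; needs a running server."""
    body = json.dumps(build_chat_request(user_prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Mirrors api_key="EMPTY" from the README snippet.
            "Authorization": "Bearer EMPTY",
        },
    )
    with urllib.request.urlopen(request) as http_response:
        return extract_reply(json.load(http_response))
```

With the server running, `print(chat("List 3 countries and their capitals."))` would print just the generated text. With the `openai` client from the README, the equivalent field is `response.choices[0].message.content`.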