mirror of https://github.com/InternLM/InternLM
docs: add SGLang (#821)
parent f9a1f26b9f
commit 1759c4b9b4
README.md | 42
@@ -254,6 +254,39 @@ curl http://localhost:23333/v1/chat/completions \
Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/)

#### SGLang inference

##### Installation

```bash
pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```

##### OpenAI Compatible Server

```bash
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
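With the server running (SGLang listens on port 30000 by default), any OpenAI-compatible client can talk to it. A quick smoke test with `curl` might look like the following; the request payload here is illustrative:

```shell
# Illustrative request against a locally running SGLang server (default port 30000).
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "default",
        "messages": [{"role": "user", "content": "List 3 countries and their capitals."}]
      }'
```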

##### OpenAI client

```python
import openai

client = openai.Client(
    base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)
print(response)
```
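The `print(response)` call above dumps the whole completion object, which follows the OpenAI chat completion schema. A minimal sketch of extracting just the assistant's text; the helper name and sample payload below are illustrative, not real server output:

```python
# Assumption: the payload follows the OpenAI chat completion schema.
def first_reply(response: dict) -> str:
    """Return the assistant text from the first choice of a chat completion."""
    return response["choices"][0]["message"]["content"]

# Illustrative payload in the shape the server returns.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "France - Paris"}}
    ]
}
print(first_reply(sample))  # -> France - Paris
```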

#### Ollama inference

TODO

@@ -401,6 +434,15 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
print(response)
```

#### SGLang inference

##### Installation

```bash
pip3 install "sglang[srt]>=0.4.1.post6" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```

For offline engine API usage, please refer to the [Offline Engine API](https://docs.sglang.ai/backend/offline_engine_api.html)
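As a rough sketch of what offline engine usage can look like (hedged: the argument and key names follow the linked Offline Engine API docs and may vary across SGLang versions; running this requires a local GPU and the model weights):

```python
# Sketch only: follows the SGLang Offline Engine API linked above.
# model_path / sampling-param keys are assumptions against recent SGLang versions.
import sglang as sgl

llm = sgl.Engine(
    model_path="internlm/internlm3-8b-instruct",
    trust_remote_code=True,
)

prompts = ["List 3 countries and their capitals."]
sampling_params = {"temperature": 0, "max_new_tokens": 64}

outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])
```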

#### Ollama inference

TODO