mirror of https://github.com/InternLM/InternLM
add SGLang
parent 245fc50235
commit 7fcb7f4145

@@ -86,6 +86,28 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### [SGLang](https://github.com/sgl-project/sglang)

`SGLang` is a fast serving framework for large language models and vision language models.

After installing SGLang following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve the `internlm3-8b-instruct` model as follows:

```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```

```shell
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/internlm3-8b-instruct",
    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
    "stream": false
  }' \
  --no-buffer
```
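
Since the server exposes an OpenAI-compatible API, you can also query it from Python. The following is a minimal sketch, assuming the `openai` package is installed (`pip install openai`) and the server launched above is listening on port 30000:

```python
# Minimal sketch: query the SGLang server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Introduce Shanghai"}],
)
print(response.choices[0].message.content)
```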

### [TGI](https://github.com/huggingface/text-generation-inference)

TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is using the official Docker container:
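
As an illustration only, a typical invocation adapted from the TGI documentation might look like the sketch below; the image tag, port mapping, and the choice of `internlm/internlm3-8b-instruct` as the model ID are assumptions, so adjust them to your setup:

```shell
# Sketch: serve an InternLM model with the official TGI container.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id internlm/internlm3-8b-instruct --trust-remote-code
```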

@@ -246,7 +268,7 @@ It chooses ollama as the LLM inference engine locally. An example can be found f

Therefore, you can integrate InternLM2 or InternLM2.5 models into LlamaIndex smoothly if you deploy them with `ollama` as guided in the [ollama section](#ollama).
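
As a sketch of what that integration can look like, assuming `llama-index` and the `llama-index-llms-ollama` extra are installed and an InternLM model has already been pulled into ollama under the hypothetical name `internlm2`:

```python
# Sketch: use an ollama-served InternLM model as the LLM in LlamaIndex.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="internlm2", request_timeout=120.0)
print(llm.complete("Introduce Shanghai"))
```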

### [open-webui](https://github.com/open-webui/open-webui)

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other OpenAI-compatible API services, and comes with a built-in RAG reasoning engine, making it a powerful AI deployment solution.
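
A common way to try it, following the open-webui README, is the Docker one-liner sketched below; the port mapping and volume name are defaults you may want to change:

```shell
# Sketch: run Open WebUI locally, then browse to http://localhost:3000.
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```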

@@ -86,6 +86,28 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### [SGLang](https://github.com/sgl-project/sglang)

`SGLang` is a fast serving framework for LLMs and VLMs.

After installing it following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve and query the `internlm3-8b-instruct` model as follows:

```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```

```shell
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/internlm3-8b-instruct",
    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
    "stream": false
  }' \
  --no-buffer
```
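
The request above sets `"stream": false`; for token-by-token output, a streaming variant from Python is sketched below, again assuming the `openai` package and the server launched above:

```python
# Sketch: stream tokens from the SGLang server's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Introduce Shanghai"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()
```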

### [TGI](https://github.com/huggingface/text-generation-inference)

TGI is a toolkit for deploying and serving LLMs. The easiest way to deploy an LLM service is using the official Docker container:
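
Once the container is up, it can be smoke-tested over TGI's `/generate` endpoint; the sketch below assumes the container's port 80 is mapped to local port 8080:

```shell
# Sketch: send a test prompt to a running TGI container.
curl 127.0.0.1:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Introduce Shanghai", "parameters": {"max_new_tokens": 64}}'
```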

@@ -246,7 +268,7 @@ LlamaIndex is a framework for building context-augmented LLM applications.

Therefore, you can integrate InternLM models into LlamaIndex smoothly if you deploy them with `ollama` as guided in the [ollama section](#ollama).
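
Beyond one-off completions, the model can also be installed as the global default so the rest of a LlamaIndex pipeline picks it up; a sketch with the same hypothetical `internlm2` ollama model name:

```python
# Sketch: make the ollama-served InternLM model the default LLM for LlamaIndex.
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="internlm2", request_timeout=120.0)
# Downstream components (query engines, chat engines, ...) now use this LLM.
```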

### [open-webui](https://github.com/open-webui/open-webui)

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other OpenAI-compatible API services, and comes with a built-in RAG inference engine, making it a powerful AI deployment solution.
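
Because Open WebUI can talk to any OpenAI-compatible backend, it can also sit in front of the SGLang server from earlier; the environment variable names below follow the open-webui documentation, and `host.docker.internal` is an assumption that holds on Docker Desktop but may need `--add-host` on Linux:

```shell
# Sketch: point Open WebUI at the SGLang OpenAI-compatible endpoint.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:30000/v1 \
  -e OPENAI_API_KEY=EMPTY \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```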