diff --git a/ecosystem/README.md b/ecosystem/README.md
index d784e91..3b435d9 100644
--- a/ecosystem/README.md
+++ b/ecosystem/README.md
@@ -86,6 +86,28 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

+### [SGLang](https://github.com/sgl-project/sglang)
+
+`SGLang` is a fast serving framework for large language models and vision language models.
+
+After installing SGLang following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve and query the `internlm3-8b-instruct` model as follows:
+
+```shell
+python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
+```
+
+```shell
+curl http://127.0.0.1:30000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer EMPTY" \
+  -d '{
+    "model": "internlm/internlm3-8b-instruct",
+    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
+    "stream": false
+  }' \
+  --no-buffer
+```
+
 ### [TGI](https://github.com/huggingface/text-generation-inference)

 TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is using the official Docker container:
@@ -246,7 +268,7 @@ It chooses ollama as the LLM inference engine locally. An example can be found f

 Therefore, you can integrate InternLM2 or InternLM2.5 models into LlamaIndex smoothly if you can deploy them with `ollama` as guided in the [ollama section](#ollama).

-### \[open-webui\]
+### [open-webui](https://github.com/open-webui/open-webui)

 Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other OpenAI-compatible API services, and comes with a built-in RAG inference engine, making it a powerful AI deployment solution.
diff --git a/ecosystem/README_zh-CN.md b/ecosystem/README_zh-CN.md
index 0072905..e299c36 100644
--- a/ecosystem/README_zh-CN.md
+++ b/ecosystem/README_zh-CN.md
@@ -86,6 +86,28 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

+### [SGLang](https://github.com/sgl-project/sglang)
+
+`SGLang` 是一个面向大语言模型（LLM）和视觉语言模型（VLM）的高性能推理服务框架。
+
+按照官方[文档](https://docs.sglang.ai/start/install.html)完成安装后，可以按如下方式部署并调用 `internlm3-8b-instruct` 模型：
+
+```shell
+python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
+```
+
+```shell
+curl http://127.0.0.1:30000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer EMPTY" \
+  -d '{
+    "model": "internlm/internlm3-8b-instruct",
+    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
+    "stream": false
+  }' \
+  --no-buffer
+```
+
 ### [TGI](https://github.com/huggingface/text-generation-inference)

 TGI 是一个用于部署和提供 LLMs 服务的工具包。部署 LLM 服务最简单的方法是使用官方的 Docker 容器：
@@ -246,7 +268,7 @@ LlamaIndex 是一个用于构建上下文增强型 LLM 应用程序的框架。

 因此，如果能够按照 [ollama 章节](#ollama)使用 ollama 部署浦语模型，你就可以顺利地将浦语模型集成到 LlamaIndex 中。

-### \[open-webui\]
+### [open-webui](https://github.com/open-webui/open-webui)

 Open WebUI 是一个可扩展、功能丰富且用户友好的自托管人工智能平台，旨在完全离线运行。它支持 Ollama 服务和其他兼容 OpenAI 的 API 服务，并内置 RAG 推理引擎，使其成为强大的 AI 部署解决方案。
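The server launched in this patch exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl call in the added section can equally be made from Python. Below is a minimal sketch using the `openai` client package; it is an illustration rather than part of the patch, and it assumes the server is running locally on SGLang's default port 30000 with no API key configured (hence the `EMPTY` placeholder, matching the curl example):

```python
# Query the SGLang server started by `python3 -m sglang.launch_server ...`
# through its OpenAI-compatible API. Requires `pip install openai`.
from openai import OpenAI

# Assumption: server at the default address used in the patch, no auth.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Introduce Shanghai"}],
    stream=False,  # mirrors "stream": false in the curl payload
)
print(response.choices[0].message.content)
```

Passing `stream=True` instead returns an iterator of incremental chunks, the client-side counterpart of flipping `"stream"` to `true` in the JSON payload above.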