add SGLang

pull/814/head
AllentDan 2025-01-15 10:43:58 +08:00
parent 245fc50235
commit 7fcb7f4145
2 changed files with 46 additions and 2 deletions


@@ -86,6 +86,28 @@ for output in outputs:
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
### [SGLang](https://github.com/sgl-project/sglang)
`SGLang` is a fast serving framework for large language models and vision language models.
After installing SGLang following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve the `internlm3-8b-instruct` model as follows:
```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
Then send a request to the server's OpenAI-compatible chat completions endpoint:
```shell
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/internlm3-8b-instruct",
    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
    "stream": false
  }' \
  --no-buffer
```
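Since the server exposes an OpenAI-compatible API (as the `curl` call above shows), the same request can be issued from Python with the `openai` client. A minimal sketch, assuming the server launched above is listening on the default port 30000:
```python
from openai import OpenAI

# Point the client at the local SGLang server; the API key is unused but required.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Introduce Shanghai"}],
    stream=False,
)
print(response.choices[0].message.content)
```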
### [TGI](https://github.com/huggingface/text-generation-inference)
TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is with the official Docker container:
@@ -246,7 +268,7 @@ It chooses ollama as the LLM inference engine locally. An example can be found f
Therefore, you can integrate InternLM2 or InternLM2.5 models into LlamaIndex smoothly if you deploy them with `ollama` as guided in the [ollama section](#ollama), as in the sketch below.
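A minimal sketch of that integration, using the `llama-index-llms-ollama` package; the model tag `internlm2.5` is hypothetical and should match whatever name the model is registered under in your local `ollama`:
```python
from llama_index.llms.ollama import Ollama

# Connect LlamaIndex to the local ollama server.
# "internlm2.5" is a hypothetical tag; use the name your model was pulled under.
llm = Ollama(model="internlm2.5", request_timeout=120.0)

response = llm.complete("Introduce Shanghai")
print(response)
```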
### \[open-webui\]
### [open-webui](https://github.com/open-webui/open-webui)
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other compatible OpenAI API services, and comes with a built-in RAG reasoning engine, making it a powerful AI deployment solution.


@@ -86,6 +86,28 @@ for output in outputs:
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
### [SGLang](https://github.com/sgl-project/sglang)
`SGLang` is a fast serving framework for LLMs and VLMs.
After installing SGLang following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve and query the `internlm3-8b-instruct` model as follows:
```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
Then send a request to the server's OpenAI-compatible chat completions endpoint:
```shell
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/internlm3-8b-instruct",
    "messages": [{"role": "user", "content": "Introduce Shanghai"}],
    "stream": false
  }' \
  --no-buffer
```
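Besides the HTTP server shown above, SGLang also provides an offline engine API that runs the model in-process. A minimal sketch, assuming the installed `sglang` version exposes `sgl.Engine` as in the official offline-engine examples (the sampling parameters are illustrative):
```python
import sglang as sgl

# Run the model in-process instead of launching a standalone server.
# trust_remote_code mirrors the --trust-remote-code flag used above.
llm = sgl.Engine(model_path="internlm/internlm3-8b-instruct", trust_remote_code=True)

prompts = ["Introduce Shanghai"]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt!r}, Generated text: {output['text']!r}")
```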
### [TGI](https://github.com/huggingface/text-generation-inference)
TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is with the official Docker container:
@@ -246,7 +268,7 @@ LlamaIndex is a framework for building context-augmented LLM applications.
Therefore, if you can deploy the InternLM models with ollama as guided in the [ollama section](#ollama), you can smoothly integrate them into LlamaIndex.
### \[open-webui\]
### [open-webui](https://github.com/open-webui/open-webui)
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports Ollama and other OpenAI-compatible API services, and comes with a built-in RAG inference engine, making it a powerful AI deployment solution.