Update ecosystem documentation (#814)

pull/824/head
AllentDan 2025-01-16 11:45:29 +08:00 committed by GitHub
parent 051011405f
commit 4fc3a32c7e
2 changed files with 68 additions and 4 deletions


@ -86,6 +86,28 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
### [SGLang](https://github.com/sgl-project/sglang)
`SGLang` is a fast serving framework for large language models and vision language models.
After installing it following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve and query the `internlm3-8b-instruct` model as follows:
```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
```shell
curl http://127.0.0.1:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer EMPTY" \
-d '{
"model": "internlm/internlm3-8b-instruct",
"messages": [{"role": "user", "content": "Introduce Shanghai"}],
"stream": false
}' \
--no-buffer
```
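Since the launched server exposes an OpenAI-compatible API, the same request can also be made from Python. Below is a minimal sketch using the `openai` client package; the address and the `EMPTY` key simply mirror the curl example above:
```python
# Hedged sketch: query the SGLang server started above via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:30000/v1")

response = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Introduce Shanghai"}],
    stream=False,
)
print(response.choices[0].message.content)
```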
### [TGI](https://github.com/huggingface/text-generation-inference)
TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is to use the official Docker container:
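The full Docker command is not shown here; as a rough sketch only (the image tag and flags are assumptions to verify against the TGI documentation), a typical invocation looks like:
```shell
# Hedged sketch, not part of this change: serve the model with the official TGI image.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id internlm/internlm3-8b-instruct --trust-remote-code
```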
@ -221,8 +243,9 @@ from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(
-    api_key="a dummy key",
-    base_ur='https://0.0.0.0:23333/v1')
+    model_name="a-model",
+    openai_api_key="a dummy key",
+    openai_api_base='https://0.0.0.0:23333/v1')
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
@ -245,6 +268,13 @@ It chooses ollama as the LLM inference engine locally. An example can be found f
Therefore, you can integrate InternLM2 or InternLM2.5 models into LlamaIndex smoothly if you deploy them with `ollama` as guided in the [ollama section](#ollama).
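For illustration, a minimal sketch of that integration, assuming the `llama-index-llms-ollama` package is installed and an InternLM model (here tagged `internlm2`, an assumption) has already been pulled into ollama:
```python
# Hedged sketch: drive an ollama-served InternLM model through LlamaIndex.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="internlm2", request_timeout=60.0)  # model tag is an assumption
print(llm.complete("Introduce Shanghai"))
```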
### [open-webui](https://github.com/open-webui/open-webui)
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other compatible OpenAI API services, and comes with a built-in RAG reasoning engine, making it a powerful AI deployment solution.
1. You can start an API service with LMDeploy, or launch the service with ollama (a command sketch follows this list).
2. Follow the [guidance](https://github.com/open-webui/open-webui?tab=readme-ov-file#installation-via-python-pip-) to install Open WebUI, start the webui service with `open-webui serve`, and open the webui in your browser.
3. Refer to the [documentation](https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama#step-2-managing-your-ollama-instance): in the opened page, find the settings and configure the OpenAI-compatible or ollama service. Once configured, you can choose a model and start a conversation.
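As a command-level sketch of steps 1–2 (the model name and ports are assumptions; adjust them to your setup):
```shell
# Hedged sketch: expose an OpenAI-compatible endpoint with LMDeploy (terminal 1)...
lmdeploy serve api_server internlm/internlm3-8b-instruct --server-port 23333
# ...then install and start Open WebUI (terminal 2); the UI defaults to http://localhost:8080
pip install open-webui
open-webui serve
```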
### [LazyLLM](https://github.com/LazyAGI/LazyLLM)
@ -282,6 +312,7 @@ from lazyllm import pipeline, parallel, bind, SentenceSplitter, Document, Retrie
prompt = 'You will play the role of an AI Q&A assistant and complete a dialogue task. In this task, you need to provide your answer based on the given context and question.'
```
</details>
```python

View File

@ -86,6 +86,28 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
### [SGLang](https://github.com/sgl-project/sglang)
`SGLang` is a fast serving framework for LLMs and VLMs.
After installing it following the official [documentation](https://docs.sglang.ai/start/install.html), you can serve and query the `internlm3-8b-instruct` model as follows:
```shell
python3 -m sglang.launch_server --model internlm/internlm3-8b-instruct --trust-remote-code --chat-template internlm2-chat
```
```shell
curl http://127.0.0.1:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer EMPTY" \
-d '{
"model": "internlm/internlm3-8b-instruct",
"messages": [{"role": "user", "content": "Introduce Shanghai"}],
"stream": false
}' \
--no-buffer
```
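The same request can also be issued from Python; a minimal sketch using `requests` against the endpoint started above:
```python
# Hedged sketch: POST to the OpenAI-compatible chat endpoint of the SGLang server.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    headers={"Authorization": "Bearer EMPTY"},
    json={
        "model": "internlm/internlm3-8b-instruct",
        "messages": [{"role": "user", "content": "Introduce Shanghai"}],
        "stream": False,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```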
### [TGI](https://github.com/huggingface/text-generation-inference)
TGI is a toolkit for deploying and serving LLMs. The easiest way to deploy an LLM is to use the official Docker container:
@ -221,8 +243,9 @@ from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(
-    api_key="a dummy key",
-    base_ur='https://0.0.0.0:23333/v1')
+    model_name="a-model",
+    openai_api_key="a dummy key",
+    openai_api_base='https://0.0.0.0:23333/v1')
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
@ -245,6 +268,14 @@ LlamaIndex is a framework for building context-augmented LLM applications.
Therefore, if you can deploy InternLM models with `ollama` as guided in the [ollama section](#ollama), you can integrate them into LlamaIndex smoothly.
### [open-webui](https://github.com/open-webui/open-webui)
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to run completely offline. It supports Ollama services and other OpenAI-compatible API services, and comes with a built-in RAG inference engine, making it a powerful AI deployment solution.
1. You can start an api_server service with LMDeploy, or launch the service with ollama.
2. Follow the [guidance](https://github.com/open-webui/open-webui?tab=readme-ov-file#installation-via-python-pip-) to install Open WebUI, start the webui service with `open-webui serve`, and open the webui in your browser.
3. Refer to the [documentation](https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama#step-2-managing-your-ollama-instance): in the opened page, find the settings and configure the OpenAI-compatible or ollama service. Once configured, you can choose a model and start a conversation.
### [LazyLLM](https://github.com/LazyAGI/LazyLLM)
LazyLLM is a low-code development tool for building multi-agent LLM applications. Compared with LangChain and LlamaIndex, it offers much higher flexibility and ease of use.
@ -266,6 +297,7 @@ from lazyllm import TrainableModule, WebModule
m = TrainableModule('internlm2-chat-7b').trainset('/patt/to/your_data.json').mode('finetune')
WebModule(m).update().wait()
```
It is worth mentioning that whichever InternLM-series model you use, LazyLLM can handle both inference and fine-tuning: you never need to worry about the model's sharding strategy or its special tokens.<br>
If you want to build your own RAG application, you do not need to start an inference service first and then configure an IP and port to launch the application, as with LangChain. With the code below, you can use LazyLLM and InternLM-series models to build a highly customized RAG application in about ten lines of code, complete with a document-management service (documents must be given as a local absolute path and can be downloaded here: [rag_master](https://huggingface.co/datasets/Jing0o0Xin/rag_master)).
@ -279,6 +311,7 @@ from lazyllm import pipeline, parallel, bind, SentenceSplitter, Document, Retrie
prompt = 'You will play the role of an AI Q&A assistant and complete a dialogue task. In this task, you need to provide your answer based on the given context and question.'
```
</details>
```python