diff --git a/README.md b/README.md index b2663ef..0ae6276 100644 --- a/README.md +++ b/README.md @@ -133,11 +133,13 @@ The effect is similar to below: We use [LMDeploy](https://github.com/InternLM/LMDeploy) for fast deployment of InternLM. -```shell -# install LMDeploy -python3 -m pip install lmdeploy -# chat with internlm2 -lmdeploy chat turbomind InternLM/internlm2-chat-7b --model-name internlm2-chat-7b +With only 4 lines of codes, you can perform `internlm2-chat-7b` inference after `pip install lmdeploy`. + +```python +from lmdeploy import pipeline +pipe = pipeline("internlm/internlm2-chat-7b") +response = pipe(["Hi, pls intro yourself", "Shanghai is"]) +print(response) ``` Please refer to the [guidance](./chat/lmdeploy.md) for more usages about model deployment. For additional deployment tutorials, feel free to explore [here](https://github.com/InternLM/LMDeploy). diff --git a/README_zh-CN.md b/README_zh-CN.md index a403616..cb6f151 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -131,9 +131,13 @@ streamlit run ./chat/web_demo.py 我们使用 [LMDeploy](https://github.com/InternLM/LMDeploy) 完成 InternLM 的一键部署。 -```shell -python3 -m pip install lmdeploy -lmdeploy chat turbomind InternLM/internlm2-chat-7b --model-name internlm2-chat-7b +通过 `pip install lmdeploy` 安装 LMDeploy 之后,只需 4 行代码,就可以实现离线批处理: + +```python +from lmdeploy import pipeline +pipe = pipeline("internlm/internlm2-chat-7b") +response = pipe(["Hi, pls intro yourself", "Shanghai is"]) +print(response) ``` 请参考[部署指南](./chat/lmdeploy.md)了解更多使用案例,更多部署教程则可在[这里](https://github.com/InternLM/LMDeploy)找到。 diff --git a/chat/lmdeploy.md b/chat/lmdeploy.md new file mode 100644 index 0000000..36c7a16 --- /dev/null +++ b/chat/lmdeploy.md @@ -0,0 +1,60 @@ +# Inference by LMDeploy + +English | [简体中文](lmdeploy_zh_cn.md) + +[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient, user-friendly toolkit designed for compressing, deploying, and serving LLM models. + +This article primarily highlights the basic usage of LMDeploy. For a comprehensive understanding of the toolkit, we invite you to refer to [the tutorials](https://lmdeploy.readthedocs.io/en/latest/). + + +## Installation + +Install lmdeploy with pip (python 3.8+) + +```shell +pip install lmdeploy +``` + +## Offline batch inference + +With just 4 lines of codes, you can execute batch inference using a list of prompts: + +```python +from lmdeploy import pipeline +pipe = pipeline("internlm/internlm2-chat-7b") +response = pipe(["Hi, pls intro yourself", "Shanghai is"]) +print(response) +``` + +With dynamic ntk, LMDeploy can handle a context length of 200K for `InternLM2`: + +```python +from lmdeploy import pipeline, TurbomindEngineConfig +engine_config = TurbomindEngineConfig(session_len=200000, + rope_scaling_factor=2.0) +pipe = pipeline("internlm/internlm2-chat-7b", backend_engine=engine_config) +gen_config = GenerationConfig(top_p=0.8, + top_k=40, + temperature=0.8, + max_new_tokens=1024) +response = pipe(prompt, gen_config=gen_config) +print(response) +``` + +For more information about LMDeploy pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html). + +## Serving + +LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup: + +```shell +lmdeploy serve api_server internlm/internlm2-chat-7b +``` + +The default port of `api_server` is `23333`. After the server is launched, you can communicate with server on terminal through `api_client`: + +```shell +lmdeploy serve api_client http://0.0.0.0:23333 +``` + +Alternatively, you can test the server's APIs oneline through the Swagger UI at `http://0.0.0.0:23333`. A detailed overview of the API specification is available [here](https://lmdeploy.readthedocs.io/en/latest/serving/restful_api.html). diff --git a/chat/lmdeploy_zh_cn.md b/chat/lmdeploy_zh_cn.md new file mode 100644 index 0000000..1df7e54 --- /dev/null +++ b/chat/lmdeploy_zh_cn.md @@ -0,0 +1,59 @@ +# LMDeploy 推理 + +[English](lmdeploy.md) | 简体中文 + +[LMDeploy](https://github.com/InternLM/lmdeploy) 是一个高效且友好的 LLM 模型部署工具箱,功能涵盖了量化、推理和服务。 + +本文主要介绍 LMDeploy 的基本用法,包括[安装](#安装)、[离线批处理](#离线批处理)和[推理服务](#推理服务)。更全面的介绍请参考 [LMDeploy 用户指南](https://lmdeploy.readthedocs.io/zh-cn/latest/)。 + + +## 安装 + +使用 pip(python 3.8+)安装 LMDeploy + +```shell +pip install lmdeploy +``` + +## 离线批处理 + +只用以下 4 行代码,就可以完成 prompts 的批处理: + +```python +from lmdeploy import pipeline +pipe = pipeline("internlm/internlm2-chat-7b") +response = pipe(["Hi, pls intro yourself", "Shanghai is"]) +print(response) +``` + +LMDeploy 实现了 dynamic ntk,支持长文本外推。使用如下代码,可以把 InternLM2 的文本外推到 200K: +```python +from lmdeploy import pipeline, TurbomindEngineConfig +engine_config = TurbomindEngineConfig(session_len=200000, + rope_scaling_factor=2.0) +pipe = pipeline("internlm/internlm2-chat-7b", backend_engine=engine_config) +gen_config = GenerationConfig(top_p=0.8, + top_k=40, + temperature=0.8, + max_new_tokens=1024) +response = pipe(prompt, gen_config=gen_config) +print(response) +``` + +更多关于 pipeline 的使用方式,请参考[这里](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html) + +## 推理服务 + +LMDeploy `api_server` 支持把模型一键封装为服务,对外提供的 RESTful API 兼容 openai 的接口。以下为服务启动的示例: + +```shell +lmdeploy serve api_server internlm/internlm2-chat-7b +``` + +服务默认端口是23333。在 server 启动后,你可以在终端通过`api_client`与server进行对话,体验对话效果: + +```shell +lmdeploy serve api_client http://0.0.0.0:23333 +``` + +此外,你还可以通过 Swagger UI `http://0.0.0.0:23333` 在线阅读和试用 `api_server` 的各接口,也可直接查阅[文档](https://lmdeploy.readthedocs.io/zh-cn/latest/serving/restful_api.html),了解各接口的定义和使用方法。 diff --git a/chat/openaoe.md b/chat/openaoe.md new file mode 100644 index 0000000..6038b44 --- /dev/null +++ b/chat/openaoe.md @@ -0,0 +1,71 @@ +# Multi-Chats by OpenAOE + +English | [简体中文](openaoe_zh_cn.md) +## Introduction +[OpenAOE](https://github.com/InternLM/OpenAOE) is a LLM-Group-Chat Framework, which can chat with multiple LLMs (commercial/open source LLMs) at the same time. OpenAOE provides both backend API and WEB-UI to meet different usage needs. + +Currently already supported LLMs: [InternLM2-Chat-7B](https://huggingface.co/internlm/internlm2-chat-7b), [IntenLM-Chat-7B](https://huggingface.co/internlm/internlm-chat-7b), GPT-3.5, GPT-4, Google PaLM, MiniMax, Claude, Spark, etc. + +## Quick Run +> [!TIP] +> Require python >= 3.9 + +We provide three different ways to run OpenAOE: `run by pip`, `run by docker` and `run by source code` as well. + +### Run by pip +#### **Install** +```shell +pip install -U openaoe +``` +#### **Start** +```shell +openaoe -f /path/to/your/config-template.yaml +``` + +### Run by docker +#### **Install** + +There are two ways to get the OpenAOE docker image by: +1. pull the OpenAOE docker image +```shell +docker pull openaoe:latest +``` + +2. or build a docker image +```shell +git clone https://github.com/internlm/OpenAOE +cd open-aoe +docker build . -f docker/Dockerfile -t openaoe:latest +``` + +#### **Start** +```shell +docker run -p 10099:10099 -v /path/to/your/config-template.yaml:/app/config-template.yaml --name OpenAOE openaoe:latest +``` + +### Run by source code +#### **Install** +1. clone this project +```shell +git clone https://github.com/internlm/OpenAOE +``` +2. [_optional_] build the frontend project when the frontend codes are changed +```shell +cd open-aoe/openaoe/frontend +npm install +npm run build +``` + + +#### **Start** +```shell +cd open-aoe/openaoe +pip install -r backend/requirements.txt +python -m main -f /path/to/your/config-template.yaml +``` + +> [!TIP] +> `/path/to/your/config.yaml` is the configuration file loaded by OpenAOE at startup, +> which contains the relevant configuration information for the LLMs, +> including: API URLs, AKSKs, Tokens, etc. +> A template configuration yaml file can be found in `openaoe/backend/config/config.yaml`. diff --git a/chat/openaoe_zh_cn.md b/chat/openaoe_zh_cn.md new file mode 100644 index 0000000..7a9fc83 --- /dev/null +++ b/chat/openaoe_zh_cn.md @@ -0,0 +1,70 @@ +# OpenAOE 多模型对话 + +[English](openaoe.md) | 简体中文 + + +## 介绍 +[OpenAOE](https://github.com/InternLM/OpenAOE) 是一个 LLM-Group-Chat 框架,可以同时与多个商业大模型或开源大模型进行聊天。 OpenAOE还提供后端API和WEB-UI以满足不同的使用需求。 + +目前已经支持的大模型有: [InternLM2-Chat-7B](https://huggingface.co/internlm/internlm2-chat-7b), [IntenLM-Chat-7B](https://huggingface.co/internlm/internlm-chat-7b), GPT-3.5, GPT-4, Google PaLM, MiniMax, Claude, 讯飞星火等。 + + +## 快速安装 +我们将提供 3 种不同的方式安装:基于 pip、基于 docker 以及基于源代码,实现开箱即用。 + +### 基于 pip +> [!TIP] +> 需要 python >= 3.9 +#### **安装** +```shell +pip install -U openaoe +``` +#### **运行** +```shell +openaoe -f /path/to/your/config-template.yaml +``` + +### 基于 docker +#### **安装** +有两种方式获取 OpenAOE 的 docker 镜像: +1. 官方拉取 +```shell +docker pull openaoe:latest +``` + +2. 本地构建 +```shell +git clone https://github.com/internlm/OpenAOE +cd open-aoe +docker build . -f docker/Dockerfile -t openaoe:latest +``` + +#### **运行** +```shell +docker run -p 10099:10099 -v /path/to/your/config-template.yaml:/app/config-template.yaml --name OpenAOE openaoe:latest +``` + +### 基于源代码 +#### **安装** +1. 克隆项目 +```shell +git clone https://github.com/internlm/OpenAOE +``` +2. [_可选_] (如果前端代码发生变动)重新构建前端项目 +```shell +cd open-aoe/openaoe/frontend +npm install +npm run build +``` + + +#### **运行** +```shell +cd open-aoe/openaoe +pip install -r backend/requirements.txt +python -m main -f /path/to/your/config-template.yaml +`````` + +> [!TIP] +> `/path/to/your/config.yaml` 是 OpenAOE 启动时读取的配置文件,里面包含了大模型的相关配置信息, +> 包括:调用API地址、AKSK、Token等信息,是 OpenAOE 启动的必备文件。模板文件可以在 `openaoe/backend/config/config.yaml` 中找到。