mirror of https://github.com/InternLM/InternLM
update doc
commit e1b511e1f4
README.md
@@ -133,11 +133,13 @@ The effect is similar to below:
 
 We use [LMDeploy](https://github.com/InternLM/LMDeploy) for fast deployment of InternLM.
 
-```shell
-# install LMDeploy
-python3 -m pip install lmdeploy
-# chat with internlm2
-lmdeploy chat turbomind InternLM/internlm2-chat-7b --model-name internlm2-chat-7b
+With only 4 lines of code, you can perform `internlm2-chat-7b` inference after `pip install lmdeploy`.
+
+```python
+from lmdeploy import pipeline
+pipe = pipeline("internlm/internlm2-chat-7b")
+response = pipe(["Hi, pls intro yourself", "Shanghai is"])
+print(response)
 ```
 
 Please refer to the [guidance](./chat/lmdeploy.md) for more usage examples of model deployment. For additional deployment tutorials, feel free to explore [here](https://github.com/InternLM/LMDeploy).
@@ -131,9 +131,13 @@ streamlit run ./chat/web_demo.py
 
 We use [LMDeploy](https://github.com/InternLM/LMDeploy) for one-click deployment of InternLM.
 
-```shell
-python3 -m pip install lmdeploy
-lmdeploy chat turbomind InternLM/internlm2-chat-7b --model-name internlm2-chat-7b
+After installing LMDeploy with `pip install lmdeploy`, only 4 lines of code are needed for offline batch inference:
+
+```python
+from lmdeploy import pipeline
+pipe = pipeline("internlm/internlm2-chat-7b")
+response = pipe(["Hi, pls intro yourself", "Shanghai is"])
+print(response)
 ```
 
 Please refer to the [deployment guide](./chat/lmdeploy.md) for more usage examples; additional deployment tutorials can be found [here](https://github.com/InternLM/LMDeploy).
@@ -0,0 +1,60 @@

# Inference by LMDeploy

English | [简体中文](lmdeploy_zh_cn.md)

[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient, user-friendly toolkit designed for compressing, deploying, and serving LLMs.

This article primarily highlights the basic usage of LMDeploy. For a comprehensive understanding of the toolkit, please refer to [the tutorials](https://lmdeploy.readthedocs.io/en/latest/).

## Installation

Install lmdeploy with pip (Python 3.8+):

```shell
pip install lmdeploy
```

## Offline batch inference

With just 4 lines of code, you can execute batch inference with a list of prompts:

```python
from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```

With dynamic NTK scaling, LMDeploy can handle a context length of up to 200K for `InternLM2`:

```python
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(session_len=200000,
                                      rope_scaling_factor=2.0)
pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_config)
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
prompt = "Shanghai is"  # replace with your own (potentially very long) prompt
response = pipe(prompt, gen_config=gen_config)
print(response)
```

For more information about LMDeploy pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html).

## Serving

LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of launching the service:

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b
```

The default port of `api_server` is `23333`. After the server is launched, you can communicate with it in the terminal through `api_client`:

```shell
lmdeploy serve api_client http://0.0.0.0:23333
```

Alternatively, you can test the server's APIs online through the Swagger UI at `http://0.0.0.0:23333`. A detailed overview of the API specification is available [here](https://lmdeploy.readthedocs.io/en/latest/serving/restful_api.html).
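
Because the RESTful APIs follow OpenAI's interface conventions, the server can also be called from Python with the `openai` client package. The snippet below is a minimal sketch rather than part of the official guide: it assumes the server is running locally on the default port `23333` with no API key enforced, and it looks up the served model name via the `/v1/models` endpoint instead of hard-coding it.

```python
# Minimal sketch: call the OpenAI-compatible endpoints exposed by `lmdeploy serve api_server`.
# Assumes `pip install openai` and a server reachable at http://0.0.0.0:23333.
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")

# Ask the server which model name it serves and use the first entry.
model_name = client.models.list().data[0].id

completion = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
    temperature=0.8,
    top_p=0.8,
)
print(completion.choices[0].message.content)
```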
@@ -0,0 +1,59 @@

# Inference by LMDeploy

[English](lmdeploy.md) | 简体中文

[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient and user-friendly toolkit for deploying LLMs, covering quantization, inference, and serving.

This article covers the basic usage of LMDeploy, including [installation](#installation), [offline batch inference](#offline-batch-inference), and [serving](#serving). For a more comprehensive introduction, please refer to the [LMDeploy user guide](https://lmdeploy.readthedocs.io/zh-cn/latest/).

## Installation

Install LMDeploy with pip (Python 3.8+):

```shell
pip install lmdeploy
```

## Offline batch inference

With just the following 4 lines of code, you can run batch inference over a list of prompts:

```python
from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```

LMDeploy implements dynamic NTK scaling for long-context extrapolation. With the following code, the context of InternLM2 can be extended to 200K:

```python
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(session_len=200000,
                                      rope_scaling_factor=2.0)
pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_config)
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
prompt = "Shanghai is"  # replace with your own (potentially very long) prompt
response = pipe(prompt, gen_config=gen_config)
print(response)
```

For more about pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html).

## Serving

LMDeploy's `api_server` can wrap a model into a service with a single command. The exposed RESTful APIs are compatible with OpenAI's interfaces. Below is an example of launching the service:

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b
```

The default port is `23333`. After the server is launched, you can chat with it in the terminal through `api_client`:

```shell
lmdeploy serve api_client http://0.0.0.0:23333
```

In addition, you can browse and try out the `api_server` APIs online through the Swagger UI at `http://0.0.0.0:23333`, or consult the [documentation](https://lmdeploy.readthedocs.io/zh-cn/latest/serving/restful_api.html) for the definition and usage of each interface.
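
If you prefer not to use a client library, the same OpenAI-compatible interface can also be called over plain HTTP. The sketch below is an illustration, not part of the original guide: it assumes the server runs locally on the default port and that the model is served under the name `internlm2-chat-7b` (query `GET /v1/models` to check the actual name).

```python
# Minimal sketch: POST to the OpenAI-compatible /v1/chat/completions route directly.
# The model name below is an assumption; list the served name via GET /v1/models.
import requests

resp = requests.post(
    "http://0.0.0.0:23333/v1/chat/completions",
    json={
        "model": "internlm2-chat-7b",
        "messages": [{"role": "user", "content": "Shanghai is"}],
        "temperature": 0.8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```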
@@ -0,0 +1,71 @@

# Multi-Chats by OpenAOE

English | [简体中文](openaoe_zh_cn.md)

## Introduction

[OpenAOE](https://github.com/InternLM/OpenAOE) is an LLM group-chat framework that lets you chat with multiple LLMs (commercial or open-source) at the same time. OpenAOE provides both a backend API and a web UI to meet different usage needs.

Currently supported LLMs include [InternLM2-Chat-7B](https://huggingface.co/internlm/internlm2-chat-7b), [InternLM-Chat-7B](https://huggingface.co/internlm/internlm-chat-7b), GPT-3.5, GPT-4, Google PaLM, MiniMax, Claude, Spark, etc.

## Quick Run

> [!TIP]
> Requires Python >= 3.9.

We provide three different ways to run OpenAOE: `run by pip`, `run by docker` and `run by source code`.

### Run by pip

#### **Install**

```shell
pip install -U openaoe
```

#### **Start**

```shell
openaoe -f /path/to/your/config-template.yaml
```

### Run by docker

#### **Install**

There are two ways to get the OpenAOE Docker image:

1. Pull the OpenAOE Docker image:

```shell
docker pull openaoe:latest
```

2. Or build the image locally:

```shell
git clone https://github.com/internlm/OpenAOE
cd OpenAOE
docker build . -f docker/Dockerfile -t openaoe:latest
```

#### **Start**

```shell
docker run -p 10099:10099 -v /path/to/your/config-template.yaml:/app/config-template.yaml --name OpenAOE openaoe:latest
```

### Run by source code

#### **Install**

1. Clone this project:

```shell
git clone https://github.com/internlm/OpenAOE
```

2. [_Optional_] Rebuild the frontend project if the frontend code has changed:

```shell
cd OpenAOE/openaoe/frontend
npm install
npm run build
```

#### **Start**

```shell
cd OpenAOE/openaoe
pip install -r backend/requirements.txt
python -m main -f /path/to/your/config-template.yaml
```

> [!TIP]
> `/path/to/your/config-template.yaml` is the configuration file loaded by OpenAOE at startup,
> which contains the relevant configuration information for the LLMs,
> including API URLs, AK/SKs, tokens, etc.
> A template configuration YAML file can be found in `openaoe/backend/config/config.yaml`.
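
Before starting OpenAOE, it can be handy to sanity-check that your edited copy of the template parses correctly. The snippet below is an illustrative sketch, not part of OpenAOE itself: it only assumes the file is valid YAML (PyYAML installed); the actual field names and structure are defined by the template at `openaoe/backend/config/config.yaml`.

```python
# Hypothetical helper: confirm the OpenAOE config file is readable, valid YAML
# before launching `openaoe -f ...`. It makes no assumption about the schema.
import sys
import yaml

path = "/path/to/your/config-template.yaml"
try:
    with open(path) as f:
        cfg = yaml.safe_load(f)
except (OSError, yaml.YAMLError) as err:
    sys.exit(f"config check failed: {err}")

# Print the top-level sections so you can eyeball what will be loaded.
print(f"loaded {path}; top-level keys: {list(cfg) if isinstance(cfg, dict) else type(cfg)}")
```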
@@ -0,0 +1,70 @@

# Multi-Model Chats by OpenAOE

[English](openaoe.md) | 简体中文

## Introduction

[OpenAOE](https://github.com/InternLM/OpenAOE) is an LLM group-chat framework that lets you chat with multiple commercial or open-source LLMs at the same time. OpenAOE also provides a backend API and a web UI to meet different usage needs.

Currently supported LLMs include [InternLM2-Chat-7B](https://huggingface.co/internlm/internlm2-chat-7b), [InternLM-Chat-7B](https://huggingface.co/internlm/internlm-chat-7b), GPT-3.5, GPT-4, Google PaLM, MiniMax, Claude, iFlytek Spark, and more.

## Quick Run

We provide three different ways to install and run OpenAOE out of the box: by pip, by Docker, and from source code.

### Run by pip

> [!TIP]
> Requires Python >= 3.9.

#### **Install**

```shell
pip install -U openaoe
```

#### **Start**

```shell
openaoe -f /path/to/your/config-template.yaml
```

### Run by docker

#### **Install**

There are two ways to get the OpenAOE Docker image:

1. Pull the official image:

```shell
docker pull openaoe:latest
```

2. Or build it locally:

```shell
git clone https://github.com/internlm/OpenAOE
cd OpenAOE
docker build . -f docker/Dockerfile -t openaoe:latest
```

#### **Start**

```shell
docker run -p 10099:10099 -v /path/to/your/config-template.yaml:/app/config-template.yaml --name OpenAOE openaoe:latest
```

### Run by source code

#### **Install**

1. Clone the project:

```shell
git clone https://github.com/internlm/OpenAOE
```

2. [_Optional_] Rebuild the frontend project if the frontend code has changed:

```shell
cd OpenAOE/openaoe/frontend
npm install
npm run build
```

#### **Start**

```shell
cd OpenAOE/openaoe
pip install -r backend/requirements.txt
python -m main -f /path/to/your/config-template.yaml
```

> [!TIP]
> `/path/to/your/config-template.yaml` is the configuration file that OpenAOE reads at startup. It contains the relevant
> configuration for the LLMs, including API URLs, AK/SKs, tokens, etc., and is required for OpenAOE to start.
> A template file can be found at `openaoe/backend/config/config.yaml`.