mirror of https://github.com/InternLM/InternLM
[Doc]: update deployment guide (#591)
parent
2ae6225891
commit
c40b34798c
12
README.md
12
README.md
|
@ -130,11 +130,13 @@ The effect is similar to below:
|
|||
|
||||
We use [LMDeploy](https://github.com/InternLM/LMDeploy) for fast deployment of InternLM.
|
||||
|
||||
```shell
|
||||
# install LMDeploy
|
||||
python3 -m pip install lmdeploy
|
||||
# chat with internlm2
|
||||
lmdeploy chat turbomind InternLM/internlm2-chat-7b --model-name internlm2-chat-7b
|
||||
With only 4 lines of codes, you can perform `internlm2-chat-7b` inference after `pip install lmdeploy`.
|
||||
|
||||
```python
|
||||
from lmdeploy import pipeline
|
||||
pipe = pipeline("internlm/internlm2-chat-7b")
|
||||
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
||||
print(response)
|
||||
```
|
||||
|
||||
Please refer to the [guidance](./chat/lmdeploy.md) for more usages about model deployment. For additional deployment tutorials, feel free to explore [here](https://github.com/InternLM/LMDeploy).
|
||||
|
|
|
@ -121,9 +121,13 @@ streamlit run ./chat/web_demo.py
|
|||
|
||||
我们使用 [LMDeploy](https://github.com/InternLM/LMDeploy) 完成 InternLM 的一键部署。
|
||||
|
||||
```shell
|
||||
python3 -m pip install lmdeploy
|
||||
lmdeploy chat turbomind InternLM/internlm-chat-7b --model-name internlm-chat-7b
|
||||
通过 `pip install lmdeploy` 安装 LMDeploy 之后,只需 4 行代码,就可以实现离线批处理:
|
||||
|
||||
```python
|
||||
from lmdeploy import pipeline
|
||||
pipe = pipeline("internlm/internlm2-chat-7b")
|
||||
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
||||
print(response)
|
||||
```
|
||||
|
||||
请参考[部署指南](./chat/lmdeploy.md)了解更多使用案例,更多部署教程则可在[这里](https://github.com/InternLM/LMDeploy)找到。
|
||||
|
|
|
@ -0,0 +1,60 @@
|
|||
# Inference by LMDeploy
|
||||
|
||||
English | [简体中文](lmdeploy_zh_cn.md)
|
||||
|
||||
[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient, user-friendly toolkit designed for compressing, deploying, and serving LLM models.
|
||||
|
||||
This article primarily highlights the basic usage of LMDeploy. For a comprehensive understanding of the toolkit, we invite you to refer to [the tutorials](https://lmdeploy.readthedocs.io/en/latest/).
|
||||
|
||||
|
||||
## Installation
|
||||
|
||||
Install lmdeploy with pip (python 3.8+)
|
||||
|
||||
```shell
|
||||
pip install lmdeploy
|
||||
```
|
||||
|
||||
## Offline batch inference
|
||||
|
||||
With just 4 lines of codes, you can execute batch inference using a list of prompts:
|
||||
|
||||
```python
|
||||
from lmdeploy import pipeline
|
||||
pipe = pipeline("internlm/internlm2-chat-7b")
|
||||
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
||||
print(response)
|
||||
```
|
||||
|
||||
With dynamic ntk, LMDeploy can handle a context length of 200K for `InternLM2`:
|
||||
|
||||
```python
|
||||
from lmdeploy import pipeline, TurbomindEngineConfig
|
||||
engine_config = TurbomindEngineConfig(session_len=200000,
|
||||
rope_scaling_factor=2.0)
|
||||
pipe = pipeline("internlm/internlm2-chat-7b", backend_engine=engine_config)
|
||||
gen_config = GenerationConfig(top_p=0.8,
|
||||
top_k=40,
|
||||
temperature=0.8,
|
||||
max_new_tokens=1024)
|
||||
response = pipe(prompt, gen_config=gen_config)
|
||||
print(response)
|
||||
```
|
||||
|
||||
For more information about LMDeploy pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html).
|
||||
|
||||
## Serving
|
||||
|
||||
LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:
|
||||
|
||||
```shell
|
||||
lmdeploy serve api_server internlm/internlm2-chat-7b
|
||||
```
|
||||
|
||||
The default port of `api_server` is `23333`. After the server is launched, you can communicate with server on terminal through `api_client`:
|
||||
|
||||
```shell
|
||||
lmdeploy serve api_client http://0.0.0.0:23333
|
||||
```
|
||||
|
||||
Alternatively, you can test the server's APIs oneline through the Swagger UI at `http://0.0.0.0:23333`. A detailed overview of the API specification is available [here](https://lmdeploy.readthedocs.io/en/latest/serving/restful_api.html).
|
|
@ -0,0 +1,59 @@
|
|||
# LMDeploy 推理
|
||||
|
||||
[English](lmdeploy.md) | 简体中文
|
||||
|
||||
[LMDeploy](https://github.com/InternLM/lmdeploy) 是一个高效且友好的 LLM 模型部署工具箱,功能涵盖了量化、推理和服务。
|
||||
|
||||
本文主要介绍 LMDeploy 的基本用法,包括[安装](#安装)、[离线批处理](#离线批处理)和[推理服务](#推理服务)。更全面的介绍请参考 [LMDeploy 用户指南](https://lmdeploy.readthedocs.io/zh-cn/latest/)。
|
||||
|
||||
|
||||
## 安装
|
||||
|
||||
使用 pip(python 3.8+)安装 LMDeploy
|
||||
|
||||
```shell
|
||||
pip install lmdeploy
|
||||
```
|
||||
|
||||
## 离线批处理
|
||||
|
||||
只用以下 4 行代码,就可以完成 prompts 的批处理:
|
||||
|
||||
```python
|
||||
from lmdeploy import pipeline
|
||||
pipe = pipeline("internlm/internlm2-chat-7b")
|
||||
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
||||
print(response)
|
||||
```
|
||||
|
||||
LMDeploy 实现了 dynamic ntk,支持长文本外推。使用如下代码,可以把 InternLM2 的文本外推到 200K:
|
||||
```python
|
||||
from lmdeploy import pipeline, TurbomindEngineConfig
|
||||
engine_config = TurbomindEngineConfig(session_len=200000,
|
||||
rope_scaling_factor=2.0)
|
||||
pipe = pipeline("internlm/internlm2-chat-7b", backend_engine=engine_config)
|
||||
gen_config = GenerationConfig(top_p=0.8,
|
||||
top_k=40,
|
||||
temperature=0.8,
|
||||
max_new_tokens=1024)
|
||||
response = pipe(prompt, gen_config=gen_config)
|
||||
print(response)
|
||||
```
|
||||
|
||||
更多关于 pipeline 的使用方式,请参考[这里](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html)
|
||||
|
||||
## 推理服务
|
||||
|
||||
LMDeploy `api_server` 支持把模型一键封装为服务,对外提供的 RESTful API 兼容 openai 的接口。以下为服务启动的示例:
|
||||
|
||||
```shell
|
||||
lmdeploy serve api_server internlm/internlm2-chat-7b
|
||||
```
|
||||
|
||||
服务默认端口是23333。在 server 启动后,你可以在终端通过`api_client`与server进行对话,体验对话效果:
|
||||
|
||||
```shell
|
||||
lmdeploy serve api_client http://0.0.0.0:23333
|
||||
```
|
||||
|
||||
此外,你还可以通过 Swagger UI `http://0.0.0.0:23333` 在线阅读和试用 `api_server` 的各接口,也可直接查阅[文档](https://lmdeploy.readthedocs.io/zh-cn/latest/serving/restful_api.html),了解各接口的定义和使用方法。
|
Loading…
Reference in New Issue