From 40f8e2ed19464744f238ba35989d7a99dec4d3f9 Mon Sep 17 00:00:00 2001
From: ZwwWayne
Date: Wed, 17 Jan 2024 09:43:47 +0800
Subject: [PATCH] clean doc

---
 chat/lmdeploy.md       | 55 ------------------------------------------
 chat/lmdeploy_zh-CN.md | 55 ------------------------------------------
 2 files changed, 110 deletions(-)
 delete mode 100644 chat/lmdeploy.md
 delete mode 100644 chat/lmdeploy_zh-CN.md

diff --git a/chat/lmdeploy.md b/chat/lmdeploy.md
deleted file mode 100644
index 41439d2..0000000
--- a/chat/lmdeploy.md
+++ /dev/null
@@ -1,55 +0,0 @@
-# Inference by LMDeploy
-
-English | [简体中文](lmdeploy_zh-CN.md)
-
-[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient, user-friendly toolkit for compressing, deploying, and serving LLMs.
-
-This article highlights the basic usage of LMDeploy. For a comprehensive understanding of the toolkit, please refer to [the tutorials](https://lmdeploy.readthedocs.io/en/latest/).
-
-## Installation
-
-Install lmdeploy with pip (Python 3.8+):
-
-```shell
-pip install lmdeploy
-```
-
-## Offline batch inference
-
-With just four lines of code, you can run batch inference over a list of prompts:
-
-```python
-from lmdeploy import pipeline
-pipe = pipeline("internlm/internlm2-chat-7b")
-response = pipe(["Hi, pls intro yourself", "Shanghai is"])
-print(response)
-```
-
-With dynamic NTK scaling, LMDeploy can handle a context length of 200K tokens for `InternLM2`:
-
-```python
-from lmdeploy import pipeline, TurbomindEngineConfig
-engine_config = TurbomindEngineConfig(session_len=200000,
-                                      rope_scaling_factor=2.0)
-pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_config)
-prompt = 'Please offer a long prompt here'
-print(pipe(prompt))
-```
-
-For more information about LMDeploy pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html).
-
-## Serving
-
-LMDeploy's `api_server` packs a model into a service with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of starting the service:
-
-```shell
-lmdeploy serve api_server internlm/internlm2-chat-7b
-```
-
-The default port of `api_server` is `23333`. After the server is launched, you can chat with it in the terminal through `api_client`:
-
-```shell
-lmdeploy serve api_client http://0.0.0.0:23333
-```
-
-Alternatively, you can test the server's APIs online through the Swagger UI at `http://0.0.0.0:23333`. A detailed overview of the API specification is available [here](https://lmdeploy.readthedocs.io/en/latest/serving/restful_api.html).
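Because the RESTful APIs follow OpenAI's interfaces, the running server can also be queried programmatically. Below is a minimal sketch using the official `openai` Python client; the `/v1` base path, the placeholder API key, and the served model name are assumptions based on the OpenAI-compatible convention rather than details confirmed by this document:

```python
# A minimal sketch of querying api_server through its OpenAI-compatible
# RESTful interface. Assumes the `openai` Python package is installed and
# the server started above is listening on port 23333.
from openai import OpenAI

client = OpenAI(
    api_key="none",  # assumption: the key is not validated by default
    base_url="http://0.0.0.0:23333/v1",
)
response = client.chat.completions.create(
    model="internlm/internlm2-chat-7b",  # assumed to match the served model
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
)
print(response.choices[0].message.content)
```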
diff --git a/chat/lmdeploy_zh-CN.md b/chat/lmdeploy_zh-CN.md
deleted file mode 100644
index 835aedd..0000000
--- a/chat/lmdeploy_zh-CN.md
+++ /dev/null
@@ -1,55 +0,0 @@
-# Inference by LMDeploy
-
-[English](lmdeploy.md) | 简体中文
-
-[LMDeploy](https://github.com/InternLM/lmdeploy) is an efficient and user-friendly toolkit for deploying LLMs, covering quantization, inference, and serving.
-
-This article introduces the basic usage of LMDeploy, including [installation](#installation), [offline batch inference](#offline-batch-inference), and [serving](#serving). For a more comprehensive introduction, please refer to the [LMDeploy user guide](https://lmdeploy.readthedocs.io/zh-cn/latest/).
-
-## Installation
-
-Install LMDeploy with pip (Python 3.8+):
-
-```shell
-pip install lmdeploy
-```
-
-## Offline batch inference
-
-With just the following four lines of code, you can run batch inference over a list of prompts:
-
-```python
-from lmdeploy import pipeline
-pipe = pipeline("internlm/internlm2-chat-7b")
-response = pipe(["Hi, pls intro yourself", "Shanghai is"])
-print(response)
-```
-
-LMDeploy implements dynamic NTK scaling for long-context extrapolation. With the code below, `InternLM2` can extrapolate to a 200K context length:
-
-```python
-from lmdeploy import pipeline, TurbomindEngineConfig
-engine_config = TurbomindEngineConfig(session_len=200000,
-                                      rope_scaling_factor=2.0)
-pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_config)
-prompt = 'Please offer a long prompt here'
-print(pipe(prompt))
-```
-
-For more information about pipeline usage, please refer to [here](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html).
-
-## Serving
-
-LMDeploy's `api_server` packs a model into a service with a single command, exposing RESTful APIs compatible with OpenAI's interfaces. Below is an example of starting the service:
-
-```shell
-lmdeploy serve api_server internlm/internlm2-chat-7b
-```
-
-The default port of the service is `23333`. After the server is launched, you can chat with it in the terminal through `api_client`:
-
-```shell
-lmdeploy serve api_client http://0.0.0.0:23333
-```
-
-In addition, you can read and try out the `api_server` interfaces online through the Swagger UI at `http://0.0.0.0:23333`, or consult the [documentation](https://lmdeploy.readthedocs.io/zh-cn/latest/serving/restful_api.html) for the definition and usage of each interface.
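As a complement to the offline batch inference examples above, here is a minimal sketch of adjusting the pipeline's sampling behavior through LMDeploy's `GenerationConfig`; the specific parameter values are illustrative assumptions, not recommendations from this document:

```python
# A minimal sketch of controlling sampling in the pipeline via
# GenerationConfig. The parameter values below are assumed for
# illustration only, not tuned recommendations.
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
gen_config = GenerationConfig(top_p=0.8,
                              temperature=0.7,
                              max_new_tokens=512)
response = pipe(["Hi, pls intro yourself"], gen_config=gen_config)
print(response)
```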