mirror of https://github.com/InternLM/InternLM
Add long context README and file chat demo

parent 767931acf3, commit 21cee906fb

README.md

@@ -40,7 +40,7 @@ InternLM2.5 series are released with the following features:

  - **Outstanding reasoning capability**: State-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B.
- - **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](./chat/lmdeploy.md) for 1M-context inference.
+ - **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](./chat/lmdeploy.md) for 1M-context inference. More details and a file chat demo can be found [here](./long_context/README.md).
  - **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 also has stronger capabilities in instruction following, tool selection, and reflection for tool use. See [examples](./agent/).

README_zh-CN.md

@@ -39,7 +39,7 @@

 InternLM2.5 series models are officially released in this repository, with the following features:

  - Outstanding reasoning capability: achieves the best accuracy on math reasoning among models of comparable size, surpassing Llama3 and Gemma2-9B.
- - Effective support for ultra-long context of up to 1M tokens: the model locates the "needle in a haystack" almost perfectly in inputs of 1M tokens, and delivers leading performance among open-source models on long-context tasks such as LongBench. Try 1M-token context inference with [LMDeploy](./chat/lmdeploy_zh_cn.md).
+ - Effective support for ultra-long context of up to 1M tokens: the model locates the "needle in a haystack" almost perfectly in inputs of 1M tokens, and delivers leading performance among open-source models on long-context tasks such as LongBench. Try 1M-token context inference with [LMDeploy](./chat/lmdeploy_zh_cn.md). For more details and a file chat demo, see [here](./long_context/README_zh-CN.md).
  - Overall upgrade of tool-use capability: InternLM2.5 supports gathering information from more than 100 web pages for analysis and reasoning; the corresponding implementation will soon be open-sourced in [Lagent](https://github.com/InternLM/lagent/tree/main). InternLM2.5 has stronger and more generalizable abilities in instruction understanding, tool selection, and result reflection, so the new model can more reliably support building complex agents and make effective multi-turn tool calls to complete relatively complex tasks. See more [examples](./agent/).

## News

long_context/README.md (new file)

@@ -0,0 +1,84 @@

# InternLM with Long Context

English | [简体中文](./README_zh-CN.md)

## InternLM2.5 with 1M Context Length

We introduce InternLM2.5-7B-Chat-1M, a model developed to support long inputs of up to 1M tokens. This significantly improves the model's ability to handle ultra-long text applications. See the [model zoo](../README.md#model-zoo) for downloads or the [model cards](../model_cards/) for more details.

During pre-training, we utilized natural language corpora with text lengths of 256K tokens. To address the potential domain shift caused by such homogeneous data, we supplemented the corpora with synthetic data, maintaining the model's capabilities while expanding its context.

We employed the "*needle in a haystack*" approach to evaluate the model's ability to retrieve information from long texts. Results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens long.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="needle-in-a-haystack results" width="700"/>
</p>

We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to assess long-document comprehension. InternLM2.5-7B-Chat-1M achieved the best performance among models of comparable size in these tests.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="LongBench results" width="800"/>
</p>

## File Chat with InternLM2.5-1M

This section provides a brief overview of how to chat with InternLM2.5-7B-Chat-1M using an input document. For optimal performance, especially with extremely long inputs, we highly recommend using [LMDeploy](https://github.com/InternLM/LMDeploy) for model serving.

### Supported Document Types

Currently, we support PDF, TXT, and Markdown files, with more file types to be supported soon!

- TXT and Markdown files: these can be processed directly, without any conversion.
- PDF files: we have developed [Magic-Doc](https://github.com/magicpdf/Magic-Doc), a lightweight open-source tool, to convert multiple file types to Markdown (a conversion sketch is shown after this list).
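
For reference, the file chat demo converts an uploaded PDF to Markdown with Magic-Doc's `DocConverter` before prepending the text to the prompt. A minimal standalone sketch of that conversion, assuming a local file named `example.pdf` (the filename is illustrative):

```python
from magic_doc.docconv import DocConverter

# Convert a PDF to Markdown text; doc_chat_demo.py uses the same call
converter = DocConverter(s3_config=None)
markdown_text, time_cost = converter.convert("example.pdf", conv_timeout=300)

print(f"Converted in {time_cost}s; got {len(markdown_text)} characters of Markdown")
```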

### Installation

To get started, install the required packages:

```bash
pip install "fairy-doc[cpu]"
pip install streamlit
pip install lmdeploy
```

### Deploy the Model

Download the model from the [model zoo](../README.md#model-zoo).

Deploy the model with the following command. You can adjust `--session-len` (the maximum sequence length) and `--server-port` as needed.

```bash
lmdeploy serve api_server {path_to_hf_model} \
    --model-name internlm2-chat \
    --session-len 65536 \
    --server-port 8000
```

To further enlarge the sequence length, we suggest adding the following arguments:
`--max-batch-size 1 --cache-max-entry-count 0.7 --tp {num_of_gpus}`
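
Once the server is up, you can sanity-check the OpenAI-compatible endpoint and read back the served model name, which is what the demo script does at startup. A minimal sketch (the `"EMPTY"` API key is a placeholder for a local server that does not require authentication):

```python
from openai import OpenAI

# Point the OpenAI-compatible client at the LMDeploy server started above
client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8000/v1")

# The server lists the deployed model under the name given by --model-name
model_name = client.models.list().data[0].id
print(f"Serving model: {model_name}")
```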

### Launch the Streamlit Demo

```bash
streamlit run long_context/doc_chat_demo.py \
    -- --base_url http://0.0.0.0:8000/v1
```

You can specify the port as needed. If running the demo locally, the URL could be `http://0.0.0.0:{your_port}/v1` or `http://localhost:{your_port}/v1`. For virtual cloud machines, we recommend using VSCode for seamless port forwarding.

For long inputs, we suggest the following generation parameters:

- Temperature: 0.05
- Repetition penalty: 1.02

Of course, you can also tune these settings yourself in the web UI for the best results.
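
If you prefer to call the served model programmatically with these settings rather than through the web UI, the sketch below mirrors how `doc_chat_demo.py` builds its requests. The document path and question are placeholders, and the repetition penalty is passed through the `frequency_penalty` field, as the demo does:

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8000/v1")  # placeholder key for a local server
model_name = client.models.list().data[0].id

# Prepend the full document to the question, as the demo does for the first user turn
document_text = open("converted_doc.md", encoding="utf-8").read()  # illustrative path
question = "Summarize the key findings of this document."

stream = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": f"{document_text}\n\n{question}"}],
    stream=True,
    temperature=0.05,        # suggested setting for long inputs
    top_p=1.0,
    max_tokens=1024,
    frequency_penalty=1.02,  # the demo maps its repetition penalty onto this field
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```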

The effect is demonstrated in the video below.

https://github.com/libowen2121/InternLM/assets/19970308/1d7f9b87-d458-4f24-9f7a-437a4da3fa6e

## 🔜 Stay Tuned for More

We are continuously enhancing our models to better understand and reason over long inputs. Expect new features, improved performance, and expanded capabilities in upcoming updates!

long_context/README_zh-CN.md (new file)

@@ -0,0 +1,79 @@

# InternLM with Long Context

## InternLM2.5-7B-Chat-1M

We are pleased to introduce InternLM2.5-7B-Chat-1M, which can process ultra-long text and supports inputs of up to 1M tokens.

During pre-training, we used corpora containing texts of 256K tokens. To address the potential domain shift caused by such homogeneous data, we introduced synthetic data during training, which not only preserves the model's capabilities but also strengthens its understanding of context.

In the "*needle in a haystack*" experiment, InternLM2.5-7B-Chat-1M accurately locates key information in documents up to 1M tokens long.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="needle-in-a-haystack results" width="700"/>
</p>

We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to evaluate long-document comprehension. InternLM2.5-7B-Chat-1M achieved the best performance among models of comparable size in these tests.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="LongBench results" width="800"/>
</p>

## File Chat with InternLM2.5-1M

The following describes how to chat with InternLM2.5-7B-Chat-1M about an input document. For the best performance, especially with long inputs, we recommend deploying the model with [LMDeploy](https://github.com/InternLM/LMDeploy).

### Supported File Types

The current version supports PDF, TXT, and Markdown files; more file types will be supported soon!

- TXT and Markdown files: read directly, no conversion needed.
- PDF files: to process PDF files efficiently, we provide the lightweight open-source tool [Magic-Doc](https://github.com/magicpdf/Magic-Doc), which converts multiple file types to Markdown (a small loading helper is sketched after this list).
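
A small helper sketch for loading any of the supported file types, following the same logic as the demo script (TXT and Markdown are read directly, PDFs go through Magic-Doc; the function name and paths are illustrative):

```python
from pathlib import Path

from magic_doc.docconv import DocConverter


def load_document(path: str) -> str:
    """Return the file content as text, converting PDFs to Markdown first."""
    if Path(path).suffix.lower() == ".pdf":
        converter = DocConverter(s3_config=None)
        markdown_text, _time_cost = converter.convert(path, conv_timeout=300)
        return markdown_text
    # TXT and Markdown files can be used as-is
    return Path(path).read_text(encoding="utf-8")
```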

### Installation

Before starting, install the required dependencies:

```bash
pip install "fairy-doc[cpu]"
pip install streamlit
pip install lmdeploy
```

### Deploy the Model

Download the model from the [model zoo](../README.md#model-zoo).

Deploy the model with the following command. You can specify `--session-len` (the sequence length) and `--server-port` to customize inference.

```bash
lmdeploy serve api_server {path_to_hf_model} \
    --model-name internlm2-chat \
    --session-len 65536 \
    --server-port 8000
```

To further increase the sequence length, we suggest adding the following arguments:
`--max-batch-size 1 --cache-max-entry-count 0.7 --tp {num_of_gpus}`

### Launch the Streamlit Demo

```bash
streamlit run long_context/doc_chat_demo.py \
    -- --base_url http://0.0.0.0:8000/v1
```

You can specify the port as needed. If you run the demo locally, the URL can be `http://0.0.0.0:{your_port}/v1` or `http://localhost:{your_port}/v1`. For cloud servers, we recommend launching the demo through VSCode for seamless port forwarding.

For long inputs, we suggest the following parameters:

- Temperature: 0.05
- Repetition penalty: 1.02

Of course, you can also adjust these parameters in the web UI as needed for the best results.

A demo video is shown below:

https://github.com/libowen2121/InternLM/assets/19970308/1d7f9b87-d458-4f24-9f7a-437a4da3fa6e

## 🔜 Stay Tuned for More

We will keep improving and updating the long-context model to enhance its understanding and analysis of long texts. Stay tuned!

long_context/doc_chat_demo.py (new file)

@@ -0,0 +1,144 @@

import argparse
import logging
from dataclasses import dataclass

import streamlit as st
from magic_doc.docconv import DocConverter
from openai import OpenAI

# Set up logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


@dataclass
class GenerationConfig:
    # This config is used for chat to provide more diversity
    max_tokens: int = 1024
    top_p: float = 1.0
    temperature: float = 0.1
    repetition_penalty: float = 1.005


def generate(client, messages, generation_config):
    # Stream a chat completion from the OpenAI-compatible endpoint;
    # the repetition penalty is passed through the frequency_penalty field
    stream = client.chat.completions.create(
        model=st.session_state["model_name"],
        messages=messages,
        stream=True,
        temperature=generation_config.temperature,
        top_p=generation_config.top_p,
        max_tokens=generation_config.max_tokens,
        frequency_penalty=generation_config.repetition_penalty,
    )
    return stream


def prepare_generation_config():
    # Sidebar controls for the sampling parameters
    with st.sidebar:
        max_tokens = st.number_input("Max Tokens", min_value=100, max_value=4096, value=1024)
        top_p = st.number_input("Top P", 0.0, 1.0, 1.0, step=0.01)
        temperature = st.number_input("Temperature", 0.0, 1.0, 0.05, step=0.01)
        repetition_penalty = st.number_input("Repetition Penalty", 0.8, 1.2, 1.02, step=0.001, format="%0.3f")
        st.button("Clear Chat History", on_click=on_btn_click)

    generation_config = GenerationConfig(
        max_tokens=max_tokens,
        top_p=top_p,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
    )

    return generation_config


def on_btn_click():
    # Reset the conversation and the uploaded-file state
    del st.session_state.messages
    st.session_state.file_content_found = False
    st.session_state.file_content_used = False


user_avator = 'assets/user.png'
robot_avator = 'assets/robot.png'

st.title('InternLM2.5 File Chat 📝')


def main(base_url):
    # Initialize the client for the model
    client = OpenAI(
        api_key="EMPTY",  # placeholder; a local LMDeploy server does not require a real key
        base_url=base_url,
        timeout=12000,
    )

    # Get the model ID
    model_name = client.models.list().data[0].id
    st.session_state["model_name"] = model_name

    # Get the generation config
    generation_config = prepare_generation_config()

    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state.messages = []

    if "file_content_found" not in st.session_state:
        st.session_state.file_content_found = False
        st.session_state.file_content_used = False
        st.session_state.file_name = ""

    # Handle file upload
    if not st.session_state.file_content_found:
        uploaded_file = st.file_uploader("Upload an article", type=("txt", "md", "pdf"))
        file_content = ""
        if uploaded_file is not None:
            if uploaded_file.type == "application/pdf":
                # Save the upload to disk and convert it to Markdown with Magic-Doc
                with open("uploaded_file.pdf", "wb") as f:
                    f.write(uploaded_file.getbuffer())
                converter = DocConverter(s3_config=None)
                file_content, time_cost = converter.convert("uploaded_file.pdf", conv_timeout=300)
                st.session_state.file_content_found = True  # Set flag when a new file is uploaded
                st.session_state.file_content = file_content  # Store the file content in session state
                st.session_state.file_name = uploaded_file.name  # Store the file name in session state
            else:
                # TXT and Markdown files can be read directly
                file_content = uploaded_file.read().decode("utf-8")
                st.session_state.file_content_found = True  # Set flag when a new file is uploaded
                st.session_state.file_content = file_content  # Store the file content in session state
                st.session_state.file_name = uploaded_file.name  # Store the file name in session state

    if st.session_state.file_content_found:
        st.success(f"File '{st.session_state.file_name}' has been successfully uploaded!")

    # Display chat messages
    for message in st.session_state.messages:
        with st.chat_message(message["role"], avatar=message.get("avatar")):
            st.markdown(message["content"])

    # Handle user input and response generation
    if prompt := st.chat_input("What's up?"):
        turn = {"role": "user", "content": prompt, "avatar": user_avator}
        if st.session_state.file_content_found and not st.session_state.file_content_used:
            # Prepend the uploaded document to the first user turn only
            assert st.session_state.file_content is not None
            merged_prompt = f"{st.session_state.file_content}\n\n{prompt}"
            st.session_state.file_content_used = True  # Mark that the file content has been used
            turn["merged_content"] = merged_prompt

        st.session_state.messages.append(turn)
        with st.chat_message("user", avatar=user_avator):
            st.markdown(prompt)

        with st.chat_message("assistant", avatar=robot_avator):
            # Send merged_content (document + question) to the model when present,
            # but keep only the short prompt in the visible chat history
            messages = [
                {
                    "role": m["role"],
                    "content": m["merged_content"] if "merged_content" in m else m["content"],
                }
                for m in st.session_state.messages
            ]
            # Log messages to the terminal
            for m in messages:
                logging.info(f"\n\n*** [{m['role']}] ***\n\n\t{m['content']}\n\n")
            stream = generate(client, messages, generation_config)
            response = st.write_stream(stream)
            st.session_state.messages.append({"role": "assistant", "content": response, "avatar": robot_avator})


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the Streamlit file chat demo against an OpenAI-compatible server.")
    parser.add_argument("--base_url", type=str, required=True, help="Base URL for the OpenAI-compatible API server")
    args = parser.parse_args()
    main(args.base_url)