# InternLM Ecosystem

## Training

### [XTuner](https://github.com/InternLM/xtuner)

XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large models.

You can find the best practice for fine-tuning the InternLM2 model in the [README](https://github.com/InternLM/InternLM/tree/main/finetune#xtuner).

### [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)

LLaMA-Factory is an open-source, easy-to-use fine-tuning and training framework for LLMs.

### [swift](https://github.com/modelscope/swift)

SWIFT supports training, inference, evaluation, and deployment of LLMs and MLLMs (multimodal large models).

## Inference

### [LMDeploy](https://github.com/InternLM/lmdeploy)

LMDeploy is an efficient toolkit for compressing, deploying, and serving LLMs and VLMs.

With only 4 lines of code, you can perform `internlm2-chat-7b` inference after `pip install lmdeploy`:

```python
from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```

### [vLLM](https://github.com/vllm-project/vllm)

`vLLM` is a high-throughput and memory-efficient inference and serving engine for LLMs.

After installation via `pip install vllm`, you can run `internlm2-chat-7b` model inference as follows:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="internlm/internlm2-chat-7b")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### [TGI](https://github.com/huggingface/text-generation-inference)

TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is to use the official Docker container:

```shell
model=internlm/internlm2-chat-7b
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model
```

Then you can make requests like this:

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
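
If you would rather issue the request from Python than from curl, here is a minimal sketch against TGI's non-streaming `/generate` endpoint, assuming the container above is listening on port 8080:

```python
import requests

# Non-streaming request to the TGI container started above (port 8080 as mapped by docker run).
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}},
    headers={"Content-Type": "application/json"},
    timeout=60,
)
print(resp.json()["generated_text"])
```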

### [llama.cpp](https://github.com/ggerganov/llama.cpp)

`llama.cpp` is an LLM inference framework developed in C/C++. Its goal is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, both locally and in the cloud.

`InternLM2` can be deployed with `llama.cpp` by following the instructions below:

- Refer to [this](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build) guide to build llama.cpp from source
- Convert the InternLM model to a GGUF model and run it according to the [guide](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize)
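
If you prefer to drive the converted model from Python rather than the llama.cpp CLI, one option is the `llama-cpp-python` bindings (an extra dependency, not part of the steps above). A minimal sketch, assuming the conversion produced `internlm2-chat-7b.gguf`:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the converted GGUF model; n_gpu_layers=-1 offloads all layers to the GPU when one is available.
llm = Llama(model_path="./internlm2-chat-7b.gguf", n_ctx=4096, n_gpu_layers=-1)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hi, please introduce yourself."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```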

### [ollama](https://github.com/ollama/ollama)

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, enabling users to easily set up and run LLMs locally (in CPU and GPU modes).

The following snippet presents the Modelfile of InternLM2, with `internlm2-chat-7b` as an example. Note that the InternLM2 model has to be converted to GGUF format first.

```shell
echo 'FROM ./internlm2-chat-7b.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""

PARAMETER stop "<|action_end|>"
PARAMETER stop "<|im_end|>"

SYSTEM """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""
' > ./Modelfile
```

Then, create a model from the above `Modelfile` like this:

```shell
ollama create internlm2:chat-7b -f ./Modelfile
```

Regarding the usage of `ollama`, please refer to the documentation [here](https://github.com/ollama/ollama/tree/main/docs).
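
As a quick sanity check, the created model can also be queried from Python through ollama's local REST API. A minimal sketch, assuming the ollama server is running on its default port 11434:

```python
import requests

# Chat with the locally served internlm2:chat-7b model via ollama's REST API.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "internlm2:chat-7b",
        "messages": [{"role": "user", "content": "Hi, please introduce yourself."}],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```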

### [llamafile](https://github.com/Mozilla-Ocho/llamafile)

llamafile lets you turn large language model (LLM) weights into executables. It combines [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan).

The best practice for deploying InternLM2 with llamafile is shown below:

- Convert the InternLM2 model into a GGUF model with `llama.cpp`. Suppose we get `internlm2-chat-7b.gguf` in this step.
- Create the llamafile

```shell
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.6/llamafile-0.8.6.zip
unzip llamafile-0.8.6.zip

cp llamafile-0.8.6/bin/llamafile internlm2.llamafile

echo "-m
internlm2-chat-7b.gguf
--host
0.0.0.0
-ngl
999
..." > .args

zipalign -j0 \
  internlm2.llamafile \
  internlm2-chat-7b.gguf \
  .args

rm -rf .args
```

- Run the llamafile

```shell
./internlm2.llamafile
```

Your browser should open automatically and display a chat interface. (If it doesn't, just open your browser and point it at http://localhost:8080.)
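
Besides the browser UI, the server started by the llamafile also exposes an OpenAI-compatible endpoint, so you can script against it. A minimal sketch with the `openai` Python client, assuming the default address http://localhost:8080 and a placeholder API key:

```python
from openai import OpenAI

# The llamafile server speaks the OpenAI chat-completions protocol on port 8080.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

completion = client.chat.completions.create(
    model="internlm2-chat-7b",  # informational for a single-model server
    messages=[{"role": "user", "content": "Hi, please introduce yourself."}],
)
print(completion.choices[0].message.content)
```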

### [mlx](https://github.com/ml-explore/mlx)

MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.

With the following steps, you can perform InternLM2 inference on Apple devices.

- Installation

```shell
pip install mlx mlx-lm
```

- Inference

```python
from mlx_lm import load, generate
tokenizer_config = {"trust_remote_code": True}
model, tokenizer = load("internlm/internlm2-chat-1_8b", tokenizer_config=tokenizer_config)
response = generate(model, tokenizer, prompt="write a story", verbose=True)
```

## Application

### [LangChain](https://github.com/langchain-ai/langchain)

LangChain is a framework for developing applications powered by large language models (LLMs).

You can build an [LLM chain](https://python.langchain.com/v0.1/docs/get_started/quickstart/#llm-chain) via the OpenAI API. It is recommended to launch the server with LMDeploy, vLLM, or another backend that provides an OpenAI-compatible server.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    api_key="a dummy key",
    base_url='http://0.0.0.0:23333/v1')
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])

chain = prompt | llm

chain.invoke({"input": "how can langsmith help with testing?"})
```

Or you can follow the guide [here](https://python.langchain.com/v0.1/docs/get_started/quickstart/#llm-chain) and run an ollama model locally, as sketched below.
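
For the local route, here is a minimal sketch using LangChain's community ollama integration with the `internlm2:chat-7b` model created in the [ollama section](#ollama); the package layout follows the v0.1 docs linked above and may differ in newer releases:

```python
from langchain_community.llms import Ollama

# Point LangChain at the locally served ollama model created earlier.
llm = Ollama(model="internlm2:chat-7b")
print(llm.invoke("Hi, please introduce yourself."))
```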

As for other use cases, please look them up [here](https://python.langchain.com/v0.1/docs/get_started/introduction/).

### [LlamaIndex](https://github.com/run-llama/llama_index)

LlamaIndex is a framework for building context-augmented LLM applications.

It uses ollama as the local LLM inference engine. An example can be found in the [Starter Tutorial (Local Models)](https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/).

Therefore, you can integrate InternLM2 into LlamaIndex smoothly if you deploy InternLM2 with `ollama` as guided in the [ollama section](#ollama).
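
Following that tutorial, here is a minimal sketch that plugs the `internlm2:chat-7b` model created above into LlamaIndex through its ollama integration (the import path assumes the `llama-index-llms-ollama` package and may change between LlamaIndex releases):

```python
# pip install llama-index llama-index-llms-ollama
from llama_index.llms.ollama import Ollama

# Use the locally served ollama model as the LLM backend for LlamaIndex.
llm = Ollama(model="internlm2:chat-7b", request_timeout=120.0)
print(llm.complete("Hi, please introduce yourself."))
```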