diff --git a/ecosystem/README.md b/ecosystem/README.md
new file mode 100644
index 0000000..12f9b75
--- /dev/null
+++ b/ecosystem/README.md
@@ -0,0 +1,220 @@
# InternLM Ecosystem

## Training

### [XTuner](https://github.com/InternLM/xtuner)

XTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models.

You can find the best practices for fine-tuning the InternLM2 model in the [README](https://github.com/InternLM/InternLM/tree/main/finetune#xtuner).

### [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)

LLaMA-Factory is an open-source, easy-to-use fine-tuning and training framework for LLMs.

### [swift](https://github.com/modelscope/swift)

SWIFT supports training, inference, evaluation and deployment of LLMs and MLLMs (multimodal large models).

## Inference

### [LMDeploy](https://github.com/InternLM/lmdeploy)

LMDeploy is an efficient toolkit for compressing, deploying, and serving LLMs and VLMs.

With only four lines of code, you can perform `internlm2-chat-7b` inference after `pip install lmdeploy`:

```python
from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
```
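LMDeploy can also expose the model through an OpenAI-compatible server, which is the deployment mode recommended later in the Application section. The snippet below is a minimal client sketch; it assumes such a server has already been launched, for example with `lmdeploy serve api_server internlm/internlm2-chat-7b` (which listens on LMDeploy's default port 23333), and that the `openai` Python package is installed:

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible server is already running locally, e.g. launched with
#   lmdeploy serve api_server internlm/internlm2-chat-7b
# which serves on port 23333 by default.
client = OpenAI(api_key="a dummy key", base_url="http://0.0.0.0:23333/v1")

# Ask the server which model it is serving, then send a chat request.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```

The same endpoint can also back the LangChain example shown in the Application section below.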
### [vLLM](https://github.com/vllm-project/vllm)

`vLLM` is a high-throughput and memory-efficient inference and serving engine for LLMs.

After installing it via `pip install vllm`, you can run `internlm2-chat-7b` inference as follows:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="internlm/internlm2-chat-7b")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### [TGI](https://github.com/huggingface/text-generation-inference)

TGI is a toolkit for deploying and serving Large Language Models (LLMs). The easiest way to deploy an LLM is with the official Docker container:

```shell
model=internlm/internlm2-chat-7b
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model
```

Then you can make requests like this:

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

### [llama.cpp](https://github.com/ggerganov/llama.cpp)

`llama.cpp` is an LLM inference framework developed in C/C++. Its goal is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

`InternLM2` can be deployed with `llama.cpp` by following the instructions below:

- Refer to [this](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build) guide to build llama.cpp from source
- Convert the InternLM model to a GGUF model and run it according to the [guide](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize)

### [ollama](https://github.com/ollama/ollama)

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, enabling users to easily set up and execute LLMs locally (in CPU and GPU modes).

The following snippet presents a Modelfile for InternLM2, using `internlm2-chat-7b` as an example. Note that the model has to be converted to a GGUF model first.

```shell
echo 'FROM ./internlm2-chat-7b.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""

PARAMETER stop "<|action_end|>"
PARAMETER stop "<|im_end|>"

SYSTEM """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""
' > ./Modelfile
```

Then, create an image from the above `Modelfile` like this:

```shell
ollama create internlm2:chat-7b -f ./Modelfile
```
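Once the image has been created, you can chat with it from the terminal with `ollama run internlm2:chat-7b`, or drive the local ollama server from Python. The snippet below is a minimal sketch; it assumes the ollama server is running locally and that the official Python client (`pip install ollama`) is installed:

```python
import ollama  # official Python client for the local ollama server

# Chat with the image created from the Modelfile above.
response = ollama.chat(
    model="internlm2:chat-7b",
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
)
print(response["message"]["content"])
```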
Regarding further usage of `ollama`, please refer to the documentation [here](https://github.com/ollama/ollama/tree/main/docs).

### [llamafile](https://github.com/Mozilla-Ocho/llamafile)

llamafile lets you turn large language model (LLM) weights into executables. It combines [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan).

The best practice for deploying InternLM2 with llamafile is as follows:

- Convert the InternLM2 model into a GGUF model with `llama.cpp`. Suppose we get `internlm2-chat-7b.gguf` in this step
- Create the llamafile

```shell
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.6/llamafile-0.8.6.zip
unzip llamafile-0.8.6.zip

cp llamafile-0.8.6/bin/llamafile internlm2.llamafile

echo "-m
internlm2-chat-7b.gguf
--host
0.0.0.0
-ngl
999
..." > .args

zipalign -j0 \
    internlm2.llamafile \
    internlm2-chat-7b.gguf \
    .args

rm -rf .args
```

- Run the llamafile

```shell
./internlm2.llamafile
```

Your browser should open automatically and display a chat interface. (If it doesn't, just open your browser and point it at http://localhost:8080.)

### [mlx](https://github.com/ml-explore/mlx)

MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.

With the following steps, you can perform InternLM2 inference on Apple devices.

- Installation

```shell
pip install mlx mlx-lm
```

- Inference

```python
from mlx_lm import load, generate

tokenizer_config = {"trust_remote_code": True}
model, tokenizer = load("internlm/internlm2-chat-1_8b", tokenizer_config=tokenizer_config)
response = generate(model, tokenizer, prompt="write a story", verbose=True)
```

## Application

### [LangChain](https://github.com/langchain-ai/langchain)

LangChain is a framework for developing applications powered by large language models (LLMs).

You can build an [LLM chain](https://python.langchain.com/v0.1/docs/get_started/quickstart/#llm-chain) through the OpenAI API. The server is recommended to be launched with LMDeploy, vLLM, or any other engine that provides an OpenAI-compatible server.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    api_key="a dummy key",
    base_url="http://0.0.0.0:23333/v1")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])

chain = prompt | llm

chain.invoke({"input": "how can langsmith help with testing?"})
```

Alternatively, you can follow the guide [here](https://python.langchain.com/v0.1/docs/get_started/quickstart/#llm-chain) and run an ollama model locally.

For other use cases, please refer to the [LangChain documentation](https://python.langchain.com/v0.1/docs/get_started/introduction/).

### [LlamaIndex](https://github.com/run-llama/llama_index)

LlamaIndex is a framework for building context-augmented LLM applications.

Its [Starter Tutorial (Local Models)](https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/) uses ollama as the local LLM inference engine.

Therefore, you can integrate InternLM2 with LlamaIndex smoothly once you have deployed InternLM2 with `ollama`, as described in the [ollama section](#ollama).
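Since LlamaIndex ships an `Ollama` LLM integration, connecting it to the image built in the ollama section takes only a few lines. The following is a minimal sketch; it assumes the `llama-index` and `llama-index-llms-ollama` packages are installed, that the ollama server is running locally, and that the `internlm2:chat-7b` image has been created as shown above:

```python
from llama_index.llms.ollama import Ollama  # pip install llama-index-llms-ollama

# Point LlamaIndex at the local ollama server and the InternLM2 image created earlier.
llm = Ollama(model="internlm2:chat-7b", request_timeout=120.0)

response = llm.complete("Hi, pls intro yourself")
print(response)
```

From there, the Starter Tutorial (Local Models) linked above shows how to plug such an LLM into a complete indexing and retrieval pipeline.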