Colossal-Inference

📚 Introduction

This example lets you set up and quickly try out Colossal-Inference.

🔨 Installation

Install From Source

Prerequisites:

  • Python == 3.9
  • PyTorch >= 2.1.0
  • CUDA == 11.8
  • Linux OS

We strongly recommend using Anaconda to create a new environment (Python 3.9) to run our examples:

# Create a new conda environment
conda create -n inference python=3.9 -y
conda activate inference

Install the latest PyTorch (with CUDA == 11.8) using conda:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Install Colossal-AI from source:

# Clone Colossal-AI repository to your workspace
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# Install Colossal-AI from source
pip install .

Install inference dependencies:

# Install inference dependencies
pip install -r requirements/requirements-infer.txt

(Optional) To use SmoothQuant quantization, you need to install torch-int following its installation instructions.

Use Colossal-Inference in Docker

Pull from DockerHub

You can directly pull the docker image from our DockerHub page. The image is automatically uploaded upon release.

docker pull hpcaitech/colossal-inference:latest

Build On Your Own

Run the following command to build a docker image from the Dockerfile provided.

cd ColossalAI/inference/docker
docker build -t hpcaitech/colossal-inference:latest .

Run the following command to start the docker container in interactive mode.

docker run -it --gpus all --name Colossal-Inference -v $PWD:/workspace -w /workspace hpcaitech/colossal-inference:latest /bin/bash

🚀 Quick Start

You can try the inference example using Colossal-LLaMA-2-7B following the instructions below:

cd ColossalAI/examples/inference
python example.py -m hpcai-tech/Colossal-LLaMA-2-7b-base -b 4 --max_input_len 128 --max_output_len 64 --dtype fp16

Examples for quantized inference are coming soon!

💡 Usage

A general way to use Colossal-Inference is as follows:

# Import required modules
import ...

# Prepare your model
model = ...

# Declare configurations
tp_size = ...
pp_size = ...
...

# Create an inference engine
engine = InferenceEngine(model, [tp_size, pp_size, ...])

# Tokenize the input
inputs = ...

# Perform inferencing based on the inputs
outputs = engine.generate(inputs)
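The flow above can be sketched more concretely as below. This is a minimal sketch, not a fixed API: the `InferenceEngine` import path and constructor arguments vary between Colossal-AI versions, so check the version you installed before copying the engine-related lines. The tokenizer and model loading use the standard Hugging Face `transformers` interface, and `clip_input_ids` is a plain helper introduced here for illustration.

```python
def clip_input_ids(input_ids, max_input_len):
    """Keep only the last `max_input_len` tokens of a prompt -- a common
    convention for decoder-only models. Plain helper, for illustration."""
    return input_ids[-max_input_len:]


def run_inference(prompt,
                  model_name="hpcai-tech/Colossal-LLaMA-2-7b-base",
                  max_input_len=128,
                  max_output_len=64):
    # Imports are local so the sketch can be read (and the helper above used)
    # without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    # NOTE: the import path below is an assumption; it differs across
    # Colossal-AI versions.
    from colossalai.inference import InferenceEngine

    # Prepare the model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16
    )

    # Create an inference engine; pass tp_size / pp_size / quantization
    # settings here as your installed version expects.
    engine = InferenceEngine(model)

    # Tokenize and truncate the input, then generate
    ids = clip_input_ids(tokenizer(prompt)["input_ids"], max_input_len)
    inputs = torch.tensor([ids], device="cuda")
    outputs = engine.generate(inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

As in the quick-start example, fp16 weights and a CUDA device are assumed; adjust the dtype and device for your hardware.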