ColossalAI/colossalai/inference
├── core/
├── kv_cache/
├── modeling/
├── __init__.py
├── config.py
├── logit_processors.py
├── readme.md
├── sampler.py
└── struct.py

readme.md

# Colossal-Infer

## Introduction

Colossal-Infer is a library for the inference of LLMs and MLMs, built on top of Colossal-AI.
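
For orientation, here is a minimal usage sketch. The import paths, constructor arguments, and `generate` signature are assumptions inferred from the directory layout above (`core/`, `config.py`), not a confirmed public API, and may differ from what is eventually released.

```python
# Minimal usage sketch -- import paths, constructor arguments, and the
# generate() call below are assumptions inferred from the directory
# layout (core/, config.py), not a confirmed public API.
from transformers import AutoTokenizer, LlamaForCausalLM

from colossalai.inference.config import InferenceConfig       # assumed path
from colossalai.inference.core.engine import InferenceEngine  # assumed path

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Hypothetical config fields controlling batching and sequence lengths.
inference_config = InferenceConfig(
    max_batch_size=8,
    max_input_len=256,
    max_output_len=128,
)

engine = InferenceEngine(model, tokenizer, inference_config)
outputs = engine.generate(prompts=["Introduce some landmarks in Beijing."])
print(outputs)
```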

## Structures

### Overview

The main design document will be released later.

## Roadmap

- [ ] Design of structures
- [ ] Core components (see the sketch below)
  - [ ] Engine
  - [ ] Request handler
  - [ ] KV cache manager
  - [ ] Modeling
  - [ ] Custom layers
  - [ ] Online server
- [ ] Supported models
  - [ ] Llama2
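
To make the core components above concrete, below is a self-contained toy sketch of how an engine, a request handler, and a KV cache manager can interact in a continuous-batching loop. Every name here (the `*Toy` classes, their methods, the block-allocation policy) is an illustrative invention, not Colossal-Infer's actual implementation.

```python
# Toy sketch of engine / request handler / KV cache manager interaction.
# All class and method names are hypothetical illustrations.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """One toy generation request tracked by the handler."""
    rid: int
    prompt_len: int
    max_new_tokens: int
    generated: int = 0
    blocks: list = field(default_factory=list)


class KVCacheManagerToy:
    """Hands out fixed-size KV-cache blocks; reclaims them on completion."""

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.free_blocks = deque(range(num_blocks))

    def try_allocate(self, req: Request) -> bool:
        total_tokens = req.prompt_len + req.max_new_tokens
        needed = -(-total_tokens // self.block_size) - len(req.blocks)  # ceil div
        if needed > len(self.free_blocks):
            return False
        req.blocks += [self.free_blocks.popleft() for _ in range(needed)]
        return True

    def release(self, req: Request) -> None:
        self.free_blocks.extend(req.blocks)
        req.blocks.clear()


class RequestHandlerToy:
    """Admits waiting requests into the running batch when cache space allows."""

    def __init__(self, cache: KVCacheManagerToy, max_batch_size: int = 4) -> None:
        self.cache = cache
        self.max_batch_size = max_batch_size
        self.waiting = deque()
        self.running = []

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def schedule(self) -> list:
        while self.waiting and len(self.running) < self.max_batch_size:
            if not self.cache.try_allocate(self.waiting[0]):
                break  # not enough free blocks; retry on a later step
            self.running.append(self.waiting.popleft())
        return self.running


class EngineToy:
    """Drives decode steps over the running batch until all requests finish."""

    def __init__(self, handler: RequestHandlerToy) -> None:
        self.handler = handler

    def step(self) -> None:
        for req in self.handler.schedule():
            req.generated += 1  # stand-in for one forward pass + sampling
        finished = [r for r in self.handler.running if r.generated >= r.max_new_tokens]
        for req in finished:
            self.handler.cache.release(req)
            self.handler.running.remove(req)


handler = RequestHandlerToy(KVCacheManagerToy(num_blocks=32))
engine = EngineToy(handler)
for i in range(6):
    handler.add(Request(rid=i, prompt_len=40, max_new_tokens=8))
while handler.waiting or handler.running:
    engine.step()
print("all requests finished")
```

The design point this toy mirrors is admission control: the handler only admits a request once the cache manager can reserve blocks for its full sequence (prompt plus maximum new tokens), so a running request never runs out of KV-cache space mid-decode.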