ColossalAI/examples/language/grok-1
| File | Last change | Date |
| --- | --- | --- |
| README.md | [fix] fix grok-1 example typo (#5506) | 2024-03-26 |
| grok1_policy.py | [example] add grok-1 inference (#5485) | 2024-03-21 |
| inference.py | [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) | 2024-04-07 |
| inference_tp.py | [misc] refactor launch API and tensor constructor (#5666) | 2024-04-29 |
| requirements.txt | [example] update Grok-1 inference (#5495) | 2024-03-24 |
| run_inference_fast.sh | [fix] fix grok-1 example typo (#5506) | 2024-03-26 |
| run_inference_slow.sh | [fix] fix grok-1 example typo (#5506) | 2024-03-26 |
| test_ci.sh | [example] add grok-1 inference (#5485) | 2024-03-21 |
| utils.py | [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) | 2024-04-07 |

README.md

Grok-1 Inference

  • 314-billion-parameter Grok-1 inference accelerated by 3.8x: an easy-to-use Python + PyTorch + HuggingFace version for inference.

[code] [blog] [HuggingFace Grok-1 PyTorch model weights] [ModelScope Grok-1 PyTorch model weights]

Installation

# Make sure you install ColossalAI from the latest source code
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
pip install .
cd examples/language/grok-1
pip install -r requirements.txt

Inference

You need 8x A100 80GB or equivalent GPUs to run the inference.

We provide two scripts for inference. run_inference_fast.sh uses tensor parallelism provided by ColossalAI and is faster for generation, while run_inference_slow.sh uses the auto device placement provided by transformers and is relatively slower.
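For reference, the tensor-parallel (fast) path is set up roughly as follows. This is a minimal sketch assuming the standard ColossalAI Booster/HybridParallelPlugin API; the actual inference_tp.py uses the custom shard policy in grok1_policy.py and more careful weight loading, so treat this as an illustration rather than the example's exact code.

```python
# Hedged sketch: 8-way tensor-parallel inference with ColossalAI's Booster API.
# Launch with 8 processes, e.g. `colossalai run --nproc_per_node 8 this_script.py`.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import AutoModelForCausalLM, AutoTokenizer

colossalai.launch_from_torch()  # reads rank/world size from torchrun env vars

plugin = HybridParallelPlugin(tp_size=8, pp_size=1)  # tensor parallelism only
booster = Booster(plugin=plugin)

tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True)
# NOTE: the real example avoids materializing the full 314B model on every rank;
# this naive from_pretrained call is only for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "hpcai-tech/grok-1", trust_remote_code=True, torch_dtype=torch.bfloat16
)
model, *_ = booster.boost(model)  # shard the model across the 8 GPUs
model.eval()

inputs = tokenizer("The answer to life is", return_tensors="pt").to(torch.cuda.current_device())
with torch.no_grad():
    # unwrap the ColossalAI model wrapper to reach the HF generate() method
    outputs = model.unwrap().generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```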

Command example:

./run_inference_fast.sh <MODEL_NAME_OR_PATH>
./run_inference_slow.sh <MODEL_NAME_OR_PATH>
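The slow path is, roughly, plain transformers loading with device_map="auto". The sketch below is an approximation under that assumption, not the exact contents of inference.py.

```python
# Hedged sketch: auto device placement with plain transformers/accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "hpcai-tech/grok-1"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # let accelerate spread the layers across all visible GPUs
)
model.eval()

inputs = tokenizer("The answer to life is", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```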

MODEL_NAME_OR_PATH can be a model name on the Hugging Face model hub or a local path to PyTorch-version model checkpoints. We provide a PyTorch-version checkpoint on the Hugging Face model hub, named hpcai-tech/grok-1. You can also download the weights in advance using git:

git lfs install
git clone https://huggingface.co/hpcai-tech/grok-1
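Alternatively, the huggingface_hub library can pre-download the checkpoint to the local cache; a small sketch is shown below, and the returned path can then be passed as MODEL_NAME_OR_PATH.

```python
# Hedged sketch: pre-download the ~600 GB checkpoint with huggingface_hub instead of git-lfs.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="hpcai-tech/grok-1")
print(local_dir)  # use this path as MODEL_NAME_OR_PATH
```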

Depending on your Internet speed, downloading the checkpoints (about 600 GB!) will take several hours to tens of hours, and loading them takes 5-10 minutes when the inference is launched. Don't worry, it's not stuck.

Performance

For a request with batch size 1 and maximum length 100:

| Method | Initialization Duration (sec) | Average Generation Latency (sec) |
| --- | --- | --- |
| ColossalAI | 431.45 | 14.92 |
| HuggingFace Auto-Device | 426.96 | 48.38 |
| JAX | 147.61 | 56.25 |

Tested on 8x80G NVIDIA H800.