Grok-1 Inference

An easy-to-use Python + PyTorch + HuggingFace version of 314B Grok-1. [code] [blog] [HuggingFace Grok-1 PyTorch model weights]

Install

# Make sure you install colossalai from the latest source code
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
pip install .
cd examples/language/grok-1
pip install -r requirements.txt

Tokenizer preparation

Download the tokenizer from the official grok-1 repository.

wget https://github.com/xai-org/grok-1/raw/main/tokenizer.model

Inference

Running inference requires 8x A100 80GB GPUs or equivalent.
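The hardware requirement above can be sanity-checked with a rough memory estimate. The sketch below is only back-of-the-envelope arithmetic: the 314B parameter count comes from the model name, while 2 bytes per parameter (bf16/fp16 weights) is an assumption, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Rough check: do the Grok-1 weights fit on 8x 80 GB GPUs?
NUM_PARAMS = 314_000_000_000  # 314B parameters (from the model name)
BYTES_PER_PARAM = 2           # assumption: weights stored in bf16/fp16
NUM_GPUS = 8
GPU_MEMORY_GB = 80            # nominal memory of one A100 80GB


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


total_weights_gb = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM)
total_gpu_gb = NUM_GPUS * GPU_MEMORY_GB
per_gpu_weights_gb = total_weights_gb / NUM_GPUS
headroom_gb = total_gpu_gb - total_weights_gb

print(f"weights: {total_weights_gb:.0f} GB of {total_gpu_gb} GB total")
print(f"per GPU: {per_gpu_weights_gb:.1f} GB of {GPU_MEMORY_GB} GB")
print(f"headroom for everything else: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone occupy most of the available memory, which is why fewer or smaller GPUs are unlikely to work without quantization or offloading.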

We provide two inference scripts. run_inference_fast.sh uses the tensor parallelism provided by ColossalAI and is the faster option. run_inference_slow.sh uses the auto device placement provided by transformers and is slower.
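For readers unfamiliar with tensor parallelism, the toy sketch below illustrates the general idea on a single linear layer: the weight matrix is split by columns, each shard computes its slice of the output (on a real system, each on its own GPU), and the slices are concatenated. This is a pure-Python illustration with hypothetical helper names, not ColossalAI's implementation.

```python
def matmul(x, w):
    """Multiply x (batch x in_features) by w (in_features x out_features)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split w into num_shards equal column blocks, one per (simulated) device."""
    out_features = len(w[0])
    assert out_features % num_shards == 0, "columns must divide evenly"
    step = out_features // num_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, num_shards)
    # Each partial product would run on a different GPU in a real setup.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: join each row's slices along the feature axis.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

full = matmul(x, w)
sharded = column_parallel_linear(x, w, num_shards=2)
print(full == sharded)
```

Because every device holds only a slice of each weight matrix and all devices work on the same input at once, this style of parallelism spreads both memory and compute across GPUs, whereas automatic device placement typically assigns whole layers to devices that then run one after another.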

Command format:

./run_inference_fast.sh <model_name_or_path> <tokenizer_path>
./run_inference_slow.sh <model_name_or_path> <tokenizer_path>

model_name_or_path can be a local path or a model name from the Hugging Face model hub. We provide weights on the model hub under the name hpcaitech/grok-1.

Command example:

./run_inference_fast.sh hpcaitech/grok-1 tokenizer.model

Loading the checkpoints takes 5-10 minutes. Don't worry, the script is not stuck.