colossalai>=0.3.6
datasets
numpy
tqdm
transformers
flash-attn>=2.0.0
SentencePiece==0.1.99
tensorboard==2.14.0