colossalai>=0.3.2
datasets
numpy
torch>=1.12.0,<=2.0.0
tqdm
transformers
flash-attn>=2.0.0,<=2.0.5
SentencePiece==0.1.99
tensorboard==2.14.0