Making large AI models cheaper, faster and more accessible
superhao1995 f660152c73 [NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258) 2 years ago
amp [hotfix]different overflow status lead to communication stuck. (#1175) 2 years ago
builder [pipeline] refactor the pipeline module (#1087) 2 years ago
cli [hotfix] fix some bugs caused by size mismatch. (#1011) 3 years ago
communication [hotfix]fixed p2p process send stuck (#1181) 2 years ago
context [usability] improved error messages in the context module (#856) 3 years ago
engine [NFC] polish colossalai/engine/ophooks/utils.py code style (#1256) 2 years ago
fx [fx] added ndim property to proxy (#1253) 2 years ago
gemini make AutoPlacementPolicy configurable (#1191) 2 years ago
kernel [optim] refactor fused sgd (#1134) 2 years ago
logging [doc] improved docstring in the logging module (#861) 3 years ago
nn [NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style (#1258) 2 years ago
pipeline [pipeline]add customized policy (#1139) 2 years ago
registry Remove duplication registry (#1078) 2 years ago
tensor [hotfix] Dist Mgr gather torch version (#1284) 2 years ago
testing [test] skip tests when not enough GPUs are detected (#1090) 2 years ago
trainer fix issue #1080 (#1071) 2 years ago
utils [tensor] distributed checkpointing for parameters (#1240) 2 years ago
zero [hotfix] fix sharded optim step and clip_grad_norm (#1226) 2 years ago
__init__.py [NFC] polish __init__.py code style (#965) 3 years ago
constants.py fix typo in constants (#1027) 3 years ago
core.py [Tensor] distributed view supports inter-process hybrid parallel (#1169) 2 years ago
global_variables.py
initialize.py [ddp] supported customized torch ddp configuration (#1123) 2 years ago