ColossalAI

Making large AI models cheaper, faster and more accessible

ai big-model data-parallelism deep-learning distributed-computing foundation-models heterogeneous-training hpc inference large-scale model-parallelism pipeline-parallelism

Elsa Granger b2ad0d9e8f [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)	1 year ago
* Use p2p
* Cannot bidirectional send p2p
* Refactor tensor creation and serialization in P2P communication
* Fix llama forward args in flash attention
* Add flop estimate from megatron
* Support loading weight not in weight_map when strict=False in hybrid_parallel
* Use send_forward_recv_backward, etc in 1f1b
* Use dataclass for metadata; remove torch.cuda.synchronize() as suggested
* Add comment about the torch.cuda.synchronize for potential error
* Typo
* Update hybrid_parallel_checkpoint_io.py
* Update p2p.py
* Update one_f_one_b.py
* Update p2p.py
Co-authored-by: flybird11111 <1829166702@qq.com>
__init__.py	[misc] update pre-commit and run all files (#4752)	1 year ago
checkpoint_io_base.py	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)	1 year ago
general_checkpoint_io.py	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)	1 year ago
hybrid_parallel_checkpoint_io.py	[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)	1 year ago
index_file.py	[misc] update pre-commit and run all files (#4752)	1 year ago
utils.py	[shardformer] Fix serialization error with Tensor Parallel state saving (#5018)	1 year ago