mirror of https://github.com/hpcaitech/ColossalAI
[doc] add tutorial for cluster utils (#3763)
* [doc] add en cluster utils doc
* [doc] add zh cluster utils doc
* [doc] add cluster utils doc in sidebar

pull/3780/head^2
parent 5452df63c5
commit 5ce6c9d86f
@@ -58,7 +58,8 @@
 ]
 },
 "features/pipeline_parallel",
-"features/nvme_offload"
+"features/nvme_offload",
+"features/cluster_utils"
 ]
 },
 {
@@ -0,0 +1,32 @@
# Cluster Utilities

Author: [Hongxin Liu](https://github.com/ver217)

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)

## Introduction

We provide a utility class `colossalai.cluster.DistCoordinator` to coordinate distributed training. It's useful to get various information about the cluster, such as the number of nodes, the number of processes per node, etc.
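
To make the kind of cluster information mentioned above concrete, here is a minimal sketch in plain Python. It is not the `DistCoordinator` implementation: `ClusterInfo` and `cluster_info_from_env` are hypothetical names introduced only for illustration. The sketch assumes the launcher exports the `RANK`, `LOCAL_RANK`, `WORLD_SIZE` and `LOCAL_WORLD_SIZE` environment variables (as `torchrun` does) and that every node runs the same number of processes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical helper for illustration; not part of ColossalAI."""

    rank: int              # global rank of this process
    local_rank: int        # rank of this process within its node
    world_size: int        # total number of processes
    local_world_size: int  # number of processes per node

    @property
    def num_nodes(self) -> int:
        # Assumption: all nodes run the same number of processes.
        return self.world_size // self.local_world_size

    @property
    def is_master(self) -> bool:
        # The global master is conventionally the process with rank 0.
        return self.rank == 0

    @property
    def is_node_master(self) -> bool:
        # The node master is conventionally the process with local rank 0.
        return self.local_rank == 0

    @property
    def is_last_process(self) -> bool:
        return self.rank == self.world_size - 1


def cluster_info_from_env(env=None) -> ClusterInfo:
    """Build ClusterInfo from launcher-style environment variables."""
    env = os.environ if env is None else env
    return ClusterInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        local_world_size=int(env.get("LOCAL_WORLD_SIZE", 1)),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without launching a distributed job, for example `cluster_info_from_env({"RANK": "5", "LOCAL_RANK": "1", "WORLD_SIZE": "8", "LOCAL_WORLD_SIZE": "4"})`.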

## API Reference

{{ autodoc:colossalai.cluster.DistCoordinator }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}

{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}

{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}

{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}
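
The names `priority_execution`, `block_all` and `on_master_only` suggest two common coordination patterns: letting one process run a block before the others, and running a function on the master process only. The sketch below illustrates those patterns with threads standing in for processes and a `threading.Barrier` standing in for a distributed barrier. It is an assumption-laden illustration, not ColossalAI's code: the functions here are standalone, take an explicit `rank` argument, and their signatures are hypothetical.

```python
import functools
import threading
from contextlib import contextmanager


def on_master_only(rank: int):
    """Hypothetical decorator: call the function only when rank == 0."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rank == 0:
                return func(*args, **kwargs)
            return None  # every other rank skips the call

        return wrapper

    return decorator


@contextmanager
def priority_execution(rank: int, barrier: threading.Barrier, executor_rank: int = 0):
    """Hypothetical context manager: the executor rank runs the body first."""
    is_executor = rank == executor_rank
    if not is_executor:
        barrier.wait()  # block until the executor has finished the body
    yield
    if is_executor:
        barrier.wait()  # body done: release the waiting ranks


def simulate(world_size: int = 4) -> list:
    """Run one thread per simulated rank and record the execution order."""
    barrier = threading.Barrier(world_size)
    order = []

    def worker(rank: int) -> None:
        with priority_execution(rank, barrier):
            order.append(rank)  # e.g. download or preprocess a dataset once

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

In `simulate`, rank 0 is always the first entry of the returned list, because every other rank waits at the barrier until rank 0 has left the `with` block; the order of the remaining ranks is not deterministic.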

@@ -0,0 +1,32 @@
# Cluster Utilities

Author: [Hongxin Liu](https://github.com/ver217)

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)

## Introduction

We provide a utility class `colossalai.cluster.DistCoordinator` to coordinate distributed training. It is useful for obtaining various information about the cluster, such as the number of nodes and the number of processes per node.

## API Reference

{{ autodoc:colossalai.cluster.DistCoordinator }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}

{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}

{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}

{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}