mirror of https://github.com/hpcaitech/ColossalAI
[doc] add tutorial for cluster utils (#3763)
* [doc] add en cluster utils doc * [doc] add zh cluster utils doc * [doc] add cluster utils doc in sidebarpull/3780/head^2
parent
5452df63c5
commit
5ce6c9d86f
|
@ -58,7 +58,8 @@
|
|||
]
|
||||
},
|
||||
"features/pipeline_parallel",
|
||||
"features/nvme_offload"
|
||||
"features/nvme_offload",
|
||||
"features/cluster_utils"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -0,0 +1,32 @@
|
|||
# Cluster Utilities
|
||||
|
||||
Author: [Hongxin Liu](https://github.com/ver217)
|
||||
|
||||
**Prerequisite:**
|
||||
- [Distributed Training](../concepts/distributed_training.md)
|
||||
|
||||
## Introduction
|
||||
|
||||
We provide a utility class `colossalai.cluster.DistCoordinator` to coordinate distributed training. It's useful to get various information about the cluster, such as the number of nodes, the number of processes per node, etc.
|
||||
|
||||
## API Reference
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}
|
|
@ -0,0 +1,32 @@
|
|||
# 集群实用程序
|
||||
|
||||
作者: [Hongxin Liu](https://github.com/ver217)
|
||||
|
||||
**前置教程:**
|
||||
- [分布式训练](../concepts/distributed_training.md)
|
||||
|
||||
## 引言
|
||||
|
||||
我们提供了一个实用程序类 `colossalai.cluster.DistCoordinator` 来协调分布式训练。它对于获取有关集群的各种信息很有用,例如节点数、每个节点的进程数等。
|
||||
|
||||
## API 参考
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}
|
||||
|
||||
{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}
|
Loading…
Reference in New Issue