
# Rank Recorder
This is a useful tool for recording selected functions on each rank. After the multi-process program finishes, the records of every rank are dumped into a JSON file, which you can easily parse and visualize.

To use the tool, make sure that `dist.is_initialized()` returns `True` before the program exits.

## Usage

Usage is very simple:

```python
from colossalai.utils.rank_recorder import recorder

...
...

with recorder(record_name, current_rank) as r:
    # procedure to record
    ...

```
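To make the pattern above concrete, here is a minimal plain-Python sketch of a context manager that timestamps the wrapped block and stores the result per rank. It is not ColossalAI's implementation: `toy_recorder`, `RECORDS`, `dump_records`, and the record fields are hypothetical names chosen for illustration, and the real `recorder` may store different data.

```python
import json
import time
from contextlib import contextmanager

# Hypothetical in-memory store: one list of timing records per rank.
RECORDS = {}


@contextmanager
def toy_recorder(name, rank):
    """Time the wrapped block and store the result under the given rank."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so the record is never lost.
        end = time.perf_counter()
        RECORDS.setdefault(rank, []).append(
            {"name": name, "start": start, "end": end, "duration": end - start}
        )


def dump_records(path):
    """Write all collected records to a JSON file (rank keys become strings)."""
    with open(path, "w") as f:
        json.dump(RECORDS, f, indent=2)


# Record one short procedure on a pretend rank 0.
with toy_recorder("sleep_10ms", rank=0):
    time.sleep(0.01)

print(RECORDS[0][0]["name"])
```

Calling `dump_records("toy_records.json")` afterwards writes the collected timings to disk, which mirrors the idea of dumping the records once the program is done.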

## Example
This demo shows kernel selection in CUDA and visualizes the cost of several procedures on each rank.

```python
import time
import os
import logging
logging.disable(logging.INFO)

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from colossalai.utils.rank_recorder import recorder


WORLD_SIZE = 4

# Configure the exported image here.
# The 'svg' format is recommended if you want to inspect the details.
recorder.export_format = 'png'
recorder.export_name = 'kernel_select'
recorder.dpi = 500

def calc(x, y):
    a = torch.randn(x, y).cuda()
    b = torch.randn(x, y).cuda()
    c = sum(a * b)
    return c

def worker(rank):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29020'
    dist.init_process_group(backend='nccl', world_size=WORLD_SIZE, rank=rank)
    print(dist.get_rank(), "enter")
    time.sleep(0.1 * rank)

    with recorder("calc_1(x100)", rank) as r:
        calc(100, 100)

    with recorder("calc_2(x400)", rank) as r:
        calc(400, 400)

    with recorder("calc_2(x200)", rank) as r:
        calc(200, 200)

if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```

Run the script directly, and you will get `kernel_select.json` and `kernel_select.png` in your current folder.
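
Since this README does not describe the layout of the JSON file, a cautious first step is to load it and look at its top-level structure before writing any parsing code. The sketch below assumes only that the file is valid JSON; `describe_json` is a hypothetical helper, not part of `rank_recorder`.

```python
import json
import os


def describe_json(path):
    """Return a short description of the top-level structure of a JSON file."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:10]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "keys": []}
    return {"type": type(data).__name__, "size": 1, "keys": []}


if __name__ == "__main__":
    # Only inspect the file if the demo above has already produced it.
    if os.path.exists("kernel_select.json"):
        print(describe_json("kernel_select.json"))
```

Once the structure is known, the individual records can be read with ordinary dictionary and list access.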