# Rank Recorder

This is a useful tool for recording the execution of certain sections of code on each rank. The records of every rank are dumped into a JSON file when the multi-process program ends, so you can parse and visualise them easily.

Before using the tool, make sure `dist.is_initialized()` returns `True` before the program exits.

## Usage

Using the recorder is very simple:

```python
from colossalai.utils.rank_recorder import recorder

...

with recorder(record_name, current_rank) as r:
    # the procedure you want to record
    ...
```

## Example

This demo illustrates the cost of CUDA kernel selection and visualises the time spent in several procedures on each rank.

```python
import time
import os
import logging
logging.disable(logging.INFO)

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from colossalai.utils.rank_recorder import recorder


WORLD_SIZE = 4

# configure the exported image here
# if you want to dive into the details, the 'svg' format is recommended
recorder.export_format = 'png'
recorder.export_name = 'kernel_select'
recorder.dpi = 500

def calc(x, y):
    # element-wise product of two random matrices, reduced to a scalar
    a = torch.randn(x, y).cuda()
    b = torch.randn(x, y).cuda()
    c = (a * b).sum()
    return c

def worker(rank):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29020'
    # bind each process to its own GPU before any .cuda() call
    torch.cuda.set_device(rank)
    dist.init_process_group(backend='nccl', world_size=WORLD_SIZE, rank=rank)
    print(dist.get_rank(), "enter")
    time.sleep(0.1 * rank)

    with recorder("calc_1(x100)", rank) as r:
        calc(100, 100)

    with recorder("calc_2(x400)", rank) as r:
        calc(400, 400)

    with recorder("calc_3(x200)", rank) as r:
        calc(200, 200)

if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```

Run the script directly and you will get `kernel_select.json` and `kernel_select.png` in your current folder.
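
Since the records are plain JSON, you can also post-process them yourself. Below is a minimal sketch that makes no assumption about the record schema (it is defined by `rank_recorder.py`) and simply pretty-prints whatever was dumped:

```python
import json

# load the dumped records and pretty-print them;
# the exact schema is defined by rank_recorder.py
with open('kernel_select.json') as f:
    records = json.load(f)
print(json.dumps(records, indent=2))
```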