docs(doc/code-docs): refine profiler docs (#295)

* add detailed profiler guide
* added torch profiler detailed docs
* add english docs for profiler page
* docs(code-docs/source/profiler.rst): resize profiler trace image
* docs(code-docs/source/profiler.rst): fix typo
* docs(doc/imgs/torch_profiler_trace.png): update trace image
2023-09-08 16:58:36 +08:00 · 2023-09-08 16:58:36 +08:00 · 06807a6fd5
parent 0423426c4c
commit 06807a6fd5
6 changed files with 219 additions and 77 deletions
--- a/doc/code-docs/locales/en/LC_MESSAGES/initialize.po
+++ b/doc/code-docs/locales/en/LC_MESSAGES/initialize.po
@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: InternLM \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-09-07 14:15+0800\n"
+"POT-Creation-Date: 2023-09-08 15:32+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@ -19,26 +19,29 @@ msgstr ""
 "Content-Transfer-Encoding: 8bit\n"
 "Generated-By: Babel 2.12.1\n"

-#: ../../source/initialize.rst:2 b829330eebd24620b745072bbfc26c98
+#: ../../source/initialize.rst:2
 msgid "训练构建"
 msgstr "Training Setup"

-#: ../../source/initialize.rst:7 8c8472b4647a4de8998d75b9ec6f09eb
+#: ../../source/initialize.rst:7
 msgid "命令行参数解析"
 msgstr "Argument Parsing"

-#: ../../source/initialize.rst:8 f74176fa4aee4bbfaf989ffab9283ee7
+#: ../../source/initialize.rst:9
+#, fuzzy
 msgid ""
 "InternLM 使用 `argparse <https://docs.python.org/3/library/argparse.html>`_"
-" 库来向InternLM运行时提供命令行参数配置。用户可 使用 "
+" 库来向InternLM运行时提供命令行参数配置。用户可使用 "
 "``internlm.initialize.get_default_parser()`` 来获取 InternLM "
 "的默认解析器，其中包含一些内置参数，用户可以向此解析器添加自定义参数。"
 msgstr ""
-"InternLM uses the `argparse <https://docs.python.org/3/library/argparse.html>`_ library to supply commandline "
-"configuration to the InternLM runtime. Use ``internlm.initialize.get_default_parser()`` to get InternLM's default "
-"parser with some builtin arguments, users can add custom parameters to this parser."
+"InternLM uses the `argparse "
+"<https://docs.python.org/3/library/argparse.html>`_ library to supply "
+"command-line configuration to the InternLM runtime. Use "
+"``internlm.initialize.get_default_parser()`` to get InternLM's default "
+"parser, which includes some built-in arguments; users can add custom "
+"arguments to this parser."

-#: 9930855b85bf41ed8712fc40e1e034f7
 #: internlm.initialize.launch.get_default_parser:1 of
 msgid ""
 "Reads user command line and uses an argument parser to parse the input "
@ -46,9 +49,6 @@ msgid ""
 " local rank, backend for torch.distributed."
 msgstr ""

-#: 015003b013e346bea15b4514f2001a25 544472c2ce3c43bfb59317083c6b55c9
-#: 7ee60ba1a92a4b9e8174049fb498a4f0 bca7c66f1a5a4517958bcea1e09d5d10
-#: f5cbe452ae694c7884ac4596a7735bf6
 #: internlm.initialize.initialize_trainer.initialize_trainer
 #: internlm.initialize.launch.get_default_parser
 #: internlm.train.training_internlm.get_train_data_loader
@ -57,55 +57,50 @@ msgstr ""
 msgid "返回"
 msgstr ""

-#: 9b04c3d6b98b44ee89f800b71e8d80a9
 #: internlm.initialize.launch.get_default_parser:4 of
 msgid ""
 "Returns the parser with the default arguments, the user may add "
 "customized arguments into this parser."
 msgstr ""

-#: 147005b197e64c4b9a96a7cfe78045bc 3634f79c9aa547a48eb3fd7f150deb51
-#: d3f0aa4143c84b719cd0b53170dd86c1
 #: internlm.initialize.initialize_trainer.initialize_trainer
 #: internlm.initialize.launch.get_default_parser
 #: internlm.train.training_internlm.initialize_model of
 msgid "返回类型"
 msgstr ""

-#: ../../source/initialize.rst:25 db2bf9d3ff81483dbf218e63dd4bbbe4
+#: ../../source/initialize.rst:25
 msgid "模型初始化"
 msgstr "Model Initialization"

-#: 5c2e33e254d4495fbc4b0226aac1fddb
 #: internlm.train.training_internlm.initialize_model:1 of
 msgid "Initialize model with Automatic Mixed Precision."
 msgstr ""

-#: c1254615508542b680daf73374844f9e
 #: internlm.train.training_internlm.initialize_model:3 of
 msgid "The neural network model to be trained or evaluated."
 msgstr ""

-#: ../../source/initialize.rst:29 b9867771b9da40cd8f3a55ee5ab95f65
+#: ../../source/initialize.rst:29
 msgid "InternLM 在配置文件中使用字段 ``model_type`` 和 ``model`` 来控制模型初始化过程。示例模型初始化配置定义如下："
 msgstr ""
 "InternLM uses the fields ``model_type`` and ``model`` in the config file "
 "to control the model initialization process. An example model "
 "initialization configuration is defined as follows:"

-#: ../../source/initialize.rst:57 984a38d7f63949ecbb0d8b2ef3459d57
+#: ../../source/initialize.rst:57
 msgid "字段 ``model_type`` 指明了要初始化的模型类型"
 msgstr ""
 "The field ``model_type`` specifies the registered model type to be "
 "initialized."

-#: ../../source/initialize.rst:58 9f04ad0f145f4e40bc75a3ef45c7a59d
+#: ../../source/initialize.rst:58
 msgid "字段 ``model`` 中的参数指定了在模型初始化过程中的参数设置"
 msgstr ""
 "The parameters in the field ``model`` specify the configuration settings "
 "used during model initialization."

-#: ../../source/initialize.rst:60 d7780e355bb6429bb5151d9a0e6d7e36
+#: ../../source/initialize.rst:60
 msgid ""
 "值得注意的是，用户可以定义新的模型类型，并使用装饰器 ``@MODEL_INITIALIZER.register_module`` "
 "注册模型的初始化函数，其中 ``MODEL_INITIALIZER`` 是类 "
@ -117,109 +112,90 @@ msgstr ""
 " instantiated object of class ``internlm.util.registry.Registry``, the "
 "example is shown as follows."

-#: ../../source/initialize.rst:72 d863f71b208a49a09d2d00537e331962
+#: ../../source/initialize.rst:72
 msgid "优化器初始化"
 msgstr "Optimizer Initialization"

-#: acaafdc9bb96434bbd42a98f74187db1
 #: internlm.train.training_internlm.initialize_optimizer:1 of
 msgid "Initialize optimizer."
 msgstr ""

-#: 62fc4215c9a44bda8b31c933db90f270 93c398e44f6a4f708ba064250a3d253c
-#: e2bebdd751724915a65dec444bb89e25
 #: internlm.initialize.initialize_trainer.initialize_trainer
 #: internlm.train.training_internlm.get_train_data_loader
 #: internlm.train.training_internlm.initialize_optimizer of
 msgid "参数"
 msgstr ""

-#: 2033ee96ded8423a80268b337ba9549c
 #: internlm.train.training_internlm.initialize_optimizer:3 of
 msgid "Your model instance to be trained or evaluated."
 msgstr ""

-#: df01b44c724b4326a6c85b44694262ba
 #: internlm.train.training_internlm.initialize_optimizer:6 of
 msgid "A tuple of (optimizer, beta2_scheduler, lr_scheduler)."
 msgstr ""

-#: ../../source/initialize.rst:79 0b46b890048f4758a9d56e0540759d9f
+#: ../../source/initialize.rst:79
 msgid "数据加载器初始化"
 msgstr "Dataloader Initialization"

-#: 58e39b26ab4849788e792df386f01d7e
 #: internlm.train.training_internlm.get_train_data_loader:1 of
 msgid "Generate and return the training data loader."
 msgstr ""

-#: 37a91c167e0b4e5fad4edcc3caf0d012
 #: internlm.train.training_internlm.get_train_data_loader:3 of
 msgid "number of subprocesses used for dataloader."
 msgstr ""

-#: 947aba2a4f86420d9b2660425a6043cc
 #: internlm.train.training_internlm.get_train_data_loader:5 of
 msgid "generate function for dataset."
 msgstr ""

-#: 8a8f5ee665cb4e15bc33194c0b1f346c
 #: internlm.train.training_internlm.get_train_data_loader:7 of
 msgid "dataset sampler for training dataloader."
 msgstr ""

-#: 4c3e1e896e7940bf97c124909d2e7f36
 #: internlm.train.training_internlm.get_train_data_loader:9 of
 msgid "collate function for training dataloader."
 msgstr ""

-#: d9f0740d048c48888e82c8f8a78e33cd
 #: internlm.train.training_internlm.get_train_data_loader:12 of
 msgid "A tuple of (train_dl, dataset_types)."
 msgstr ""

-#: ../../source/initialize.rst:86 1c4df708ff5c47f6abae32617bf2ed31
+#: ../../source/initialize.rst:86
 msgid "Trainer 初始化"
 msgstr "Trainer Initialization"

-#: d535583dbcb245499e19c09f3f8b534a
 #: internlm.initialize.initialize_trainer.initialize_trainer:1 of
 msgid ""
 "Core function to wrap the essential training components with our "
 "functionality based on the config which is loaded into gpc.config."
 msgstr ""

-#: 3e370234e4b245e4b9cae1fe235df8ff
 #: internlm.initialize.initialize_trainer.initialize_trainer:4 of
 msgid "Your model instance or a function to build the model."
 msgstr ""

-#: b716a4a264234011a7b51fa12e575651
 #: internlm.initialize.initialize_trainer.initialize_trainer:6 of
 msgid "Your optimizer for training."
 msgstr ""

-#: 6a54ce9d516f4f14bab281c9db9816e8
 #: internlm.initialize.initialize_trainer.initialize_trainer:8 of
 msgid "Your criterion instance."
 msgstr ""

-#: ff9dfd04d31b4dc6afbdd841829b4c33
 #: internlm.initialize.initialize_trainer.initialize_trainer:10 of
 msgid "Dataloader for training."
 msgstr ""

-#: de345f9a457a4a88bf60b4ee96535e31
 #: internlm.initialize.initialize_trainer.initialize_trainer:12 of
 msgid "Dataloader for testing."
 msgstr ""

-#: 64e646b25420424d9dcdfb1ad7de5e6f
 #: internlm.initialize.initialize_trainer.initialize_trainer:14 of
 msgid "Your lr scheduler instance, optional."
 msgstr ""

-#: 39c7132bfafe4e22ae373081fee711ce
 #: internlm.initialize.initialize_trainer.initialize_trainer:17 of
 msgid ""
 "A tuple of ``(trainer, train_dataloader, test_dataloader, lr_scheduler)``"
--- a/doc/code-docs/locales/en/LC_MESSAGES/profiler.po
+++ b/doc/code-docs/locales/en/LC_MESSAGES/profiler.po
@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: InternLM \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-09-07 10:56+0800\n"
+"POT-Creation-Date: 2023-09-08 15:32+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: en\n"
@ -19,122 +19,147 @@ msgstr ""
 "Content-Transfer-Encoding: 8bit\n"
 "Generated-By: Babel 2.12.1\n"

-#: ../../source/profiler.rst:2 81b1b5f4414449dfaf107815a911f300
+#: ../../source/profiler.rst:2
 msgid "性能分析"
 msgstr "Profiler"

-#: ../../source/profiler.rst:7 d709646ebb314e9abb6a4839a21180bd
+#: ../../source/profiler.rst:7
 msgid "Torch Profiler"
 msgstr ""

-#: ../../source/profiler.rst:9 4b5b73486c794c7a9168ad19999e12e1
+#: ../../source/profiler.rst:9
 msgid ""
 "InternLM 使用 ``internlm.train.initialize_llm_profile()`` "
 "来收集和分析模型训练或推理期间的性能数据，如 CPU/CUDA/memory 等性能数据。这个实现基于 `torch.profiler "
 "<https://pytorch.org/docs/stable/profiler.html>`_ ，输出的性能分析 trace 文件可以使用 "
 "`tensorboard <https://www.tensorflow.org>`_ 进行可视化。"
 msgstr ""
-"InternLM uses ``internlm.train.initialize_llm_profile()`` to profile performance data, execution time duration and breakdown analysis of "
-"step time. The implementation is based on `torch.profiler <https://pytorch.org/docs/stable/profiler.html>`_ and output tracing files can "
-"be visualized with `tensorboard <https://www.tensorflow.org>`_."
+"InternLM uses ``internlm.train.initialize_llm_profile()`` to profile "
+"performance data, execution time duration and breakdown analysis of step "
+"time. The implementation is based on `torch.profiler "
+"<https://pytorch.org/docs/stable/profiler.html>`_ and output tracing "
+"files can be visualized with `tensorboard <https://www.tensorflow.org>`_."

-#: ../../source/profiler.rst:11 40ff4289735c43fdbeca871b65e82be4
+#: ../../source/profiler.rst:11
 msgid ""
 "用户如果想使用这个 torch 性能分析工具，需要在启动训练时传递 ``--profiling`` 参数以启用性能分析。完成 torch "
 "性能分析后，用户可以在 ``{JOB_NAME}/{start_time}/traces/rank{}_dp{}_tp{}_pp{}`` "
 "文件夹中看到性能分析结果。"
 msgstr ""
-"To use this torch profiler tool, you need to enable profiling by passing the ``--profiling`` flag when starting training. After torch "
-"profiling is completed, you can find the profiling results in the ``{JOB_NAME}/{start_time}/traces/rank{}_dp{}_tp{}_pp{}`` folder."
+"To use this torch profiler tool, you need to enable profiling by passing "
+"the ``--profiling`` flag when starting training. After torch profiling is"
+" completed, you can find the profiling results in the "
+"``{JOB_NAME}/{start_time}/traces/rank{}_dp{}_tp{}_pp{}`` folder."
+
+#: ../../source/profiler.rst:13
+msgid "实际运行生成的 ``Torch Profiler`` 目录结构如下："
+msgstr "The directory structure of ``Torch Profiler`` generated files is as follows:"
+
+#: ../../source/profiler.rst:22
+msgid "其中， ``traces`` 可以通过 ``TensorBoard`` 可视化，运行命令"
+msgstr "The ``traces`` can be visualized with ``TensorBoard`` by running the following command:"
+
+#: ../../source/profiler.rst:29
+msgid ""
+"在打开的 ``TensorBoard -> PyTorch Profiler -> Views -> Trace`` "
+"页面可以看到Operator和GPU Kernel的性能分析时间线如下，更多的功能请参考 `torch profiler with "
+"tensorboard "
+"<https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html"
+"#pytorch-profiler-with-tensorboard>`_"
+msgstr "In the opened ``TensorBoard -> PyTorch Profiler -> Views -> Trace`` page, you can see the timeline of profiled operators and GPU kernels. For more usage, please refer to `torch profiler with tensorboard <https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#pytorch-profiler-with-tensorboard>`_"

-#: 876a2993b82645f7b56553fe64b514ec
 #: internlm.train.training_internlm.initialize_llm_profile:1 of
 msgid "Initialize and return the profiler context manager instance."
 msgstr ""

-#: ../../source/profiler.rst:16 3ab9536155ea4f3b8adb318005970bb8
+#: ../../source/profiler.rst:38
 msgid "Memory Profiler"
 msgstr ""

-#: ../../source/profiler.rst:18 0ec4091fef5b47c58488618bfb4dcd3b
+#: ../../source/profiler.rst:40
 msgid ""
 "InternLM 提供了一个实用的内存分析工具 "
 "``internlm.utils.simple_memory_profiler.SimpleMemoryProfiler`` 来监控实际的 GPU"
 " 内存使用情况。在实现中，会对模型数据（包括模型参数、模型梯度和优化器状态）和非模型数据（包括激活值）分别进行详细的统计。"
 msgstr ""
-"InternLM provides a practical solution ``internlm.utils.simple_memory_profiler.SimpleMemoryProfiler`` to monitor actual GPU memory usage. "
-"In the implmentation, model data (including model parameters, model gradients, and optimizer states) and non-model data "
-"(including activations) are calculated."
+"InternLM provides a practical memory profiling tool, "
+"``internlm.utils.simple_memory_profiler.SimpleMemoryProfiler``, to monitor"
+" actual GPU memory usage. In the implementation, model data (including "
+"model parameters, model gradients, and optimizer states) and non-model "
+"data (including activations) are accounted for separately."

-#: ../../source/profiler.rst:20 cd62bbd5b122480da21e10453b95090c
+#: ../../source/profiler.rst:42
 msgid ""
 "要使用这个内存分析工具，用户需要在启动训练时传递 ``--profiling`` 参数以启用内存分析。完成内存分析后，用户可以在 "
 "``memory_trace/rank{}_dp{}_tp{}`` 文件夹中找到特定 rank "
 "对应的内存分析结果（包括不同时间点的内存使用日志和显示总体内存使用情况的太阳图表）。"
 msgstr ""
-"To use this memory profiler tool, you need to enable profiling by passing the ``--profiling`` flag when starting training. After memory "
-"profiling is completed, you can find the profiling results (including logs of memory usage at different time point and sunburst charts "
-"showing overall memory usage) for a specific rank device in the ``memory_trace/rank{}_dp{}_tp{}`` folder."
+"To use this memory profiler tool, you need to enable profiling by passing"
+" the ``--profiling`` flag when starting training. After memory profiling "
+"is completed, you can find the profiling results (including logs of "
+"memory usage at different time points and sunburst charts showing overall "
+"memory usage) for a specific rank in the "
+"``memory_trace/rank{}_dp{}_tp{}`` folder."
+
+#: ../../source/profiler.rst:44
+msgid "实际运行生成的 ``memory_trace`` 目录结构如下："
+msgstr "The directory structure of ``memory_trace`` generated files is as follows:"
+
+#: ../../source/profiler.rst:107
+msgid "其中， ``memory.log`` 的内容示例如下："
+msgstr "An example of ``memory.log`` is as follows:"
+
+#: ../../source/profiler.rst:157
+msgid "模型参数的太阳图示例如下："
+msgstr "An example sunburst chart of model parameters is as follows:"

-#: a858f1377b714cd5ab0cf749d8dbfeb7
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler:1 of
 msgid "A memory profiler for a llm model."
 msgstr ""

-#: 08d4cca2ba154080ba72e7d3fbd2a344 36e25696cf7b4a8ca5472e86fd5eea7e
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point of
 msgid "参数"
 msgstr ""

-#: dea424767bc44ff689d582c67b07d637
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler:3 of
 msgid "The model to profile."
 msgstr ""

-#: 4f3892910fa14324810c3f33c6af4fdd
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler:5 of
 msgid "The optimizer used for training the model."
 msgstr ""

-#: a698f2f57eef4e47a22faa546c687979
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler:7 of
 msgid "The file to write the memory state information to."
 msgstr ""

-#: 448fc2b81c794d228ec4b413356289ea
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler:9 of
 msgid "number of steps to trace."
 msgstr ""

-#: 85b3b9d4147547fd89c286f003395469
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point:1 of
 msgid "Record the memory state."
 msgstr ""

-#: d474a46415674d35a2c87c57ebff20ea
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point:3 of
 msgid "The options to include in the memory state. Defaults to \"\"."
 msgstr ""

-#: 16261fe5b1df4b13bd23f76d97caf1be
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point:5 of
 msgid "Whether to create a new memory record file. Defaults to False."
 msgstr ""

-#: 3b18845958204f07a6b80b6afb2221f5 d11f76d03d0d456889dee6d267dd4b74
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.step of
 msgid "返回"
 msgstr ""

-#: 0deeb9555efb4aa798fd9d146826e961 46b50da453f1475a88e096b5d6ed8afb
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.point:8
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.step:3 of
 msgid "None"
 msgstr ""

-#: 4f2331ac352d4057a852b013ca688ed3
 #: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler.step:1 of
 msgid "Update the memory state of the optimizer state."
 msgstr ""
--- a/doc/code-docs/source/profiler.rst
+++ b/doc/code-docs/source/profiler.rst
@ -10,6 +10,28 @@ InternLM 使用 ``internlm.train.initialize_llm_profile()`` 来收集和分析

 用户如果想使用这个 torch 性能分析工具，需要在启动训练时传递 ``--profiling`` 参数以启用性能分析。完成 torch 性能分析后，用户可以在 ``{JOB_NAME}/{start_time}/traces/rank{}_dp{}_tp{}_pp{}`` 文件夹中看到性能分析结果。

+实际运行生成的 ``Torch Profiler`` 目录结构如下：
+
+.. code-block:: bash
+
+    # tree ./7b_train/Sep08_11-00-51/traces -L 2
+    ./7b_train/Sep08_11-00-51/traces/
+    └── rank0_dp0_tp0_pp0
+        └── SH-IDC1-10-140-1-78_238619.1694142354680.pt.trace.json
+
+其中， ``traces`` 可以通过 ``TensorBoard`` 可视化，运行命令
+
+.. code-block:: bash
+
+    # visualize traces with tensorboard and custom port
+    tensorboard --logdir rank0_dp0_tp0_pp0 --port 10088
+
+在打开的 ``TensorBoard -> PyTorch Profiler -> Views -> Trace`` 页面可以看到Operator和GPU Kernel的性能分析时间线如下，更多的功能请参考 `torch profiler with tensorboard <https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#pytorch-profiler-with-tensorboard>`_
+
+.. figure:: ../../imgs/torch_profiler_trace.png
+  :scale: 45%
+  :class: with-border
+
 .. autofunction:: internlm.train.initialize_llm_profile

 Memory Profiler
@ -19,5 +41,124 @@ InternLM 提供了一个实用的内存分析工具 ``internlm.utils.simple_memo

 要使用这个内存分析工具，用户需要在启动训练时传递 ``--profiling`` 参数以启用内存分析。完成内存分析后，用户可以在 ``memory_trace/rank{}_dp{}_tp{}`` 文件夹中找到特定 rank 对应的内存分析结果（包括不同时间点的内存使用日志和显示总体内存使用情况的太阳图表）。

+实际运行生成的 ``memory_trace`` 目录结构如下：
+
+.. code-block:: bash
+
+    # tree ./memory_trace -L 2
+    ./memory_trace
+    ├── rank0_dp0_tp0                              # Profiling results for a specific rank device
+    │   ├── activation_memory_sunburst.html        # Sunburst chart showing activation memory usage
+    │   ├── grads_memory_sunburst.html             # Sunburst chart showing gradient memory usage
+    │   ├── memory.log                             # Log of GPU memory usage at different time points
+    │   ├── os_memory_sunburst.html                # Sunburst chart showing optimizer state memory usage
+    │   ├── params_memory_sunburst.html            # Sunburst chart showing parameter memory usage
+    │   └── summary_sunburst.html                  # Sunburst chart showing overall memory usage
+    ├── rank1_dp1_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    ├── rank2_dp2_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    ├── rank3_dp3_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    ├── rank4_dp4_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    ├── rank5_dp5_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    ├── rank6_dp6_tp0
+    │   ├── activation_memory_sunburst.html
+    │   ├── grads_memory_sunburst.html
+    │   ├── memory.log
+    │   ├── os_memory_sunburst.html
+    │   ├── params_memory_sunburst.html
+    │   └── summary_sunburst.html
+    └── rank7_dp7_tp0
+        ├── activation_memory_sunburst.html
+        ├── grads_memory_sunburst.html
+        ├── memory.log
+        ├── os_memory_sunburst.html
+        ├── params_memory_sunburst.html
+        └── summary_sunburst.html
+
+其中， ``memory.log`` 的内容示例如下：
+
+.. code-block:: bash
+
+    Memory State:
+    time: 37.56313228607178
+    ---summary---
+    total_memory: 55953.56 MB
+    params_memory: 13965.51 MB, grads_memory: 13965.51 MB, os_params_memory: 3461.52 MB, os_state_memory: 6923.03 MB, activation_memory: 17638.00 MB
+
+    Memory State:
+    time: 38.46969723701477
+    ---summary---
+    total_memory: 38315.56 MB
+    params_memory: 13965.51 MB, grads_memory: 13965.51 MB, os_params_memory: 3461.52 MB, os_state_memory: 6923.03 MB, activation_memory: 0.00 MB
+    ---Layout---
+    params_layout:
+    layer: param_mem, layer_mem: 0.00 MB, total_mem: 13965.51 MB
+    layer: param_mem.embedding, layer_mem: 0.00 MB, total_mem: 806.00 MB
+    layer: param_mem.embedding.weight, layer_mem: 806.00 MB, total_mem: 806.00 MB
+    layer: param_mem.blocks, layer_mem: 0.00 MB, total_mem: 12353.50 MB
+    layer: param_mem.blocks.0, layer_mem: 0.00 MB, total_mem: 386.05 MB
+    layer: param_mem.blocks.0.mixer, layer_mem: 0.00 MB, total_mem: 128.03 MB
+    layer: param_mem.blocks.0.mixer.Wqkv, layer_mem: 0.00 MB, total_mem: 96.02 MB
+    layer: param_mem.blocks.0.mixer.Wqkv.weight, layer_mem: 96.00 MB, total_mem: 96.00 MB
+    layer: param_mem.blocks.0.mixer.Wqkv.bias, layer_mem: 0.02 MB, total_mem: 0.02 MB
+    layer: param_mem.blocks.0.mixer.out_proj, layer_mem: 0.00 MB, total_mem: 32.01 MB
+    layer: param_mem.blocks.0.mixer.out_proj.weight, layer_mem: 32.00 MB, total_mem: 32.00 MB
+    layer: param_mem.blocks.0.mixer.out_proj.bias, layer_mem: 0.01 MB, total_mem: 0.01 MB
+    layer: param_mem.blocks.0.norm1, layer_mem: 0.00 MB, total_mem: 0.01 MB
+    layer: param_mem.blocks.0.norm1.weight, layer_mem: 0.01 MB, total_mem: 0.01 MB
+    layer: param_mem.blocks.0.norm2, layer_mem: 0.00 MB, total_mem: 0.01 MB
+    layer: param_mem.blocks.0.norm2.weight, layer_mem: 0.01 MB, total_mem: 0.01 MB
+    layer: param_mem.blocks.0.mlp, layer_mem: 0.00 MB, total_mem: 258.00 MB
+    layer: param_mem.blocks.0.mlp.w1, layer_mem: 0.00 MB, total_mem: 86.00 MB
+    layer: param_mem.blocks.0.mlp.w1.weight, layer_mem: 86.00 MB, total_mem: 86.00 MB
+    layer: param_mem.blocks.0.mlp.w2, layer_mem: 0.00 MB, total_mem: 86.00 MB
+    layer: param_mem.blocks.0.mlp.w2.weight, layer_mem: 86.00 MB, total_mem: 86.00 MB
+    layer: param_mem.blocks.0.mlp.w3, layer_mem: 0.00 MB, total_mem: 86.00 MB
+    layer: param_mem.blocks.0.mlp.w3.weight, layer_mem: 86.00 MB, total_mem: 86.00 MB
+    ......
+    grads_layout:
+    ......
+    os_params_layout:
+    ......
+    os_state_layout:
+    ......
+    activation_base_layout:
+    ......
+
+模型参数的太阳图示例如下：
+
+.. figure:: ../../imgs/params_memory_sunburst.png
+  :scale: 50%
+  :class: with-border
+
 .. autoclass:: internlm.utils.simple_memory_profiler.SimpleMemoryProfiler
    :members:
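
The ``memory.log`` excerpt added to ``profiler.rst`` has a regular shape: each record starts with ``Memory State:``, followed by a ``time:`` line and summary lines of ``name: value MB`` pairs. The following Python sketch shows one way such a log could be read back for quick checks. It assumes the log format is exactly as in the excerpt; `parse_memory_log` is a hypothetical helper and not part of `SimpleMemoryProfiler`, and the per-layer ``---Layout---`` lines are deliberately ignored.

```python
import re
from typing import Dict, List

# One "name: 123.45 MB" pair, as seen in the summary lines of memory.log.
FIELD = re.compile(r"(\w+): ([\d.]+) MB")

# Sample copied from the memory.log excerpt in profiler.rst (layout lines omitted).
SAMPLE = """\
Memory State:
time: 37.56313228607178
---summary---
total_memory: 55953.56 MB
params_memory: 13965.51 MB, grads_memory: 13965.51 MB, os_params_memory: 3461.52 MB, os_state_memory: 6923.03 MB, activation_memory: 17638.00 MB

Memory State:
time: 38.46969723701477
---summary---
total_memory: 38315.56 MB
params_memory: 13965.51 MB, grads_memory: 13965.51 MB, os_params_memory: 3461.52 MB, os_state_memory: 6923.03 MB, activation_memory: 0.00 MB
"""


def parse_memory_log(text: str) -> List[Dict[str, float]]:
    """Hypothetical helper: return one dict of summary values per record."""
    records: List[Dict[str, float]] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "Memory State:":
            # A new record begins; later lines fill it in.
            current = {}
            records.append(current)
        elif current is None:
            continue  # Skip anything before the first record.
        elif line.startswith("time:"):
            current["time"] = float(line.split(":", 1)[1])
        elif line.startswith(("total_memory:", "params_memory:")):
            # Summary lines only; per-layer layout lines start with "layer:".
            for key, value in FIELD.findall(line):
                current[key] = float(value)
    return records


if __name__ == "__main__":
    for record in parse_memory_log(SAMPLE):
        parts = sum(value for key, value in record.items() if key not in ("time", "total_memory"))
        print(f"t={record['time']:.2f}s total={record['total_memory']:.2f} MB parts={parts:.2f} MB")
```

In the sample, the five component values add up to the reported ``total_memory`` to within 0.01 MB, which is consistent with each figure being rounded to two decimals.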
--- a/doc/imgs/params_memory_sunburst.png
+++ b/doc/imgs/params_memory_sunburst.png
--- a/doc/imgs/torch_profiler_trace.png
+++ b/doc/imgs/torch_profiler_trace.png
--- a/version.txt
+++ b/version.txt
@ -1 +1 @@
-0.1.0
+0.2.0