ColossalAI/docs/source/en/basics/booster_checkpoint.md

2.0 KiB

Booster Checkpoint

Author: Hongxin Liu

Prerequisite:

Introduction

We've introduced the Booster API in the previous tutorial. In this tutorial, we will introduce how to save and load checkpoints using booster.

Model Checkpoint

{{ autodoc:colossalai.booster.Booster.save_model }}

Model must be boosted by colossalai.booster.Booster before saving. checkpoint is the path to saved checkpoint. It can be a file, if shard=False. Otherwise, it should be a directory. If shard=True, the checkpoint will be saved in a sharded way. This is useful when the checkpoint is too large to be saved in a single file. Our sharded checkpoint format is compatible with huggingface/transformers, so you can use huggingface from_pretrained method to load model from our sharded checkpoint.

{{ autodoc:colossalai.booster.Booster.load_model }}

Model must be boosted by colossalai.booster.Booster before loading. It will detect the checkpoint format automatically, and load in corresponding way.

Optimizer Checkpoint

{{ autodoc:colossalai.booster.Booster.save_optimizer }}

Optimizer must be boosted by colossalai.booster.Booster before saving.

{{ autodoc:colossalai.booster.Booster.load_optimizer }}

Optimizer must be boosted by colossalai.booster.Booster before loading.

LR Scheduler Checkpoint

{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}

LR scheduler must be boosted by colossalai.booster.Booster before saving. checkpoint is the local path to checkpoint file.

{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}

LR scheduler must be boosted by colossalai.booster.Booster before loading. checkpoint is the local path to checkpoint file.

Checkpoint design

More details about checkpoint design can be found in our discussion A Unified Checkpoint System Design.