# Zero Redundancy Optimizer and ZeRO-Offload
The Zero Redundancy Optimizer (ZeRO) removes the memory redundancies across data-parallel processes by partitioning three
model states (optimizer states, gradients, and parameters) instead of replicating them.
By doing so, memory efficiency is boosted drastically compared to classic data parallelism while the computational granularity
and communication efficiency are retained.
1. **ZeRO Level 1**: The optimizer states (e.g., for the [Adam optimizer](https://arxiv.org/abs/1412.6980), the 32-bit weights and the first and second moment estimates) are partitioned across the processes, so that each process updates only its partition.
2. **ZeRO Level 2**: The reduced 32-bit gradients for updating the model weights are also partitioned such that each process only stores the gradients corresponding to its partition of the optimizer states.
3. **ZeRO Level 3**: The 16-bit model parameters are partitioned across the processes. ZeRO-3 will automatically collect and partition them during the forward and backward passes.
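
To make the savings concrete, the per-GPU model-state memory under mixed-precision Adam training can be estimated with the formulas from the ZeRO paper. The sketch below is only a back-of-the-envelope helper (the function name and the 7.5B-parameter / 64-GPU example are illustrative; activations, buffers, and fragmentation are not counted):

```python
def zero_model_state_memory_gb(num_params, num_gpus, level):
    """Rough per-GPU model-state memory (GB) for mixed-precision Adam training,
    following the estimates in the ZeRO paper: 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, and 12 bytes/param for the fp32
    optimizer states (master weights, momentum, variance)."""
    fp16_params = 2 * num_params
    fp16_grads = 2 * num_params
    optim_states = 12 * num_params
    if level == 1:    # partition optimizer states only
        per_gpu = fp16_params + fp16_grads + optim_states / num_gpus
    elif level == 2:  # also partition gradients
        per_gpu = fp16_params + (fp16_grads + optim_states) / num_gpus
    elif level == 3:  # also partition parameters
        per_gpu = (fp16_params + fp16_grads + optim_states) / num_gpus
    else:             # classic data parallelism: everything replicated
        per_gpu = fp16_params + fp16_grads + optim_states
    return per_gpu / 1e9

# A 7.5B-parameter model on 64 GPUs: ~120 GB replicated vs. ~31.4 / ~16.6 / ~1.9 GB
for level in (0, 1, 2, 3):
    print(f"level {level}: {zero_model_state_memory_gb(7.5e9, 64, level):.1f} GB")
```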
## Getting Started with ZeRO
If you are training models with Colossal-AI, enabling ZeRO data parallelism and offloading is as simple as adding a few lines to your configuration file. We support configuration for levels 2 and 3; for a level 1 optimizer, use the [PyTorch native implementation](https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html).
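
For a level 1 setup with the PyTorch native optimizer, a minimal sketch (assuming the process group has already been initialized, e.g. by `torchrun`, and using a toy model as a placeholder) looks like:

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed is already initialized and one GPU per process.
model = DDP(torch.nn.Linear(32, 4).cuda())

# Wraps Adam so that its optimizer states are sharded across the ranks
# (the level 1 partitioning described above).
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    lr=1e-3
)
```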
Below are a few examples of ZeRO-3 configurations.
### Example of ZeRO-3 Configurations
You can refer to the [DeepSpeed configuration](https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training) for details.
Here we use `Adam` as the initial optimizer.
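
The `zero` settings below only control partitioning and offloading; the `Adam` instance itself is a plain PyTorch optimizer built in your training script and later handed to `colossalai.initialize()`. A minimal sketch with a placeholder model:

```python
import torch

# Stand-in for your own model.
model = torch.nn.Linear(32, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```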
1. Use ZeRO to partition the optimizer states, gradients (level 2), and parameters (level 3).
```python
zero = dict(
    level=3,
    dynamic_loss_scale=True,
    clip_grad=1.0
)
```
2. Additionally offload the optimizer states and computations to the CPU.
```python
zero = dict(
    level=3,
    offload_optimizer_config=dict(
        device='cpu',
        pin_memory=True,
        fast_init=True
    ),
    ...
)
```
3. Save even more memory by offloading parameters to CPU memory.
```python
zero = dict(
    level=3,
    offload_optimizer_config=dict(
        device='cpu',
        pin_memory=True,
        fast_init=True
    ),
    offload_param_config=dict(
        device='cpu',
        pin_memory=True,
        max_in_cpu=OFFLOAD_PARAM_MAX_IN_CPU
    ),
    ...
)
```
4. Save even MORE memory by offloading to NVMe (if available on your system):
```python
zero = dict(
    level=3,
    offload_optimizer_config=dict(
        device='nvme',
        pin_memory=True,
        fast_init=True,
        nvme_path='/nvme_data'
    ),
    offload_param_config=dict(
        device='nvme',
        pin_memory=True,
        max_in_cpu=OFFLOAD_PARAM_MAX_IN_CPU,
        nvme_path='/nvme_data'
    ),
    ...
)
```
Note that `fp16` is automatically enabled when using ZeRO. This relies on `AMP_TYPE.NAIVE` in the Colossal-AI AMP module.
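
For reference, `AMP_TYPE.NAIVE` is the same mode you would otherwise select yourself through the `fp16` field of the configuration file. A minimal sketch, assuming the usual Colossal-AI AMP configuration convention (you do not need to add this when ZeRO is enabled, since it is applied automatically):

```python
from colossalai.amp import AMP_TYPE

# For reference only: ZeRO already enables this mixed-precision mode for you.
fp16 = dict(
    mode=AMP_TYPE.NAIVE
)
```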
### Training
Note that if your model is too large to fit in memory when using ZeRO-3, you should use `colossalai.zero.zero3_model_context` to construct your model:
```python
from colossalai.zero import zero3_model_context

with zero3_model_context():
    model = Model()
```
Once you have completed your configuration, just use `colossalai.initialize()` to initialize your training.
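
A rough end-to-end sketch of that flow is shown below, assuming the usual Colossal-AI launch and engine APIs; the configuration path, toy model, and synthetic data are placeholders for your own setup:

```python
import colossalai
import torch
from torch.utils.data import DataLoader, TensorDataset

# `./config.py` is assumed to contain the `zero = dict(...)` settings above.
colossalai.launch_from_torch(config='./config.py')

# Toy stand-ins for your own model, optimizer, loss, and data.
model = torch.nn.Linear(32, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
train_dataloader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 4, (64,))),
    batch_size=16
)

engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader
)

engine.train()
for data, label in train_dataloader:
    data, label = data.cuda(), label.cuda()
    engine.zero_grad()
    output = engine(data)
    loss = engine.criterion(output, label)
    engine.backward(loss)
    engine.step()
```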