Restore

The template offers a way to restore a previous run from the configuration. The relevant configuration block is in conf/train/default.yml:

restore:
  ckpt_or_run_path: null
  mode: null # null, finetune, hotstart, continue

ckpt_or_run_path

The ckpt_or_run_path can be either a path to a Lightning checkpoint or a run identifier as defined by the logger. With W&B as the logger, this identifier is called run_path and has the form entity/project/run_id.
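
To make the two accepted forms concrete, here is a minimal sketch of how a value could be told apart. The helper names (RunPath, parse_run_path, classify) are hypothetical and are not part of the template. The check on the .ckpt suffix is an assumption based on the conventional Lightning file extension, so the template's own resolution logic may differ.

```python
from typing import NamedTuple, Optional


class RunPath(NamedTuple):
    """The three parts of a W&B-style run_path (hypothetical helper type)."""
    entity: str
    project: str
    run_id: str


def parse_run_path(value: str) -> Optional[RunPath]:
    """Return the parts of an entity/project/run_id string, or None if it does not match."""
    parts = value.split("/")
    if len(parts) == 3 and all(parts):
        return RunPath(*parts)
    return None


def classify(value: str) -> str:
    """Heuristic only: treat *.ckpt as a checkpoint file, otherwise try the run_path form."""
    # Assumption: checkpoints use the conventional ".ckpt" extension.
    if value.endswith(".ckpt"):
        return "checkpoint"
    if parse_run_path(value) is not None:
        return "run_path"
    return "unknown"
```

The suffix is checked first because a relative checkpoint path with three components would otherwise also match the run_path form.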

Warning

If ckpt_or_run_path points to a checkpoint, that checkpoint must have been saved with this template, because additional information is attached to the checkpoint to guarantee a correct restore. This includes the run_path itself and the whole configuration used.
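
As an illustration of why a foreign checkpoint cannot be restored, the sketch below checks a loaded checkpoint dictionary for the extra metadata before attempting a restore. The key names "run_path" and "cfg" and the function name are hypothetical. The template's actual storage layout is not described here and may be different.

```python
from typing import Any, Dict, List

# Hypothetical key names: the real names used by the template are not documented here.
REQUIRED_KEYS = ("run_path", "cfg")


def missing_restore_metadata(checkpoint: Dict[str, Any]) -> List[str]:
    """Return the required metadata keys that are absent from the checkpoint dict."""
    return [key for key in REQUIRED_KEYS if key not in checkpoint]
```

A checkpoint produced outside the template would typically contain only the standard Lightning entries, so the returned list would be non-empty and the restore could be rejected early with a clear message.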

mode

We support 4 different modes for restoring an experiment:

restore:
    mode: null
In this mode no restore happens, and ckpt_or_run_path is ignored.

Use Case

This is the default option. It lets the user train the model from scratch, logging into a new run.

restore:
    mode: finetune
In this mode only the model weights are restored. Neither the Trainer state nor the logger run is restored.

Use Case

As the name suggests, the most common use case is fine-tuning a trained model with a new training regimen, logging into a new run.

restore:
    mode: hotstart
In this mode the training continues from the checkpoint, restoring the Trainer state, but the logging does not continue in the previous run. A new run is created on the logger dashboard.

Use Case

Perform different tests in separate logging runs branching from the same trained model.

restore:
    mode: continue
In this mode the training continues from the checkpoint and the logging continues in the previous run. No new run is created on the logger dashboard.

Use Case

The training execution was interrupted and the user wants to continue it.

Restore summary

                null  finetune  hotstart  continue
Model weights   ❌    ✅        ✅        ✅
Trainer state   ❌    ❌        ✅        ✅
Logging run     ❌    ❌        ❌        ✅
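
The summary above can also be written as a small lookup table. This is only a sketch: RestorePlan, RESTORE_PLANS and plan_for are hypothetical names used for illustration and do not come from the template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestorePlan:
    """Which pieces of a previous run are restored (hypothetical helper type)."""
    model_weights: bool
    trainer_state: bool
    logging_run: bool


# One entry per supported mode, mirroring the restore summary.
RESTORE_PLANS = {
    None: RestorePlan(model_weights=False, trainer_state=False, logging_run=False),
    "finetune": RestorePlan(model_weights=True, trainer_state=False, logging_run=False),
    "hotstart": RestorePlan(model_weights=True, trainer_state=True, logging_run=False),
    "continue": RestorePlan(model_weights=True, trainer_state=True, logging_run=True),
}


def plan_for(mode: Optional[str]) -> RestorePlan:
    """Look up the plan for a mode, rejecting anything outside the four supported values."""
    try:
        return RESTORE_PLANS[mode]
    except KeyError:
        raise ValueError(f"Unknown restore mode: {mode!r}") from None
```

Each mode restores everything the previous one does plus one more item, so the modes form a strict progression from null to continue.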