Data Version Control
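DVC runs alongside Git. It records a content hash of each tracked file or directory in a small .dvc file, and committing that file to Git ties each version of the data to a Git commit.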
Initialize the DVC repository:
To start tracking a file or directory, use dvc add:
DVC stores information about the added file (or a directory) in a special .dvc file named data/ImageNet.dvc, a small text file with a human-readable format.
This file can be easily versioned like source code with Git, as a placeholder for the original data (which gets listed in .gitignore):
Making changes
When you make a change to a file or directory, run dvc add again to track the latest version:
Switching between versions
The regular workflow is to run git checkout first (to switch branches, check out a commit, or restore a revision of a .dvc file) and then run dvc checkout to sync the data:
Info
Read more in the DVC docs!