Large files are typically not stored in a Git repository, so they need to be downloaded from external sources. DVC is a great way to store your GTO artifact files while keeping a pointer in the repo, which simplifies data management and synchronization.
If you're new to DVC, get started here first.
First, we need to start tracking the artifact with DVC. If you produce the artifact in a DVC pipeline, this is done automatically.
If the artifact is located inside your Git repo, you can use dvc add:
$ dvc add model.pkl
$ git add model.pkl.dvc
If the artifact is located in external storage, you can use dvc import-url to keep metadata about it in the repo (pass --no-download to skip downloading it):
$ dvc import-url --no-download s3://container/model.pkl
$ git add model.pkl.dvc
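The choice between the two tracking commands can be sketched as a small shell helper. This is only an illustration, not part of DVC or GTO: the function names track_artifact and run and the DRY_RUN variable are hypothetical, and the helper assumes that any source containing "://" lives in external storage.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC or GTO): pick the DVC tracking
# command based on where the artifact lives.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

track_artifact() {
    src=$1
    case "$src" in
        *://*)
            # Assumed external storage: keep only metadata in the repo
            # and skip the download.
            run dvc import-url --no-download "$src"
            dvc_file="$(basename "$src").dvc"
            ;;
        *)
            # File inside the Git repo: let DVC track it.
            run dvc add "$src"
            dvc_file="$src.dvc"
            ;;
    esac
    # Stage the pointer file that DVC writes next to the artifact.
    run git add "$dvc_file"
}
```

Running it with DRY_RUN=1 first, for example on s3://container/model.pkl, shows which commands would be executed without touching the repo.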
Once the artifact is tracked with DVC within your repo, we can annotate it with GTO:
$ gto annotate model --path model.pkl
This will write the following to artifacts.yaml:
model:
  path: model.pkl
Commit the changes to Git so that you can gto register artifact versions and gto assign them to deployment stages, referencing the new commit.
$ git add artifacts.yaml
$ git commit -m "version artifact binaries with DVC and annotate it with GTO"
To share your work, you'll need remote storage set up in DVC. You can then upload the artifact files and push the changes to the repo:
$ dvc push
$ git push
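If no default remote is configured yet, it has to be added before the first dvc push. The following is a minimal sketch of that sequence, assuming a hypothetical remote name ("storage") and a caller-supplied URL; share_artifact, run and DRY_RUN are illustrative names, not DVC features.

```shell
#!/bin/sh
# Hypothetical helper: optionally configure a default DVC remote, then publish.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

share_artifact() {
    remote_url=${1:-}
    if [ -n "$remote_url" ]; then
        # "storage" is an arbitrary remote name; -d makes it the default remote.
        run dvc remote add -d storage "$remote_url"
        # The remote is recorded in .dvc/config, which is versioned with Git.
        run git add .dvc/config
        run git commit -m "configure DVC remote storage"
    fi
    run dvc push   # upload the artifact files to the remote
    run git push   # publish the pointer files and artifacts.yaml
}
```

Calling share_artifact without an argument assumes a remote already exists and only runs the two push commands.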
To download GTO artifact files tracked with DVC, you can use the dvc get or dvc import commands (or simply use dvc pull if you cd into the repo).
$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $OUTPUT_PATH
Check out the User Guide to learn how to find ARTIFACT_PATH and REVISION.
If you need to download the latest version of model, that would be:
$ ARTIFACT_PATH=$(gto describe --repo $REPO model@latest --path)
$ REVISION=$(gto show --repo $REPO model@latest --ref)
$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $ARTIFACT_PATH
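The three commands above can be wrapped into one function that stops early when a lookup fails. This is a sketch under assumptions: download_artifact is a hypothetical name, and the artifact reference is passed to gto show and gto describe unchanged, exactly as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: download one artifact version from a repo that uses
# GTO annotations and DVC tracking.
# Usage: download_artifact <repo> <artifact-reference>
download_artifact() {
    repo=$1
    artifact=$2

    # Resolve the Git reference of the requested version.
    revision=$(gto show --repo "$repo" "$artifact" --ref) || return 1
    # Look up the path annotated for the artifact in artifacts.yaml.
    artifact_path=$(gto describe --repo "$repo" "$artifact" --path) || return 1

    if [ -z "$revision" ] || [ -z "$artifact_path" ]; then
        echo "could not resolve $artifact in $repo" >&2
        return 1
    fi

    # Fetch the file at that revision, keeping its path relative to the
    # current directory.
    dvc get "$repo" "$artifact_path" --rev "$revision" -o "$artifact_path"
}
```

Quoting every variable keeps the helper safe for repo URLs or paths that contain spaces, which the one-line versions above do not handle.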