Large files are typically not stored in a Git repository, so they need to be downloaded from external sources. DVC is a great way to store your GTO artifact files while keeping a pointer in the repo, which simplifies data management and synchronization.
If you're new to DVC, get started here first.
First, we need to start tracking the artifact with DVC. If you produce the artifact in a DVC pipeline, this is done automatically.
If the artifact is located inside your Git repo, you can use dvc add:
$ dvc add model.pkl
$ git add model.pkl.dvc
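If the file was already committed to Git, dvc add will refuse to track it until Git stops tracking it. The sketch below shows one way to handle both cases, assuming a POSIX shell; the function name track_with_dvc is hypothetical and not part of DVC or GTO.

```shell
# Hypothetical helper: hand a file over from Git tracking to DVC tracking.
track_with_dvc() {
    file="$1"
    dir=$(dirname -- "$file")
    # If Git already tracks the file, untrack it but keep it on disk.
    if git ls-files --error-unmatch "$file" >/dev/null 2>&1; then
        git rm --cached "$file"
    fi
    # Writes the pointer file ($file.dvc) and adds $file to .gitignore.
    dvc add "$file"
    # Stage the pointer and the ignore rule that dvc add created.
    git add "$file.dvc" "$dir/.gitignore"
}
```

Calling track_with_dvc model.pkl is then equivalent to the two commands above, plus the untracking step when it is needed.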
If the artifact is located in some external storage, we can use dvc import-url to still keep meta-information about it in the repo (use --no-download to skip downloading the file itself):
$ dvc import-url --no-download s3://container/model.pkl
$ git add model.pkl.dvc
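When the file in external storage changes, the pointer in the repo becomes stale. The sketch below refreshes it without fetching the data, assuming your DVC version supports the --no-download flag on dvc update as well; the function name refresh_import is hypothetical.

```shell
# Hypothetical helper: refresh the pointer of an import made with --no-download.
refresh_import() {
    pointer="$1"    # e.g. model.pkl.dvc
    # Re-read the hashes from the external source without downloading the data.
    dvc update --no-download "$pointer"
    # Stage the pointer so the new hashes can be committed.
    git add "$pointer"
}
```

For example, refresh_import model.pkl.dvc followed by a commit records the new state of the external file.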
Once the artifact is tracked with DVC within your repo, we can annotate it with GTO:
$ gto annotate model --path model.pkl
This will write the following to artifacts.yaml:
model:
  path: model.pkl
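Scripts that later download the artifact need the path recorded in this annotation. The sketch below reads it back with awk, assuming only the simple two-level layout that gto annotate writes; it is not a general YAML parser, and the function name artifact_path is hypothetical.

```shell
# Hypothetical helper: print the path recorded for an artifact.
# Assumes top-level keys at column 0 and an indented "path:" entry below them.
artifact_path() {
    name="$1"
    file="${2:-artifacts.yaml}"
    awk -v name="$name" '
        # A line starting at column 0 opens a new top-level entry.
        /^[^ #]/ { inside = ($0 == name ":") }
        # Inside the requested entry, print the value of "path:" and stop.
        inside && $1 == "path:" { print $2; exit }
    ' "$file"
}
```

With the file above, artifact_path model prints model.pkl; an unknown name prints nothing.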
Commit the changes to Git so that you can gto register artifact versions and gto assign them to deployment stages referencing the new commit.
$ git add artifacts.yaml
$ git commit -m "version artifact binaries with DVC and annotate it with GTO"
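Once the commit exists, registering and assigning could look like the sketch below. The function name release_artifact and the values in the usage line are placeholders, and the flag names follow the GTO CLI as I understand it; check gto register --help and gto assign --help for your installed version.

```shell
# Hypothetical helper: register a version on the current commit
# and assign that version to a deployment stage.
release_artifact() {
    name="$1" version="$2" stage="$3"
    # Registers the version; GTO records it as a Git tag such as model@v1.0.0.
    gto register "$name" --version "$version"
    # Assigns the registered version to the stage (also recorded as a Git tag).
    gto assign "$name" --version "$version" --stage "$stage"
    # Both events live in Git tags, so push the tags to share them.
    git push --tags
}
```

For example: release_artifact model v1.0.0 dev.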
To share your work, you'll need remote storage set up in DVC. You can then upload the artifact files and the changes to the repo:
$ dvc push
$ git push
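If no remote storage is configured yet, dvc push has nowhere to upload to. A minimal sketch of the one-time setup follows; the function name setup_remote, the remote name, and the bucket URL in the usage line are placeholders.

```shell
# Hypothetical helper: configure a default DVC remote and share the setting via Git.
setup_remote() {
    name="$1" url="$2"
    # -d marks this remote as the default for dvc push and dvc pull.
    dvc remote add -d "$name" "$url"
    # The remote is stored in .dvc/config, which is tracked by Git.
    git add .dvc/config
    git commit -m "configure DVC remote $name"
}
```

For example: setup_remote storage s3://mybucket/dvcstore, run once before the first dvc push.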
To download GTO artifact files tracked with DVC, you can use the dvc get or dvc import commands (or simply use dvc pull if you are working inside a clone of the repo).
$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $OUTPUT_PATH
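Because GTO records versions as Git tags of the form NAME@VERSION, a registered version can be passed directly as the revision. The sketch below wraps that in a function; the name download_artifact and all values in the usage line are placeholders.

```shell
# Hypothetical helper: download an artifact file at a registered GTO version.
download_artifact() {
    repo="$1" name="$2" version="$3" path="$4" out="$5"
    # GTO version tags look like model@v1.0.0, which dvc get accepts as --rev.
    dvc get "$repo" "$path" --rev "$name@$version" -o "$out"
}
```

For example: download_artifact $REPO model v1.0.0 model.pkl ./model.pkl.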
Check out the User Guide to
learn how to find out