GTO is a tool for creating an Artifact Registry in your Git repository. One of the special cases we would like to highlight is creating a Machine Learning Model Registry.
Such a registry serves as a centralized place to store and operationalize your artifacts along with their metadata, to manage model life cycles, versions, and releases, and to easily automate tests and deployments using GitOps.
Usually, Artifact Registry usage follows these three steps:
GTO helps you achieve all of them in a GitOps way. If you would like to see an example, please follow Get Started.
In Software Engineering, Git is the heart of the software system. Code is committed to Git, and CI/CD triggers on new commits to run the necessary downstream actions. Approaches such as GitOps have made huge steps towards automating development cycles, reducing errors, and keeping software development productive.
Artifact Registries (and Model Registries in particular) usually introduce a separate service or infrastructure, as well as a new set of APIs to integrate with. This often makes it necessary to maintain two different systems, which is a significant overhead. For example, if you work in Machine Learning, you often need two teams (Data Science specialists and Software Engineers), each responsible for maintaining their part of the system.
GTO builds the registry on top of a Git repository, using Git tags to register versions and assign stages, and an artifacts.yaml file to keep meta-information about artifacts, such as path, type, description, etc. If your artifact development is built around Git, you won't need to introduce new things for your team to manage.
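As a rough illustration of how registry state can be derived from Git tags alone, the sketch below groups a list of tag names into registered versions and stage assignments. The tag shapes used here (name@vX.Y.Z for a version, name#stage#N for a stage assignment), the function name parse_tags, and the sample artifact name churn are assumptions made for this example; check the GTO documentation for the exact conventions.

```python
import re
from collections import defaultdict

# Assumed tag shapes (illustrative only, not confirmed by this page):
#   "<artifact>@v<major>.<minor>.<patch>"  registers a version
#   "<artifact>#<stage>#<n>"               assigns a stage; <n> is a counter
VERSION_TAG = re.compile(r"^(?P<name>[^@#]+)@(?P<version>v\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<name>[^@#]+)#(?P<stage>[^#]+)#(?P<n>\d+)$")


def parse_tags(tags):
    """Group tag names into versions and stage-assignment events per artifact."""
    registry = defaultdict(lambda: {"versions": [], "stages": []})
    for tag in tags:
        if m := VERSION_TAG.match(tag):
            registry[m["name"]]["versions"].append(m["version"])
        elif m := STAGE_TAG.match(tag):
            registry[m["name"]]["stages"].append((int(m["n"]), m["stage"]))
        # Any other tag (for example a plain release tag) is ignored.
    for entry in registry.values():
        entry["stages"].sort()  # order assignment events by their counter
    return dict(registry)


# Hypothetical tag names, as they might be listed by `git tag`.
tags = [
    "churn@v1.0.0",
    "churn@v1.1.0",
    "churn#staging#1",
    "churn#prod#2",
    "release-2024",
]
state = parse_tags(tags)
print(state["churn"]["versions"])
print(state["churn"]["stages"])
```

Because every piece of state comes from tag names, no separate database or service is needed to answer "which versions exist" or "which stages were assigned".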
One example (although specific to Model Registries) demonstrates the problem of handling two worlds at once. When you train Machine Learning models, you have to know what code and data were used to do it. If the Model Registry lives in a separate system, you (or the code you've written) have to record the code and data snapshots (or just a Git commit hexsha). If you forget to record the hexsha when you register a new model version in the Model Registry, or record an incorrect one, no one can reproduce your training process. Keeping track of both models and their versions in Git solves that problem.
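To make this concrete: a version tag already points at a commit, so the code snapshot never has to be recorded by hand. The sketch below reads text in the format printed by `git show-ref --tags -d` and returns the commit behind a version tag. The sample output, the shortened hashes, the tag names, and the helper commit_for_tag are hypothetical and only serve the illustration.

```python
def commit_for_tag(show_ref_output, tag):
    """Return the commit hash a tag resolves to, or None if the tag is absent."""
    ref = f"refs/tags/{tag}"
    direct = peeled = None
    for line in show_ref_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        sha, name = parts
        if name == ref:
            direct = sha  # lightweight tag: the hash is the commit itself
        elif name == ref + "^{}":
            peeled = sha  # annotated tag: the dereferenced commit
    # Prefer the dereferenced commit when the tag is annotated.
    return peeled or direct


# Hypothetical output; real hashes are 40 hexadecimal characters long.
SAMPLE = """\
1111111 refs/tags/churn@v1.0.0
aaaaaaa refs/tags/churn@v1.0.0^{}
bbbbbbb refs/tags/churn@v1.1.0
"""

print(commit_for_tag(SAMPLE, "churn@v1.0.0"))
print(commit_for_tag(SAMPLE, "churn@v1.1.0"))
```

Since the tag and the commit are linked by Git itself, there is no separate hexsha field that someone could forget to fill in or fill in incorrectly.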
There are a few limitations to the GTO approach to building an Artifact Registry:
If you hit the last two limitations, you may find Studio useful.