User Guide

GTO lets you build an Artifact Registry or Model Registry out of your Git repository by creating annotated Git tags with a special format. To learn more about building a Model Registry, see the Studio User Guide.

Finding the right artifact version

You may need to deliver a specific artifact version to a certain environment, most likely the latest version or the one currently assigned to a stage. Use gto show to find the Git reference (tag) you need.

Get the Git tag for the latest version:

$ gto show churn@latest --ref
churn@v3.1.0

Get the Git tag for the version in the prod stage:

$ gto show churn#prod --ref
churn@v3.0.0

GTO doesn't provide a way to deliver the artifacts, but you can use DVC or any other method to retrieve files from the repo. With DVC, you can use dvc get:

$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $OUTPUT_PATH
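The two steps above (resolving a Git tag with gto show, then fetching the file with dvc get) can be chained in a small script. The following is a minimal sketch rather than an official integration: the helper names resolve_ref, dvc_get_command and download are hypothetical, and it assumes both gto and dvc are installed and that the script runs inside a local clone of the repo.

```python
import subprocess
from typing import List


def resolve_ref(selector: str) -> str:
    """Run `gto show <selector> --ref` in the current repo and return the Git tag.

    `selector` is something like "churn@latest" or "churn#prod".
    """
    result = subprocess.run(
        ["gto", "show", selector, "--ref"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def dvc_get_command(repo: str, artifact_path: str, revision: str, output_path: str) -> List[str]:
    """Build the same `dvc get` invocation shown above as an argument list."""
    return ["dvc", "get", repo, artifact_path, "--rev", revision, "-o", output_path]


def download(selector: str, repo: str, artifact_path: str, output_path: str) -> None:
    """Resolve the tag for `selector`, then download the artifact at that revision."""
    revision = resolve_ref(selector)
    subprocess.run(dvc_get_command(repo, artifact_path, revision, output_path), check=True)
```

For example, download("churn#prod", repo_url, artifact_path, "model.pkl") would fetch whatever file the prod stage currently points to, where repo_url and artifact_path are values you supply.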

You can also use DVC with GTO to:

Store large artifacts (models and data) and track pointers to them in your repo.
Keep artifact metadata like the path or type (model or dataset). To see an example, check out the example-gto repo.

Acting on new registrations and assignments

A popular way to act on Git tags pushed to your repo is to set up CI/CD. To see an example, check out the workflow in the example-gto repo. The workflow uses the GTO GH Action, which fetches all Git tags (to correctly interpret the Registry) and finds out which version of the artifact was registered, which stage was assigned, and annotation details such as path, type, and description, so you can use them in the next steps of the CI. Note that it finds these annotation details by reading the dvc.yaml file managed by DVC.

If you're working with GitLab or Bitbucket, feel free to create an issue asking for a similar action, or submit your own for us to add to the documentation.

Other approaches: webhooks and polling Git forge API

Besides using CI/CD, another option is to configure webhooks that send HTTP requests to your server when Git tags are pushed to the remote.
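As a rough illustration of the webhook approach, the sketch below accepts a POST request and extracts the tag name from a GitHub-style push payload, where the ref field has the form refs/tags/<name>. It assumes the webhook is configured to send JSON. The names pushed_tag, TagWebhookHandler and serve are hypothetical, and a real deployment should also verify the webhook signature, which is omitted here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

TAG_PREFIX = "refs/tags/"


def pushed_tag(payload: dict) -> Optional[str]:
    """Return the tag name from a push payload, or None if it is not a tag push."""
    ref = payload.get("ref", "")
    # Ignore branch pushes and tag deletions.
    if not ref.startswith(TAG_PREFIX) or payload.get("deleted"):
        return None
    return ref[len(TAG_PREFIX):]


class TagWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        tag = pushed_tag(payload) if isinstance(payload, dict) else None
        if tag is not None:
            # Placeholder action: replace with your own build or deploy logic.
            print(f"Git tag pushed: {tag}")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the webhook receiver (blocks until interrupted)."""
    HTTPServer(("", port), TagWebhookHandler).serve_forever()
```

Calling serve() starts the receiver; pushed_tag can also be reused on its own inside an existing web framework.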

You can also configure your server to poll your Git provider, for example through its REST API, to check whether anything has changed. As an example, check out the GitHub REST API.
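A polling loop can be as simple as listing the tags periodically and comparing them with the ones already seen. The sketch below uses the GitHub REST API endpoint for listing repository tags. The function names are hypothetical. It reads only the first page of results (up to 100 tags) and sends no authentication, so treat it as a starting point, not a complete client.

```python
import json
import urllib.request
from typing import Iterable, List, Set


def fetch_tag_names(owner: str, repo: str) -> List[str]:
    """Return tag names from the first page of GET /repos/{owner}/{repo}/tags."""
    url = f"https://api.github.com/repos/{owner}/{repo}/tags?per_page=100"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return [tag["name"] for tag in json.load(response)]


def new_tags(seen: Set[str], current: Iterable[str]) -> List[str]:
    """Return the tags in `current` that were not seen before, preserving order."""
    return [name for name in current if name not in seen]
```

On each polling cycle, pass the result of fetch_tag_names to new_tags together with the set of tags recorded so far, act on whatever is returned, and then add those names to the set.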

CI/CD workflow examples

We use MLEM in these examples, but you can use any other tool to build, publish or deploy your models, or do any other action with your artifacts.

GitHub: build a Docker image

This workflow will build a Docker image out of the model and push it to Docker Hub.

# .github/workflows/build.yaml
on:
  push:
    tags:
      - '*'

jobs:
  act:
    name: Build a Docker image for new model versions
    runs-on: ubuntu-latest
    steps:
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        # set credentials to login to DockerHub
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'registration'
        run: |
          mlem build docker \
              --model '${{ steps.gto.outputs.path }}' \
              --image.name ${{ steps.gto.outputs.name }} \
              --image.tag '${{ steps.gto.outputs.version }}' \
              --image.registry docker_io

Learn more about building Docker images, Python packages, or Docker build-ready folders from your models with MLEM.

GitHub: deploy a model

This workflow will deploy a model to Heroku upon stage assignment:

# .github/workflows/deploy.yaml
on:
  push:
    tags:
      - '*'

# set credentials to run deployment and save its state to s3
env:
  HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  act:
    name: Deploy a model upon stage assignment
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'assignment'
        run: |
          # TODO: check this works
          mlem deployment run \
            --load deploy/${{ steps.gto.outputs.stage }} \
            --model ${{ steps.gto.outputs.path }}

This relies on having deployment declarations in the deploy/ directory, such as:

# deploy/dev.yaml
object_type: deployment
type: heroku
app_name: mlem-dev

This declaration is read by MLEM in CI and the model promoted to dev is deployed to https://mlem-dev.herokuapp.com.

Note that you need to provide environment variables to deploy to Heroku and update the deployment state. The location for the state should be configured in the MLEM config file:

# .mlem.yaml
core:
  state:
    uri: s3://bucket/path

Check out another example of MLEM model deployment in the main branch of the example-gto repo.

Configuring GTO

To configure GTO, use a .gto file in the root of your repo:

# .gto config file
stages: [dev, stage, prod] # list of allowed Stages

When allowed Stages are specified, GTO checks the commands you run and errors out if you provide a value that doesn't exist in the config. Note that GTO applies the config from the workspace, so if you want to apply the config from the main branch, you need to check it out first with git checkout main.

Alternatively, you can use environment variables (note the GTO_ prefix):

$ GTO_EMOJIS=false gto show

Git tags format

You can work with GTO without knowing these conventions, since gto commands take care of everything for you.

Each event has a standard Git tag format:

{artifact_name}@{version_number}#{e} for version registration.
{artifact_name}@{version_number}!#{e} for version deregistration.
{artifact_name}#{stage}#{e} for stage assignment.
{artifact_name}#{stage}!#{e} for stage unassignment.
{artifact_name}@deprecated#{e} for artifact deprecation.

All of them share two parts:

{artifact_name} prefix part.
#{e} counter at the end that can be omitted (in "simple" Git tag format).

Generally, the #{e} counter is used because Git doesn't allow creating two Git tags with the same name. If you want two Git tags that assign the dev stage to the model artifact without the counter (model#dev), you have to delete the old Git tag first. Consequently, you can't preserve the history of events that happened.

Whether #{e} is included by default depends on the event: the defaults omit it where it is rarely necessary, e.g. for version registrations and artifact deprecations.
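To make the conventions above concrete, here is a minimal sketch of a parser for these tag formats. It is an illustration written for this guide, not GTO's own parsing code: the TagEvent type, the parse_tag function and the regular expressions are assumptions, and the sketch assumes that artifact names, versions and stages never contain the characters @, # or !.

```python
import re
from typing import NamedTuple, Optional


class TagEvent(NamedTuple):
    event: str               # registration, deregistration, assignment, ...
    artifact: str
    value: Optional[str]     # version or stage; None for deprecation
    counter: Optional[int]   # the #{e} counter, None when omitted


_NAME = r"(?P<name>[^@#!]+)"
_COUNTER = r"(?:#(?P<e>\d+))?"  # optional trailing counter

# Order matters: "@deprecated" must be tried before the generic version pattern.
_PATTERNS = [
    ("deprecation", re.compile(rf"^{_NAME}@deprecated{_COUNTER}$")),
    ("version", re.compile(rf"^{_NAME}@(?P<value>[^@#!]+)(?P<neg>!)?{_COUNTER}$")),
    ("stage", re.compile(rf"^{_NAME}#(?P<value>[^@#!]+)(?P<neg>!)?{_COUNTER}$")),
]


def parse_tag(tag: str) -> Optional[TagEvent]:
    """Parse a Git tag name into a TagEvent, or return None if it doesn't match."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(tag)
        if not match:
            continue
        groups = match.groupdict()
        counter = int(groups["e"]) if groups.get("e") else None
        if kind == "deprecation":
            return TagEvent("deprecation", groups["name"], None, counter)
        negated = bool(groups.get("neg"))  # "!" marks deregistration/unassignment
        if kind == "version":
            event = "deregistration" if negated else "registration"
        else:
            event = "unassignment" if negated else "assignment"
        return TagEvent(event, groups["name"], groups["value"], counter)
    return None
```

For instance, parse_tag("model#dev#1") yields an assignment of the dev stage to model with counter 1, while a tag with neither @ nor # yields None.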