To serve models in production in a scalable and failure-safe way, one needs something more than Heroku. Kubernetes is an open-source container orchestration engine for automating deployment, scaling, and management of containerized applications.
Below, we will deploy a model to a Kubernetes cluster, exposing its prediction endpoints through a service.
$ pip install mlem[kubernetes]
# or
$ pip install kubernetes docker
In order to deploy your model-server on a K8s cluster, you must first have a ready setup. Namely:
- a Kubernetes cluster you can create resources in (a local minikube cluster works fine),
- kubectl installed, with a kubeconfig pointing at that cluster, and
- docker installed and running, with access to a registry the cluster can pull images from.
As a quick sanity check, try running kubectl get pods to verify that your cluster is reachable and kubeconfig is configured correctly. See Kubernetes Basics to learn more about the concepts above.
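For example, a minimal sanity-check session against a local minikube cluster might look like this (commands only; the output will vary with your setup):

$ kubectl config current-context   # should print the cluster you intend to deploy to
$ kubectl get pods                 # should succeed, even if no pods exist yet
$ docker info                      # confirms the docker daemon is running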
Deploying to a Kubernetes cluster involves 2 main steps:
1. building a docker image of the model server, and
2. creating resources on the cluster: a namespace, a deployment and a service.
Once this is done, one can use the usual workflow of mlem deployment run to deploy on Kubernetes.
MLEM tries to find the kubeconfig file from the environment variable KUBECONFIG or the default location ~/.kube/config. If you need to use another path, it can be passed with --kube_config_file_path ...
You can use mlem deploy run kubernetes -h to list all the configurable parameters.
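For instance, a run that reads a non-default kubeconfig (the path below is purely illustrative) might look like:

$ mlem deployment run kubernetes service_name \
    --model model \
    --kube_config_file_path /path/to/other/kubeconfig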
Most of the configurable parameters in the list above come with sensible defaults, but at a minimum one needs to follow the structure given below:
$ mlem deployment run kubernetes service_name \
--model model \
--service_type loadbalancer
Saving deployment to service_name.mlem
Loading model from model.mlem
Creating docker image ml
Building MLEM wheel file...
Adding model files...
Generating dockerfile...
Adding sources...
Generating requirements file...
Building docker image ml:4ee45dc33804b58ee2c7f2f6be447cda...
Built docker image ml:4ee45dc33804b58ee2c7f2f6be447cda
namespace created. status='{'conditions': None, 'phase': 'Active'}'
deployment created. status='{'available_replicas': None,
'collision_count': None,
'conditions': None,
'observed_generation': None,
'ready_replicas': None,
'replicas': None,
'unavailable_replicas': None,
'updated_replicas': None}'
service created. status='{'conditions': None, 'load_balancer': {'ingress': None}}'
Deployment ml is up in mlem namespace
where:
- service_name is a name of one's own choice; the corresponding service_name.mlem and service_name.mlem.state files will be created.
- model denotes the path to the model saved via mlem.
- service_type is configurable and is passed as loadbalancer here; the default value is nodeport if not passed.
One can check the docker image built via docker image ls, which should give the following output:
REPOSITORY TAG IMAGE ID CREATED SIZE
ml 4ee45dc33804b58ee2c7f2f6be447cda 16cf3d92492f 3 minutes ago 778MB
...
Pods created can be checked via kubectl get pods -A, which should show a pod in the mlem namespace as below:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-xp68b 1/1 Running 7 (12m ago) 7d22h
...
kube-system storage-provisioner 1/1 Running 59 (11m ago) 54d
mlem ml-cddbcc89b-zkfhx 1/1 Running 0 5m58s
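Alternatively, one can scope kubectl to the mlem namespace, or (assuming a recent MLEM version that ships the status subcommand) query the state through MLEM itself:

$ kubectl get pods -n mlem
$ mlem deployment status service_name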
By default, all resources are created in the mlem namespace. This is of course configurable using --namespace prod, where prod is the desired namespace name.
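For example, a run that creates everything in a prod namespace instead might look like:

$ mlem deployment run kubernetes service_name \
    --model model \
    --namespace prod \
    --service_type loadbalancer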
One can use the mlem deployment apply command to ping the deployed endpoint and get the predictions back. An example could be:
$ mlem deployment apply service_name data --json
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
where data is the dataset saved via mlem.
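If one prefers to hit the server directly instead of going through mlem deployment apply, a port-forward works too. A minimal sketch, assuming the default service name ml and port 8080 as in the output above:

$ kubectl port-forward -n mlem svc/ml 8080:8080
# in another shell: the auto-generated FastAPI docs page shows the exact request schema
$ curl http://localhost:8080/docs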
A model can easily be undeployed using mlem deploy remove service_name, which will delete the pods, services and the namespace, i.e. clear the resources from the cluster. The docker image will still persist in the registry, though.
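If the leftover image is no longer needed, it can be removed manually, e.g.:

$ docker image rm ml:4ee45dc33804b58ee2c7f2f6be447cda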
If you want to change the model that is currently under deployment, run
$ mlem deploy run --load service_name --model other-model
This will build a new docker image corresponding to other-model and will terminate the existing pod and create a new one in its place, without downtime.
This can be seen below:
REPOSITORY TAG IMAGE ID CREATED SIZE
ml d57e4cacec82ebd72572d434ec148f1d 9bacd4cd9cc0 11 minutes ago 2.66GB
ml 4ee45dc33804b58ee2c7f2f6be447cda 26cb86b55bc4 About an hour ago 778MB
...
Notice how a new docker image with the tag d57e4cacec82ebd72572d434ec148f1d is built.
Loading model from other-model.mlem
Loading deployment from service_name.mlem
Creating docker image ml
Building MLEM wheel file...
Adding model files...
Generating dockerfile...
Adding sources...
Generating requirements file...
Building docker image ml:d57e4cacec82ebd72572d434ec148f1d...
Built docker image ml:d57e4cacec82ebd72572d434ec148f1d
Deployment ml is up in mlem namespace
Here, the existing deployment, i.e. service_name, is reused with a newer model, hence the registry details need not be passed again. The contents of service_name can be checked by inspecting the service_name.mlem file.
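For example:

$ cat service_name.mlem          # the deployment declaration
$ cat service_name.mlem.state    # the state tracked for it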
We can see the existing pod being terminated and the new one running in its place below:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-pr8cn 1/1 Running 0 90m
...
kube-system kube-proxy-dfxsv 1/1 Running 0 90m
mlem ml-66b9588df5-wmc2v 1/1 Running 0 99s
mlem ml-cddbcc89b-zkfhx 1/1 Terminating 0 60m
Deployment to a cloud-managed Kubernetes cluster such as EKS is simple and analogous to the steps above for a local cluster (such as minikube).
The popular docker registry choice to use with EKS is ECR (Elastic Container Registry). Make sure the EKS cluster has at least read access to ECR.
Make sure you have a repository in ECR where docker images can be uploaded; in this example, a classifier repository is used.
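If the repository does not exist yet, one way to create it is via the AWS CLI (assuming it is installed and configured with sufficient permissions):

$ aws ecr create-repository --repository-name classifier --region us-east-1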
Provided that the default kubeconfig file (present at ~/.kube/config) can communicate with EKS, execute the following command:
$ mlem deploy run kubernetes service_name \
--model model \
--registry ecr \
--registry.account 342840881361 \
--registry.region "us-east-1" \
--registry.host "342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier" \
--image_name classifier --service_type loadbalancer
Saving deployment to service_name.mlem
Loading model from model.mlem
Creating docker image classifier
Building MLEM wheel file...
Adding model files...
Generating dockerfile...
Adding sources...
Generating requirements file...
Building docker image 342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier:4ee45dc33804b58ee2c7f2f6be447cda...
Logged in to remote registry at host 342840881361.dkr.ecr.us-east-1.amazonaws.com
Built docker image 342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier:4ee45dc33804b58ee2c7f2f6be447cda
Pushing image 342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier:4ee45dc33804b58ee2c7f2f6be447cda to 342840881361.dkr.ecr.us-east-1.amazonaws.com
Pushed image 342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier:4ee45dc33804b58ee2c7f2f6be447cda to 342840881361.dkr.ecr.us-east-1.amazonaws.com
namespace created. status='{'conditions': None, 'phase': 'Active'}'
deployment created. status='{'available_replicas': None,
'collision_count': None,
'conditions': None,
'observed_generation': None,
'ready_replicas': None,
'replicas': None,
'unavailable_replicas': None,
'updated_replicas': None}'
service created. status='{'conditions': None, 'load_balancer': {'ingress': None}}'
Deployment classifier is up in mlem namespace
Note that classifier here has to match the image_name supplied through --image_name.
One can check the docker image built via docker image ls, which should give the following output:
REPOSITORY TAG IMAGE ID CREATED SIZE
342840881361.dkr.ecr.us-east-1.amazonaws.com/classifier 4ee45dc33804b58ee2c7f2f6be447cda 96afb03ad6f5 2 minutes ago 778MB
...
This can also be verified in ECR, either in the console or from the command line.
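For instance, via the AWS CLI (assuming appropriate permissions):

$ aws ecr list-images --repository-name classifier --region us-east-1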
Pods created can be checked via kubectl get pods -A, which should show a pod in the mlem namespace as below:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-pr8cn 1/1 Running 0 11m
...
kube-system kube-proxy-dfxsv 1/1 Running 0 11m
mlem classifier-687655f977-h7wsl 1/1 Running 0 83s
Services created can be checked via kubectl get svc -A, which should look like the following:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 20m
kube-system kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 20m
mlem classifier LoadBalancer 10.100.87.16 a069daf48f9f244338a4bf5c60c6b823-1734837081.us-east-1.elb.amazonaws.com 8080:32067/TCP 2m32s
One can visit the external IP of the classifier service created by mlem, i.e. a069daf48f9f244338a4bf5c60c6b823-1734837081.us-east-1.elb.amazonaws.com:8080, in the browser and see the usual FastAPI docs page.
But one can also use the mlem deployment apply command to ping the deployed endpoint and get the predictions back. An example could be:
$ mlem deployment apply service_name data --json
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
That is, mlem knows how to calculate the externally reachable endpoint given the service type.
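For instance, the same FastAPI docs page can be fetched directly from the endpoint shown in the service listing above:

$ curl http://a069daf48f9f244338a4bf5c60c6b823-1734837081.us-east-1.elb.amazonaws.com:8080/docs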
A note about NodePort Service
While the example discussed above deploys a LoadBalancer service type, one can also use NodePort (which is the default), via --service_type nodeport.
While mlem knows how to calculate the externally reachable IP address, make sure the EC2 machine running the pod allows external traffic to it. This can be configured in the inbound rules of the node's security group.
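As a sketch, with a NodePort service the reachable address can be assembled by hand (the angle-bracket placeholders are to be filled in from the actual output):

$ kubectl get nodes -o wide   # note the node's EXTERNAL-IP
$ kubectl get svc -n mlem     # PORT(S) shows something like 8080:32067/TCP; the second number is the node port
$ curl http://<node-external-ip>:<node-port>/docs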