After initializing MLEM we have an empty project (except for the config file), but soon we'll save something with MLEM to fill it up.
To save a model with MLEM, you just need to call mlem.api.save() instead of whatever method you used to save your model before. Let's take a look at the following Python script:
# train.py
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from mlem.api import save


def main():
    data, y = load_iris(return_X_y=True, as_frame=True)
    rf = RandomForestClassifier(
        n_jobs=2,
        random_state=42,
    )
    rf.fit(data, y)

    save(
        rf,
        "rf",
        sample_data=data,
        description="Random Forest Classifier",
    )


if __name__ == "__main__":
    main()
Here we load the well-known iris dataset with sklearn and train a simple classifier. But instead of pickling the model, we save it with MLEM.
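For contrast, the pickling approach mentioned above usually looks like the sketch below. It uses only the standard library, and a plain dictionary stands in for the trained model (an assumption made to keep the example dependency-free); the file name `rf.pkl` and the helper `pickle_roundtrip` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model (assumption); a real script would pickle `rf`.
model = {"kind": "RandomForestClassifier", "n_jobs": 2, "random_state": 42}


def pickle_roundtrip(obj, path: Path):
    """Write obj to path with pickle, then read it back."""
    with path.open("wb") as f:
        pickle.dump(obj, f)
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "rf.pkl"  # hypothetical file name
    restored = pickle_roundtrip(model, target)
    files_written = sorted(p.name for p in Path(tmp).iterdir())

# Only the binary is written: nothing records the model's methods,
# its input schema, or the packages needed to load it.
print(files_written)
```

Pickle stores only the object itself, so any information about how to call the model or which packages it needs has to be tracked separately.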
Now let's run this script and see how we save the model.
$ python train.py
...
$ tree .mlem/model/
.mlem/model
├── rf
└── rf.mlem
By default, MLEM saves your files to the .mlem/ directory, but that can be changed. See Project Structure for more details.
The model was saved along with its metadata: rf contains the model binary, and the rf.mlem metafile contains information about it. Let's take a look at the metafile:
$ cat .mlem/model/rf.mlem
artifacts:
  data:
    hash: 59440b4398b8d45d8ad64d8d407cfdf9
    size: 993
    uri: logreg
model_type:
  methods:
    predict:
      args:
      - name: data
        type_:
          columns:
          - ''
          - sepal length (cm)
          - sepal width (cm)
          - petal length (cm)
          - petal width (cm)
          dtypes:
          - int64
          - float64
          - float64
          - float64
          - float64
          index_cols:
          - ''
          type: dataframe
      name: predict
      returns:
        dtype: int64
        shape:
        - null
        type: ndarray
    predict_proba:
      args:
      - name: data
        type_:
          columns:
          - ''
          - sepal length (cm)
          - sepal width (cm)
          - petal length (cm)
          - petal width (cm)
          dtypes:
          - int64
          - float64
          - float64
          - float64
          - float64
          index_cols:
          - ''
          type: dataframe
      name: predict_proba
      returns:
        dtype: float64
        shape:
        - null
        - 3
        type: ndarray
    sklearn_predict:
      args:
      - name: X
        type_:
          columns:
          - ''
          - sepal length (cm)
          - sepal width (cm)
          - petal length (cm)
          - petal width (cm)
          dtypes:
          - int64
          - float64
          - float64
          - float64
          - float64
          index_cols:
          - ''
          type: dataframe
      name: predict
      returns:
        dtype: int64
        shape:
        - null
        type: ndarray
    sklearn_predict_proba:
      args:
      - name: X
        type_:
          columns:
          - ''
          - sepal length (cm)
          - sepal width (cm)
          - petal length (cm)
          - petal width (cm)
          dtypes:
          - int64
          - float64
          - float64
          - float64
          - float64
          index_cols:
          - ''
          type: dataframe
      name: predict_proba
      returns:
        dtype: float64
        shape:
        - null
        - 3
        type: ndarray
  type: sklearn
object_type: model
requirements:
- module: sklearn
  version: 1.0.2
- module: pandas
  version: 1.4.1
- module: numpy
  version: 1.22.3
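To make the metafile's structure concrete, here is a minimal sketch that pulls the method names, input columns, and version pins out of it. It assumes the YAML has already been parsed into a dictionary (for example, with a YAML library); `meta` is a hand-written, trimmed copy of the output above, and `summarize` is a hypothetical helper, not part of MLEM.

```python
# Shared input schema, trimmed from the metafile shown above.
iris_frame = {
    "columns": ["", "sepal length (cm)", "sepal width (cm)",
                "petal length (cm)", "petal width (cm)"],
    "index_cols": [""],
    "type": "dataframe",
}

# Hand-written stand-in for the parsed metafile (assumption: the nesting
# matches the YAML structure of rf.mlem).
meta = {
    "model_type": {
        "methods": {
            "predict": {
                "args": [{"name": "data", "type_": iris_frame}],
                "name": "predict",
                "returns": {"dtype": "int64", "type": "ndarray"},
            },
            "predict_proba": {
                "args": [{"name": "data", "type_": iris_frame}],
                "name": "predict_proba",
                "returns": {"dtype": "float64", "type": "ndarray"},
            },
        },
        "type": "sklearn",
    },
    "object_type": "model",
    "requirements": [
        {"module": "sklearn", "version": "1.0.2"},
        {"module": "pandas", "version": "1.4.1"},
        {"module": "numpy", "version": "1.22.3"},
    ],
}


def summarize(meta: dict) -> dict:
    """Collect method names, non-index input columns, and module pins."""
    methods = meta["model_type"]["methods"]
    return {
        "methods": sorted(methods),
        "input_columns": {
            name: [c for c in spec["args"][0]["type_"]["columns"] if c]
            for name, spec in methods.items()
        },
        # Pins use the module names as recorded in the metafile.
        "pins": [f'{r["module"]}=={r["version"]}' for r in meta["requirements"]],
    }


summary = summarize(meta)
print(summary["methods"])
print(summary["pins"])
```

The pins use module names as they appear in the metafile. These are not always the names used by package installers: the sklearn module, for instance, is distributed as scikit-learn.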
It's a bit long, but it contains everything we need to use the model later: the model methods, predict and predict_proba, with the schema of their input and output data, and the requirements, sklearn, numpy, and pandas, with the particular versions we need to run this model.

Note that we didn't specify the requirements ourselves: MLEM investigates the object you're saving (even if it's a complex one) and finds all the requirements it needs!
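The idea of inspecting an object to discover what it depends on can be illustrated with a toy sketch. This is not MLEM's actual implementation, which is not described here; `find_modules` is a hypothetical helper that only looks at the module defining each value's type and walks containers and instance attributes.

```python
from datetime import date
from fractions import Fraction

# Modules that never count as external requirements in this sketch.
SKIP = {"builtins", "__main__"}


def find_modules(obj, _seen=None) -> set:
    """Collect top-level module names of the types reachable from obj."""
    seen = _seen if _seen is not None else set()
    if id(obj) in seen:  # guard against reference cycles
        return set()
    seen.add(id(obj))

    found = set()
    top = type(obj).__module__.split(".")[0]
    if top not in SKIP:
        found.add(top)

    # Decide which nested values to visit next.
    if isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    else:
        children = list(getattr(obj, "__dict__", {}).values())

    for child in children:
        found |= find_modules(child, seen)
    return found


# A nested object mixing built-in containers with standard-library types.
bundle = {"ratio": Fraction(1, 2), "dates": [date(2022, 1, 1)]}
print(sorted(find_modules(bundle)))  # → ['datetime', 'fractions']
```

A real tool would also need to resolve installed versions and handle objects whose dependencies are not visible through attributes, which this sketch does not attempt.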
Tag: 2-train
$ git add .mlem/model
$ git commit -m "Train the model"
$ git diff 2-train