Saves given object to a given path.
def save(
    obj: Any,
    path: Union[str, os.PathLike],
    project: Optional[str] = None,
    sample_data=None,
    fs: Optional[AbstractFileSystem] = None,
    params: Dict[str, str] = None,
    preprocess: Union[Any, Dict[str, Any]] = None,
    postprocess: Union[Any, Dict[str, Any]] = None,
) -> MlemObject
from mlem.api import save
save(obj, path)
Saves a given object to a given path. The path can belong to different file
systems (e.g. S3). The function saves the object as a MLEM Object and returns
it.
We often need to apply some processing before and after the model is applied;
the preprocess and postprocess arguments serve this purpose. You can think of
them as running postprocess(model(preprocess(x))). See the examples below.
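The composition described above can be illustrated without MLEM at all. The following is a minimal sketch in plain Python, not MLEM's actual implementation; `compose`, `to_length`, `toy_model`, and `to_label` are hypothetical names introduced only for this illustration.

```python
from typing import Any, Callable, Optional


def compose(
    model: Callable[[Any], Any],
    preprocess: Optional[Callable[[Any], Any]] = None,
    postprocess: Optional[Callable[[Any], Any]] = None,
) -> Callable[[Any], Any]:
    """Return a callable equivalent to postprocess(model(preprocess(x)))."""

    def wrapped(x: Any) -> Any:
        # Apply the optional hook before the model sees the input.
        if preprocess is not None:
            x = preprocess(x)
        y = model(x)
        # Apply the optional hook to the model's raw output.
        if postprocess is not None:
            y = postprocess(y)
        return y

    return wrapped


# Hypothetical stand-ins for a real model and its hooks.
def to_length(word: str) -> int:
    return len(word)


def toy_model(length: int) -> bool:
    return length > 5


def to_label(flag: bool) -> str:
    return "long" if flag else "short"


pipeline = compose(toy_model, preprocess=to_length, postprocess=to_label)
print(pipeline("Gagarin"))  # long
```

Leaving either hook as `None` in this sketch simply skips that step, so `compose(toy_model)` behaves like the bare model.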
obj (required) - Object to dump
path (required) - If not located on LocalFileSystem, then should be a URI, or the fs argument should be provided
project (optional) - Path to the MLEM project
sample_data (optional) - If the object is a model or function, you can provide an input data sample, so MLEM will include its schema in the model's metadata
fs (optional) - FileSystem for the path argument
params (optional) - Arbitrary params for the object
preprocess (optional) - Applied before the model
postprocess (optional) - Applied after the model
None
MlemObjectNotFound - Thrown if we can't find MLEM object
import os
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from pandas import DataFrame
from mlem.api import save
train, target = load_iris(return_X_y=True)
train = DataFrame(train)
train.columns = train.columns.astype(str)
model = DecisionTreeClassifier().fit(train, target)
path = os.path.join(os.getcwd(), "saved-model")
save(model, path, sample_data=train)
preprocess and postprocess can be functions or MLEM models:
import mlem.api


def apply_emdedding(word):
    # apply embedding (placeholder: replace with the real embedding computation)
    embedding = ...
    return embedding


def return_classname(prediction):
    # 2-D output (e.g. class probabilities): compare the two class scores
    if len(prediction.shape) > 1:
        return "A surname" if prediction[0][0] < prediction[0][1] else "Not a surname"
    # 1-D output (e.g. class labels): use the first prediction directly
    return "A surname" if prediction[0] else "Not a surname"


mlem.api.save(
    classify_word,  # trained on a dataset created by applying `apply_emdedding`
    "surname_classifier",
    preprocess=apply_emdedding,
    postprocess=return_classname,
    sample_data="Gagarin",
)
If you need different pre- and post-processors for different model methods, you
can specify them with dictionaries (let's assume classify_word is a sklearn
model and has two methods: predict and predict_proba):
mlem.api.save(
    classify_word,  # trained on a dataset created by applying `apply_emdedding`
    "surname_classifier",
    preprocess={
        "predict": apply_emdedding,
        "predict_proba": apply_emdedding,
    },
    postprocess={
        "predict": lambda p: "A surname" if p[0] else "Not a surname",
        "predict_proba": lambda p: "A surname" if p[0][0] < p[0][1] else "Not a surname",
    },
    sample_data="Gagarin",
)
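To make the per-method dictionaries more concrete, here is a rough sketch of how such hooks could be looked up by method name and applied around a call. It is plain Python and does not reflect MLEM's internals; `resolve_hook`, `call_method`, and `ToyClassifier` are hypothetical names, and treating a single callable as "applies to every method" is an assumption made for this illustration.

```python
from typing import Any, Callable, Dict, Optional, Union

Hook = Callable[[Any], Any]


def resolve_hook(hook: Union[None, Hook, Dict[str, Hook]], method: str) -> Optional[Hook]:
    """Pick the hook for `method`: a dict is looked up by method name,
    while a single callable (assumption) is used for every method."""
    if isinstance(hook, dict):
        return hook.get(method)
    return hook


def call_method(model: Any, method: str, x: Any, preprocess=None, postprocess=None) -> Any:
    """Run postprocess(model.<method>(preprocess(x))) with per-method hooks."""
    pre = resolve_hook(preprocess, method)
    post = resolve_hook(postprocess, method)
    if pre is not None:
        x = pre(x)
    result = getattr(model, method)(x)
    return post(result) if post is not None else result


class ToyClassifier:
    """Hypothetical model exposing predict and predict_proba on word lengths."""

    def predict(self, lengths):
        return [n > 5 for n in lengths]

    def predict_proba(self, lengths):
        return [[0.2, 0.8] if n > 5 else [0.9, 0.1] for n in lengths]


# Both methods share the same toy preprocessing: turn a word into [length].
preprocess = {"predict": lambda w: [len(w)], "predict_proba": lambda w: [len(w)]}
postprocess = {
    "predict": lambda p: "A surname" if p[0] else "Not a surname",
    "predict_proba": lambda p: "A surname" if p[0][0] < p[0][1] else "Not a surname",
}

model = ToyClassifier()
print(call_method(model, "predict", "Gagarin", preprocess, postprocess))  # A surname
print(call_method(model, "predict_proba", "Ada", preprocess, postprocess))  # Not a surname
```

In this sketch, a method name missing from a dictionary means no hook runs for that method, which keeps the raw model output untouched.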