MLEM has a number of abstract base classes that anyone can implement to add new capabilities to MLEM.
Each abstract base class in this list is a subclass of the `mlem.core.base.MlemABC` class, which in turn is a subclass of pydantic `BaseModel` with additional polymorphic magic. That means all subclasses are also `BaseModel`s and should be serializable. This way MLEM can save and load them as part of other objects, or dynamically provide options to configure them in the CLI.

Transient fields hold operational objects and are not saved when an object is dumped. After you open an object that has transient fields, those fields stay empty until you load the object.
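To add your own capability, you typically subclass one of the abstractions below, give it a `type` identifier and declare its options as pydantic fields. Here is a minimal sketch; the `gzip` storage and its fields are purely hypothetical, and the abstract read/write hooks a real storage must implement are omitted:

```python
from typing import ClassVar

from mlem.core.artifacts import Storage


class GzipStorage(Storage):
    """Hypothetical storage that would gzip artifacts before writing them."""

    # identifier used to pick this implementation from configs and the CLI
    type: ClassVar[str] = "gzip"

    # ordinary pydantic fields: serialized as part of the metadata when dumped
    target: str = "."
    compression_level: int = 5

    # NOTE: a real implementation must also define the abstract methods of the
    # Storage base class (how to open/upload files), omitted in this sketch.
```

Because the class is a plain pydantic model, MLEM can serialize these fields into metadata and turn them into configuration options.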
Here is the list of all MLEM ABCs.
Represents a MLEM Object.
Base class: `mlem.core.objects.MlemObject`
For more info and a list of subtypes, see here.
Represents different types of requirements for a MLEM Object.
Base class: `mlem.core.requirements.Requirement`
Implementations:
- `installable` - a Python requirement, typically installed through `pip`. Can have a specific version and an alternative package name.
- `custom` - a Python requirement in the form of a local `.py` file or a Python package. Contains the name and source code for the module/package.
- `unix` - a Unix package, typically installed through `apt` or `yum`
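Because these classes are pydantic models, you can construct and serialize them directly. A small sketch, assuming `InstallableRequirement` with `module` and `version` fields (names inferred from the description above, so double-check them in `mlem.core.requirements`):

```python
from mlem.core.requirements import InstallableRequirement

# an "installable" requirement: a pip-installable module, optionally pinned
req = InstallableRequirement(module="pandas", version="1.5.3")

# pydantic serialization: this is what ends up in saved metadata
print(req.dict())
```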
Represents some file format that MLEM can try to import.
Base class: `mlem.core.import_objects.ImportHook`
Implementations:
- `pickle` - simply unpickle the contents of the file and use the default MLEM object analyzer. Works with pickled files.
- `pandas` - try to read a file into a `pandas.DataFrame`. Works with files saved with Pandas in formats like csv, json, excel, parquet, feather, stata and html. Some formats require additional dependencies.
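Import hooks are what run when you import an existing file. Here is a sketch assuming the `mlem.api.import_object` helper and a `type_` hint in the `pandas[csv]` form; argument names may differ between versions:

```python
from mlem.api import import_object

# read data.csv with the pandas import hook and save it as a MLEM data object
meta = import_object("data.csv", target="data", type_="pandas[csv]")
```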
This class is essentially a wrapper around the model classes of different libraries. Yes, yet another standard. If you want to add support for your ML model in MLEM, this is what you implement!
Base class: `mlem.core.model.ModelType`
This class is polymorphic, which means it can have more fields depending on the implementation.
Fields:
- `io` - an instance of `ModelIO`, a way to save and load the model
- `methods` - a string-to-signature mapping which holds information about the available model methods
- `model` (transient) - will hold the actual model object, if it was loaded

There are implementations of this class for all supported libraries: `xgboost`, `catboost`, `lightgbm`, `torch`, `sklearn`.
One notable implementation is `callable`: it treats any Python callable object as a model with a single method, `__call__`. That means you can turn functions and class methods into MLEM Models as well!
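A minimal sketch of the `callable` case, assuming the standard `mlem.api.save` entry point; the path and sample data are arbitrary:

```python
import numpy as np
from mlem.api import save


def answer(data: np.ndarray) -> np.ndarray:
    """A plain function MLEM can treat as a model with a single __call__ method."""
    return data * 2


# sample_data lets MLEM infer input/output data types for the __call__ signature
save(answer, "answer-model", sample_data=np.array([1, 2, 3]))
```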
Represents a way that a model can be saved and loaded. A required field of the `ModelType` class. If an ML library has its own way to save and load models, it goes here.
Base class: `mlem.core.model.ModelIO`
There are implementations for all supported libraries: `torch_io`, `xgboost_io`, `lightgbm_io`, `catboost_io`.
Also, the universal `simple_pickle` is available, which simply pickles the model (used by `sklearn`, for example).
There is also a separate `pickle` implementation, which can detect other model types inside your object and use their IOs for them. This is very handy when you, for example, wrap your `torch` NN with a Python function: the function part will be pickled, and the NN will be saved using `torch_io`.
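To support a library with its own save/load routines, you would implement a new IO. The sketch below assumes `ModelIO` subclasses provide `dump(storage, path, model)` and `load(artifacts)` hooks and that an `Artifacts` mapping type lives in `mlem.core.artifacts`; verify both against the bundled IOs. All other names here are hypothetical:

```python
from typing import Any, ClassVar

from mlem.core.artifacts import Artifacts, Storage
from mlem.core.model import ModelIO


class MyLibIO(ModelIO):
    """Hypothetical IO for an imaginary `mylib` library."""

    type: ClassVar[str] = "mylib_io"
    filename: str = "model.bin"

    def dump(self, storage: Storage, path: str, model: Any) -> Artifacts:
        # persist `model` with mylib's own save routine through `storage`
        raise NotImplementedError("sketch only")

    def load(self, artifacts: Artifacts) -> Any:
        # re-create the model from the artifacts written by dump()
        raise NotImplementedError("sketch only")
```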
Holds metadata about data, like type, dimensions, column names, etc.
Base class: `mlem.core.data_type.DataType`
Fields:
- `data` (transient) - the underlying data object, if it was read

Implementations:

Python:
- `primitive` - any of the Python primitives
- `tuple` - a tuple of objects, each can have a different type
- `list` - a list of objects, but they should all be the same type
- `tuple_like_list` - a list of objects, each can have a different type
- `dict` - a dictionary, each key can have a different type

Pandas:
- `dataframe` - `pd.DataFrame`. Holds info about columns, their types and indexes
- `series` - `pd.Series`. Holds info about columns, their types and indexes

NumPy:
- `ndarray` - `np.ndarray`. Holds info about type and dimensions
- `number` - `np.number`. Holds info about type

ML libraries:
- `xgboost_dmatrix` - `xgboost.DMatrix`. Holds info about feature names and their types
- `lightgbm` - `lightgbm.Dataset`. Holds information about the inner data object (dataframe or ndarray)
- `torch` - `torch.Tensor`. Holds information about type and dimensions

Special:
- `unspecified` - a special data type used when no data info was provided
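To see which implementation matches a given object, you can ask MLEM to analyze it. This sketch assumes `DataAnalyzer.analyze` in `mlem.core.data_type` is the entry point that picks the matching `DataType`:

```python
import numpy as np
from mlem.core.data_type import DataAnalyzer

# for this array the resulting DataType should be the `ndarray` implementation,
# carrying dtype and shape information
data_type = DataAnalyzer.analyze(np.zeros((3, 4)))
print(data_type)
```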
Holds all the information needed to read a dataset.
Base class: `mlem.core.data_type.DataReader`
Fields:
- `data_type` - the resulting data type

Implementations:
- `pandas`
- `numpy`
Writes data to files, producing a list of `Artifact`s and a corresponding `DataReader`.
Base class: `mlem.core.data_type.DataWriter`
Implementations:
- `pandas`
- `numpy`
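In everyday use, writers and readers are driven by `mlem.api.save` and `mlem.api.load`: saving data picks a writer, which produces the artifacts plus the reader that gets recorded in the metadata. A small sketch (paths are arbitrary):

```python
import pandas as pd
from mlem.api import load, save

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# the pandas DataWriter stores the data and records a pandas DataReader
save(df, "my-df")

# loading uses the recorded reader to reconstruct the DataFrame
df_again = load("my-df")
```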
Represents a file saved in some storage.
Base class: `mlem.core.artifacts.Artifact`
Implementations:
- `local` - a local file
- `fsspec` - a file in a remote file system
- `dvc` - a file in the DVC cache
Defines where the artifacts will be written. Produces corresponding `Artifact` instances.
Base class: `mlem.core.artifacts.Storage`
Implementations:
- `local` - store files on the local file system
- `fsspec` - store files in some remote file system
- `dvc` - store files locally, but try to read them from the DVC cache if they are absent
Represents an interface for a service runtime. Provides a mapping from method names to their signatures, as well as executor functions for those methods.
Base class: `mlem.runtime.interface.Interface`
Implementations:
- `simple` - base class for interfaces created manually. Will expose subclass methods marked with the `@expose` decorator.
- `model` - dynamically create an interface from a `ModelType`
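A sketch of a manually created interface. It assumes `SimpleInterface` and the `expose` decorator are importable from `mlem.runtime.interface` and that plain type annotations are enough to describe the method signature; both are assumptions, so check the source for the exact contract:

```python
from typing import ClassVar

from mlem.runtime.interface import SimpleInterface, expose


class CalculatorInterface(SimpleInterface):
    """Hypothetical interface exposing a single method."""

    type: ClassVar[str] = "calculator"

    @expose
    def add(self, a: int, b: int) -> int:
        # exposed as an endpoint by whichever Server runs this interface
        return a + b
```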
Runs a configured interface, exposing its methods as endpoints.
Base class: `mlem.runtime.server.Server`
Implementations:
- `fastapi` - starts a FastAPI server
- `rmq` - creates a queue in a RabbitMQ instance and a consumer for each interface method
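For example, serving a saved model with the FastAPI server. The sketch assumes the `mlem.api.serve` helper and that `port` is a FastAPI server option (the CLI equivalent is `mlem serve`):

```python
from mlem.api import serve

# expose the saved model's methods as HTTP endpoints on port 8080
serve("answer-model", "fastapi", port=8080)
```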
Clients for the corresponding servers.
Base class: `mlem.runtime.client.Client`
Implementations:
- `http` - makes requests to HTTP servers like `fastapi`
- `rmq` - a client for the `rmq` server
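And the matching client side. The sketch assumes an `HTTPClient` with `host`/`port` options in `mlem.runtime.client` and that interface methods (here `predict`) become callable on the client; both details may differ between versions:

```python
from mlem.runtime.client import HTTPClient

client = HTTPClient(host="localhost", port=8080)

# call one of the interface methods exposed by the server
preds = client.predict([[1, 2, 3]])
```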
Declaration for creating a build from a model. You can learn more about building in the User Guide.
Base class: `mlem.core.objects.MlemBuilder`
Implementations:

Python packages:
- `pip` - create a directory with a Python package built from the model
- `whl` - create a `.whl` file with a Python package

Docker:
- `docker_dir` - create a directory with the context for building a Docker image
- `docker` - build a Docker image from the model
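For instance, packaging a model as a pip-style Python package. The sketch assumes the `mlem.api.build` helper and that `target` and `package_name` are options of the `pip` builder (both assumptions, mirroring the `mlem build` CLI):

```python
from mlem.api import build

# produce a directory with a Python package wrapping the saved model
build("pip", "answer-model", target="build/", package_name="answer_model")
```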
Declaration of a target environment for deploying models.
Base class: `mlem.core.objects.MlemEnv`
Implementations:
- `heroku` - an account on the Heroku platform
Declaration and state of a deployed model.
Base class: `mlem.core.objects.MlemDeployment`
Fields:
- `env_link` - link to the target environment
- `env` (transient) - the loaded target environment
- `model_link` - link to the deployed model object
- `model` (transient) - the loaded model object
- `state` - deployment state

Implementations:
- `heroku` - an app deployed to the Heroku platform
Represents the state of a deployment.
Base class: `mlem.core.objects.DeployState`
Implementations:
- `heroku` - the state of a deployed Heroku app