Python Function Sample¶
Requirements¶
- Authenticated to gcloud (gcloud auth application-default login)
This notebook demonstrates how to develop a Python function based model. This type of model is useful when you want to define your own logic inside the model, as long as it satisfies the contract given in merlin.PyFuncModel. The model we are going to develop is an ensemble of an xgboost model and an sklearn model.
[ ]:
!pip install --upgrade -r requirements.txt > /dev/null
[ ]:
import merlin
import warnings
import os
import xgboost as xgb
from merlin.model import ModelType, PyFuncModel
from sklearn import svm
from sklearn.datasets import load_iris
from joblib import dump
warnings.filterwarnings('ignore')
1. Initialize¶
1.1 Set Server¶
[ ]:
merlin.set_url("localhost:3000/api/merlin")
1.2 Set Active Project¶
project represents a project in real life. You may have multiple models within a project.
merlin.set_project(<project_name>) will set the active project to the name matched by the argument. You can only set it to an existing project. If you would like to create a new project, please do so from the MLP console at http://localhost:3000/projects/create.
[ ]:
merlin.set_project("sample")
1.3 Set Active Model¶
model represents an abstract ML model. Conceptually, a model in MLP is similar to a class in a programming language. To instantiate a model you’ll have to create a model_version.
Each model has a type. Currently the model types supported by MLP are: sklearn, xgboost, tensorflow, pytorch, and user-defined models (i.e. pyfunc models).
model_version represents a snapshot of a particular model iteration. You’ll be able to attach information such as metrics and tags to a given model_version, as well as deploy it as a model service.
merlin.set_model(<model_name>, <model_type>) will set the active model to the given name; if a model with that name is not found, a new model will be created.
[ ]:
merlin.set_model("pyfunc-sample-2", ModelType.PYFUNC)
2. Train Model¶
In this step we are going to train two IRIS classifier models and combine their prediction results in a single model, which will be implemented as a PyFunc type model.
2.1 Train First Model¶
[ ]:
model_1_dir = "xgboost-model"
BST_FILE = "model_1.bst"

# Make sure the output directory exists before saving the model.
os.makedirs(model_1_dir, exist_ok=True)

iris = load_iris()
y = iris['target']
X = iris['data']
dtrain = xgb.DMatrix(X, label=y)
param = {'max_depth': 6,
         'eta': 0.1,
         'silent': 1,
         'nthread': 4,
         'num_class': 3,
         'objective': 'multi:softprob'}
xgb_model = xgb.train(params=param, dtrain=dtrain)
model_1_path = os.path.join(model_1_dir, BST_FILE)
xgb_model.save_model(model_1_path)
2.2 Train Second Model¶
[ ]:
model_2_dir = "sklearn-model"
MODEL_FILE = "model_2.joblib"

# Make sure the output directory exists before saving the model.
os.makedirs(model_2_dir, exist_ok=True)

model_2_path = os.path.join(model_2_dir, MODEL_FILE)
clf = svm.SVC(gamma='scale', probability=True)
clf.fit(X, y)
dump(clf, model_2_path)
2.3 Create PyFunc Model¶
To create a PyFunc model you’ll have to extend the merlin.PyFuncModel class and implement its initialize and infer methods.
initialize will be called once during model initialization. The argument to initialize is a dictionary containing key-value pairs of artifact names and their URLs. The artifact keys are the same as the ones passed to log_pyfunc_model.
The infer method is the prediction method that needs to be implemented. It accepts a dictionary-type argument which represents the incoming request body. infer should return a dictionary object which corresponds to the response body of the prediction result.
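As a minimal sketch of this contract (an illustrative stub, not part of the sample; the class name MinimalModel and the artifact key my_model are placeholders):
[ ]:
from merlin.model import PyFuncModel

class MinimalModel(PyFuncModel):
    def initialize(self, artifacts):
        # artifacts maps artifact names to local file paths, e.g. {"my_model": "/path/to/model"}
        self._artifacts = artifacts

    def infer(self, request, **kwargs):
        # request is the parsed JSON request body; return a JSON-serializable dict
        return {"predictions": request["instances"]}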
In the following example we create a PyFunc model called EnsembleModel. In its initialize method we expect two artifacts called xgb_model and sklearn_model; these two artifacts point to the serialized model file of each model. The infer method simply runs a prediction with both models and returns the average value.
[ ]:
import xgboost as xgb
import joblib
import numpy as np

class EnsembleModel(PyFuncModel):
    def initialize(self, artifacts):
        # Load both serialized models from the artifact paths given at upload time.
        self._model_1 = xgb.Booster(model_file=artifacts["xgb_model"])
        self._model_2 = joblib.load(artifacts["sklearn_model"])

    def infer(self, request, **kwargs):
        # Run both models on the incoming instances and average their class probabilities.
        model_input = request["instances"]
        inputs = np.array(model_input)
        dmatrix = xgb.DMatrix(inputs)
        result_1 = self._model_1.predict(dmatrix)
        result_2 = self._model_2.predict_proba(inputs)
        return {"predictions": ((result_1 + result_2) / 2).tolist()}
Let’s test it locally
[ ]:
m = EnsembleModel()
m.initialize({"xgb_model": model_1_path, "sklearn_model": model_2_path})
m.infer({"instances": [[1,2,3,4], [2,1,2,4]] })
3. Deploy Model¶
To deploy the model, we will have to create an iteration of the model (by creating a model_version), upload the serialized model to MLP, and then deploy it.
3.1 Create Model Version and Upload¶
merlin.new_model_version() is a convenient method to create a model version and start its development process. It is equivalent to the following code:
v = model.new_model_version()
v.start()
v.log_pyfunc_model(model_instance=EnsembleModel(),
                   conda_env="env.yaml",
                   artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path})
v.finish()
To upload a PyFunc model you have to provide the following arguments:
1. model_instance is the instance of the PyFunc model; the model has to extend merlin.PyFuncModel.
2. conda_env is the path to a conda environment yaml file. The environment yaml file must contain all dependencies required by the PyFunc model.
3. (Optional) artifacts are additional artifacts that you want to include in the model.
4. (Optional) code_path is a list of directories containing Python code that will be loaded during model initialization; this is required when model_instance depends on a local Python package (see the sketch below).
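For illustration, if EnsembleModel depended on helper code in a local package, the upload might look like the following sketch (the src/mypackage directory is hypothetical):
[ ]:
with merlin.new_model_version() as v:
    merlin.log_pyfunc_model(model_instance=EnsembleModel(),
                            conda_env="env.yaml",
                            artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path},
                            code_path=["src/mypackage"])  # hypothetical local package directory
In this sample no local package is needed, so the upload below passes only the artifacts.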
[ ]:
with merlin.new_model_version() as v:
    merlin.log_pyfunc_model(model_instance=EnsembleModel(),
                            conda_env="env.yaml",
                            artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path})
3.2 Deploy Model¶
We can also pass environment variables to the model during deployment by passing a dictionary of environment variables.
[ ]:
env_vars = {"WORKERS": "1"}
Each deployed model version will have its own generated URL.
[ ]:
endpoint = merlin.deploy(v, env_vars=env_vars)
3.3 Send Test Request¶
[ ]:
%%bash -s "$endpoint.url"
curl -v -X POST $1 -d '{
  "instances": [
    [2.8, 1.0, 6.8, 0.4],
    [3.1, 1.4, 4.5, 1.6]
  ]
}'
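Once you are done testing, the model version can optionally be undeployed to free up resources. A minimal sketch, assuming the merlin.undeploy helper provided by the SDK:
[ ]:
# Optional clean-up: undeploy the model version created above.
merlin.undeploy(v)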