Python Function Sample

Requirements

  • Authenticated to gcloud (gcloud auth application-default login)

This notebook demonstrates how to develop a Python function (PyFunc) model. This type of model is useful because users can define their own logic inside the model, as long as it satisfies the contract defined in merlin.PyFuncModel. The model we are going to develop is an ensemble of an xgboost model and an sklearn model.

[ ]:
!pip install --upgrade -r requirements.txt > /dev/null
[ ]:
import merlin
import warnings
import os
import xgboost as xgb
from merlin.model import ModelType, PyFuncModel
from sklearn import svm
from sklearn.datasets import load_iris
from joblib import dump
warnings.filterwarnings('ignore')

1. Initialize

1.1 Set Server

[ ]:
merlin.set_url("localhost:3000/api/merlin")

1.2 Set Active Project

A project represents a project in real life. You may have multiple models within a project.

merlin.set_project(<project_name>) sets the active project to the one whose name matches the argument. You can only set it to an existing project. To create a new project, use the MLP console at http://localhost:3000/projects/create.

[ ]:
merlin.set_project("sample")

1.3 Set Active Model

A model represents an abstract ML model. Conceptually, a model in MLP is similar to a class in a programming language. To instantiate a model, you’ll have to create a model_version.

Each model has a type. The model types currently supported by MLP are sklearn, xgboost, tensorflow, pytorch, and user-defined models (i.e. pyfunc models).

A model_version represents a snapshot of a particular model iteration. You’ll be able to attach information such as metrics and tags to a given model_version, as well as deploy it as a model service.

merlin.set_model(<model_name>, <model_type>) sets the active model to the name given by the parameter. If no model with the given name is found, a new model will be created.

[ ]:
merlin.set_model("pyfunc-sample-2", ModelType.PYFUNC)

2. Train Model

In this step we are going to train two iris classifier models and combine their prediction results in a single model, which will be implemented as a PyFunc model.

2.1 Train First Model

[ ]:
model_1_dir = "xgboost-model"
BST_FILE = "model_1.bst"
# Create the output directory if it does not exist yet; save_model fails otherwise.
os.makedirs(model_1_dir, exist_ok=True)

iris = load_iris()
y = iris['target']
X = iris['data']
dtrain = xgb.DMatrix(X, label=y)
param = {'max_depth': 6,
         'eta': 0.1,
         'silent': 1,
         'nthread': 4,
         'num_class': 3,
         # multi:softprob returns one probability per class for each row
         'objective': 'multi:softprob'
         }
xgb_model = xgb.train(params=param, dtrain=dtrain)
model_1_path = os.path.join(model_1_dir, BST_FILE)
xgb_model.save_model(model_1_path)

2.2 Train Second Model

[ ]:
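model_2_dir = "sklearn-model"
MODEL_FILE = "model_2.joblib"
model_2_path = os.path.join(model_2_dir, MODEL_FILE)
# Create the output directory if it does not exist yet; dump fails otherwise.
os.makedirs(model_2_dir, exist_ok=True)

# probability=True enables predict_proba, which the ensemble relies on
clf = svm.SVC(gamma='scale', probability=True)
clf.fit(X, y)
dump(clf, model_2_path)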

2.3 Create PyFunc Model

To create a PyFunc model you’ll have to extend the merlin.PyFuncModel class and implement its initialize and infer methods.

initialize is called once during model initialization. Its argument is a dictionary that maps each artifact name to its URL. The keys are the same as those passed to log_pyfunc_model.

infer is the prediction method that you need to implement. It accepts a dictionary argument that represents the incoming request body, and it should return a dictionary that corresponds to the response body of the prediction result.
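The lifecycle described above (initialize once with an artifact dictionary, then infer per request) can be sketched without Merlin installed. This is only an illustration under stated assumptions: EchoModel is a hypothetical stand-in that does not extend merlin.PyFuncModel, and the artifact name "config" and its "scale" field are invented for the example.

```python
import json
import os
import tempfile


class EchoModel:
    """Hypothetical stand-in that mirrors the initialize/infer contract."""

    def initialize(self, artifacts):
        # artifacts maps an artifact name to the location of its file
        with open(artifacts["config"]) as f:
            self._scale = json.load(f)["scale"]

    def infer(self, request, **kwargs):
        # request is the decoded request body; the return value is the response body
        scaled = [[x * self._scale for x in row] for row in request["instances"]]
        return {"predictions": scaled}


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny artifact file so that initialize has something to load
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as f:
        json.dump({"scale": 2}, f)

    model = EchoModel()
    model.initialize({"config": config_path})
    response = model.infer({"instances": [[1, 2], [3, 4]]})
```

A real PyFunc model follows the same shape, but loads serialized model files in initialize instead of a JSON file.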

In the following example we create a PyFunc model called EnsembleModel. Its initialize method expects 2 artifacts called xgb_model and sklearn_model, which point to the serialized model file of each model. The infer method simply runs prediction with both models and returns the average of the two results.

[ ]:
import xgboost as xgb
import joblib
import numpy as np

class EnsembleModel(PyFuncModel):
    def initialize(self, artifacts):
        # Load both serialized models from the artifact locations
        self._model_1 = xgb.Booster(model_file=artifacts["xgb_model"])
        self._model_2 = joblib.load(artifacts["sklearn_model"])

    def infer(self, request, **kwargs):
        model_input = request["instances"]
        inputs = np.array(model_input)
        # xgboost needs a DMatrix; sklearn accepts the array directly
        dmatrix = xgb.DMatrix(inputs)
        result_1 = self._model_1.predict(dmatrix)
        result_2 = self._model_2.predict_proba(inputs)
        # Average the per-class probabilities of the two models
        return {"predictions": ((result_1 + result_2) / 2).tolist()}
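The averaging step in infer can be illustrated without either library. The following is a minimal sketch, assuming both models return one row of class probabilities per instance; average_predictions is a hypothetical helper written for this illustration and is not part of Merlin.

```python
def average_predictions(result_1, result_2):
    """Element-wise mean of two equally shaped lists of probability rows."""
    if len(result_1) != len(result_2):
        raise ValueError("both models must score the same number of instances")
    averaged = []
    for row_1, row_2 in zip(result_1, result_2):
        if len(row_1) != len(row_2):
            raise ValueError("both models must emit the same number of classes")
        averaged.append([(a + b) / 2 for a, b in zip(row_1, row_2)])
    return averaged


# One instance, three classes, scored by two hypothetical models
example = average_predictions([[0.5, 0.25, 0.25]], [[0.25, 0.5, 0.25]])
```

With NumPy arrays, as in EnsembleModel, the same computation is the single expression (result_1 + result_2) / 2.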

Let’s test it locally

[ ]:
m = EnsembleModel()
m.initialize({"xgb_model": model_1_path, "sklearn_model": model_2_path})
m.infer({"instances": [[1,2,3,4], [2,1,2,4]] })
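Because both models emit per-class probabilities, each averaged row should contain one value per class and sum to approximately 1. A small check along these lines can catch shape mistakes early. This is a sketch only: rows_sum_to_one is a hypothetical helper, the tolerance is an assumption, and sample_response is made-up data rather than real model output.

```python
import math


def rows_sum_to_one(response, n_classes=3, tol=1e-6):
    """Return True if every prediction row has n_classes values summing to ~1."""
    rows = response["predictions"]
    return all(
        len(row) == n_classes and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in rows
    )


# Made-up response with the same structure that infer returns
sample_response = {"predictions": [[0.5, 0.25, 0.25], [0.125, 0.125, 0.75]]}
is_valid = rows_sum_to_one(sample_response)
```

In the notebook, the same helper could be applied to the dictionary returned by m.infer(...).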

3. Deploy Model

To deploy the model, we have to create an iteration of the model (by creating a model_version), upload the serialized model to MLP, and then deploy it.

3.1 Create Model Version and Upload

merlin.new_model_version() is a convenience method that creates a model version and starts its development process. It is equivalent to the following code:

v = model.new_model_version()
v.start()
v.log_pyfunc_model(model_instance=EnsembleModel(),
                conda_env="env.yaml",
                artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path})
v.finish()
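The relationship between the explicit start/finish calls and a with block can be sketched with a stand-in object. Everything here is hypothetical: StubVersion only records which methods were called, and new_version is not Merlin's implementation. In particular, calling finish in a finally clause is an assumption of this sketch, not documented Merlin behaviour.

```python
from contextlib import contextmanager


class StubVersion:
    """Hypothetical stand-in for a model version; it only records calls."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def log_pyfunc_model(self, **kwargs):
        self.calls.append("log_pyfunc_model")

    def finish(self):
        self.calls.append("finish")


@contextmanager
def new_version():
    v = StubVersion()
    v.start()          # runs when the with block is entered
    try:
        yield v
    finally:
        v.finish()     # runs when the with block is left


with new_version() as v:
    v.log_pyfunc_model(model_instance=None)

recorded_calls = v.calls
```

The recorded order matches the explicit sequence: start, then the logging call, then finish.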

To upload a PyFunc model you have to provide the following arguments:

  1. model_instance is the instance of the PyFunc model. The model has to extend merlin.PyFuncModel.
  2. conda_env is the path to a conda environment yaml file. The environment yaml file must contain all dependencies required by the PyFunc model.
  3. (Optional) artifacts contains the additional artifacts that you want to include in the model.
  4. (Optional) code_path is a list of directories containing Python code that will be loaded during model initialization. This is required when model_instance depends on a local Python package.

[ ]:
with merlin.new_model_version() as v:
    merlin.log_pyfunc_model(model_instance=EnsembleModel(),
                conda_env="env.yaml",
                artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path})

3.2 Deploy Model

We can also pass environment variables to the model during deployment by passing a dictionary of environment variables.

[ ]:
env_vars = {"WORKERS": "1"}

Each deployed model version has its own generated URL.

[ ]:
endpoint = merlin.deploy(v, env_vars=env_vars)

3.3 Send Test Request

[ ]:
%%bash -s "$endpoint.url"
curl -v -X POST $1 -d '{
  "instances": [
    [2.8,  1.0,  6.8,  0.4],
    [3.1,  1.4,  4.5,  1.6]
  ]
}'
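The same request can be prepared from Python with the standard library. This is a minimal sketch under stated assumptions: build_predict_request is a hypothetical helper, the URL below is a placeholder (in the notebook you would pass endpoint.url), and the Content-Type header is an addition that the curl command above does not set.

```python
import json
import urllib.request


def build_predict_request(url, instances):
    """Build a POST request whose JSON body has the same shape as the curl example."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption, not in the curl call
        method="POST",
    )


# Placeholder URL; replace it with the URL of the deployed endpoint
req = build_predict_request(
    "http://example.invalid/predict",
    [[2.8, 1.0, 6.8, 0.4], [3.1, 1.4, 4.5, 1.6]],
)

# To actually send it against a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.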

3.4 Delete Deployment

[ ]:
merlin.undeploy(v)