Model Deployers

Clipper provides a collection of model deployer modules that simplify deploying a trained model to Clipper. For common use cases, they eliminate the need to figure out how to save models and build custom Docker containers capable of serving the saved models. With these modules, you can deploy models directly from Python to Clipper.

Currently, Clipper provides the following deployer modules:

  1. Arbitrary Python functions
  2. PySpark Models
  3. PyTorch Models
  4. TensorFlow Models
  5. MXNet Models
  6. PyTorch Models exported as an ONNX file with the Caffe2 serving backend (Experimental)
  7. Keras Models

These deployers only support functions that can be pickled using Cloudpickle and whose dependencies are pure Python libraries installable via pip. For reference, use the following flowchart to decide which deployer to use.

digraph foo {
   "Pure Python?" -> "Use python deployer & pkgs_to_install arg" [ label="Yes" ];
   "Pure Python?" -> "Does Clipper provide a deployer?" [ label="No" ];
   "Does Clipper provide a deployer?" -> "Use {PyTorch | TensorFlow | PySpark | ...} deployers" [ label="Yes" ];
   "Does Clipper provide a deployer?" -> "Build your own container" [ label="No" ];
}

Note

You can find additional examples of using model deployers in Clipper’s integration tests.

Pure Python functions

This module supports deploying pure Python function closures to Clipper. A function deployed with this module must take a list of inputs as its sole argument and return a list of strings of exactly the same length. The prediction function takes a list of inputs rather than a single input so that models can compute multiple predictions in parallel to improve performance. For example, many models that run on a GPU can significantly improve throughput by batching predictions to better utilize the GPU's many parallel cores.
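
For example, a minimal prediction function for the "doubles" input type might look like the following sketch (the prediction logic, which simply returns each input's mean as a string, is purely illustrative):

import numpy as np

def predict(inputs):
    # `inputs` is a list of inputs; return exactly one string per input,
    # so the output list has the same length as `inputs`.
    return [str(float(np.mean(x))) for x in inputs]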

In addition, the function must only use pure Python code. More specifically, all of the state captured by the function will be pickled using Cloudpickle, so any state the function captures must itself be picklable. Most Python libraries that use C extensions create objects that cannot be pickled. This includes many common machine-learning frameworks such as PySpark, TensorFlow, PyTorch, and Caffe. To deploy models from these frameworks, you will have to use the Clipper-provided containers or create your own Docker containers and call the frameworks' native serialization libraries.
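
If you are unsure whether your function's captured state is picklable, you can check with Cloudpickle directly before deploying (a quick sanity check, not part of the deployer API):

import cloudpickle

threshold = 5.0

def predict(inputs):
    # `threshold` is captured by closure and must itself be picklable.
    return [str(x > threshold) for x in inputs]

# If this call raises an exception, the function cannot be deployed
# with this module.
serialized = cloudpickle.dumps(predict)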

While this deployer will serialize your function, any Python libraries the function depends on must be installed in the model container so that the function can be loaded there. You can specify these libraries using the pkgs_to_install argument; every package it lists will be installed in the container with pip before the container starts, as shown in the sketch below.
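
For example, a function that depends on the requests library could be deployed as follows (a minimal sketch: the model name and prediction logic are illustrative, and clipper_conn is assumed to be a ClipperConnection connected to a running cluster):

def fetch_predict(inputs):
    # requests is installed in the container via pkgs_to_install,
    # so it can be imported when the function runs there.
    import requests
    return [str(len(requests.get(url).content)) for url in inputs]

deploy_python_closure(
    clipper_conn,
    name="fetch-model",
    version=1,
    input_type="strings",
    func=fetch_predict,
    pkgs_to_install=["requests"])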

If your function has dependencies that cannot be installed directly with pip, you will need to build your own container.

clipper_admin.deployers.python.deploy_python_closure(clipper_conn, name, version, input_type, func, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy an arbitrary Python function to Clipper.

The function should take a list of inputs of the type specified by input_type and return a Python list or numpy array of predictions as strings.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Example

Define a pre-processing function center() and train a model on the pre-processed input:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.python import deploy_python_closure
import numpy as np
import sklearn.linear_model

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()

def center(xs):
    means = np.mean(xs, axis=0)
    return xs - means

# xs and ys stand in for your training data; here we use a random toy dataset.
xs = np.random.random((1000, 32))
ys = np.random.randint(0, 2, 1000)

centered_xs = center(xs)
model = sklearn.linear_model.LogisticRegression()
model.fit(centered_xs, ys)

# Note that this function accesses the trained model via closure capture,
# rather than having the model passed in as an explicit argument.
def centered_predict(inputs):
    centered_inputs = center(inputs)
    # model.predict returns a list of predictions
    preds = model.predict(centered_inputs)
    return [str(p) for p in preds]

deploy_python_closure(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=centered_predict)

clipper_admin.deployers.python.create_endpoint(clipper_conn, name, input_type, func, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an application and deploys the provided predict function as a model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.
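
For example, the centered_predict function from the example above could be registered and deployed in a single call (a minimal sketch: the application name, default output, and SLO are illustrative):

from clipper_admin.deployers.python import create_endpoint

create_endpoint(
    clipper_conn,
    name="example-app",
    input_type="doubles",
    func=centered_predict,
    default_output="0",
    slo_micros=100000)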