Google ML Platform Deep Dive

Setting up a development environment with the right dependencies is not a straightforward task, and it can lengthen the development cycle. Kubernetes provides a way to replicate and share the development environment by packaging all dependencies into a container image. Many vendors, including Google, have built machine learning pipelines to build, deploy, and monitor machine learning workflows on Docker containers.

[Image slider: Machine Learning pipelines from different vendors]


Kubeflow Pipeline from Google


Both data scientists and ML engineers can build Kubeflow Pipelines using either notebooks or Python code deployed on Kubernetes containers. Below are the different ways to build a Kubeflow pipeline:

1. Build a Kubeflow pipeline by creating a Kubernetes cluster via the gcloud CLI

2. Build a Kubeflow pipeline from the Google Cloud Platform console (UI)

3. Build a Kubeflow pipeline from an already created Kubernetes cluster and Python code

Once the Kubeflow pipeline is built, data scientists and ML engineers can run the ML workflow either directly from a Jupyter notebook or by uploading a compiled pipeline package from Google Cloud Storage, as shown in the sketch below. Let's dive deeper into the build and execute process of ML workflows using Kubeflow Pipelines.
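For the upload-and-run path, the snippet below shows how a compiled pipeline package could be submitted with the kfp SDK. This is a minimal sketch, assuming the kfp v1 Python client; the host URL, experiment name, and package name are placeholders.

import kfp

# Connect to the Kubeflow Pipelines API endpoint (placeholder host)
client = kfp.Client(host='http://<<ml-pipeline-host>>')

# Upload a compiled pipeline package so it appears in the Pipelines UI
client.upload_pipeline(pipeline_package_path='sample_pipeline.tar.gz',
                       pipeline_name='sample-pipeline')

# Or run it directly under an experiment
experiment = client.create_experiment(name='dev-experiment')
client.run_pipeline(experiment.id,
                    job_name='sample-run',
                    pipeline_package_path='sample_pipeline.tar.gz')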

Build Kubeflow Pipeline (CLI Approach)

Step 1: Create a Kubernetes cluster using the gcloud command

$ CLUSTERNAME=<<Cluster-Name>>

$ ZONE=<<Zone>>

$ gcloud config set compute/zone $ZONE

$ gcloud beta container clusters create $CLUSTERNAME --zone $ZONE --scopes cloud-platform --enable-cloud-logging --enable-cloud-monitoring --machine-type n1-standard-2 --num-nodes 4

Step 2: Set up an admin role binding for the Kubernetes cluster created above

$ kubectl create clusterrolebinding ml-pipeline-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)

Step 3: Install ML Pipeline

$ kubectl create -f bootstrapper.yaml

<output>

clusterrole.rbac.authorization.k8s.io/mlpipeline-deploy-admin-5 created

clusterrolebinding.rbac.authorization.k8s.io/mlpipeline-admin-role-binding-5 created

job.batch/deploy-ml-pipeline-clscv created

$ kubectl get job

NAME                       COMPLETIONS   DURATION   AGE
deploy-ml-pipeline-clscv   0/1           2m40s      2m40s


Build Kubeflow Pipeline (UI Approach)


Build Kubeflow Pipeline (DevOps Approach)


The DevOps way of building a Kubeflow pipeline allows ML engineers to create a machine learning pipeline using Python code and an already-built Kubernetes cluster.

Pipeline Definition

from kfp import dsl   # Kubeflow Pipelines DSL (kfp v1 SDK)

@dsl.pipeline(
    name='<<Pipeline Name goes here>>',
    description='<<Pipeline Description>>'
)

Import container operation

from kfp.dsl import ContainerOp

Build Pipeline by stitching the steps

def sample_pipeline(context: str = ''):
    step_1 = ContainerOp(
        name='step 1',                          # name of the operation
        image='docker.io/step1',                # container image for step 1
        arguments=[context],
        file_outputs={
            'context': '/output.txt'
        })
    step_2 = ContainerOp(
        name='step 2',                          # name of the operation
        image='docker.io/step2',                # container image for step 2
        arguments=[step_1.outputs['context']],  # pass step_1 output as argument
        file_outputs={
            'context': '/output.txt'
        })
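To make the steps above runnable, the decorator, import, and function are combined and compiled into a pipeline package. Below is a minimal sketch, assuming the kfp v1 SDK; the output file name is a placeholder.

from kfp import compiler

# Compile the decorated pipeline function into a package that can be
# uploaded to the Kubeflow Pipelines UI or submitted via kfp.Client
compiler.Compiler().compile(sample_pipeline, 'sample_pipeline.tar.gz')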


How to Launch a Jupyter Notebook on a Kubeflow Pipeline


Step 1: 

After the Kubeflow pipelines are built on the Kubernetes cluster, data scientists can build machine learning models in notebooks by starting a JupyterHub notebook server.

Step 2: 

Once the model is built, it can be deployed as a pipeline component.

Step 3: 

Papermill executes the ML code in the notebook by copying it from Google Cloud Storage to the Kubeflow pod; once executed, the result is stored back in Google Cloud Storage. A sketch of this call follows.
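As an illustration of Step 3, the call below shows how Papermill could execute a parameterized notebook. This is a minimal sketch, assuming Papermill with Cloud Storage support (gcsfs) is installed; the bucket paths and parameters are placeholders.

import papermill as pm

pm.execute_notebook(
    'gs://<<bucket>>/notebooks/train.ipynb',           # input notebook read from Cloud Storage
    'gs://<<bucket>>/notebooks/train-output.ipynb',    # executed notebook written back to Cloud Storage
    parameters={'epochs': 10}                          # values injected into the notebook's parameters cell
)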

Step 4:

Launch Jupyter Notebook as part of Kubeflow Pipeline

@dsl.pipeline(
  name='<<Pipeline Name>>',
  description='<<Pipeline Description>>'
)
notebookop = dsl.ContainerOp(
    name='<<model name>>',
    image='gcr.io/sample/samplenotebook:latest',
    arguments=[
        inputnotebook,    # path of the input notebook
        outputnotebook,   # path where the executed notebook is written
        params            # parameters passed to the notebook
    ]
)
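Since inputnotebook, outputnotebook, and params are placeholders in the snippet above, the sketch below shows one way they could be supplied as pipeline parameters. This is a minimal sketch assuming the kfp v1 SDK; the pipeline name, defaults, and image are illustrative.

from kfp import dsl

@dsl.pipeline(
    name='notebook-pipeline',
    description='Runs a parameterized notebook with Papermill'
)
def notebook_pipeline(inputnotebook: str = 'gs://<<bucket>>/train.ipynb',
                      outputnotebook: str = 'gs://<<bucket>>/train-output.ipynb',
                      params: str = '{"epochs": 10}'):
    notebookop = dsl.ContainerOp(
        name='notebook-run',
        image='gcr.io/sample/samplenotebook:latest',   # image assumed to invoke Papermill on its arguments
        arguments=[inputnotebook, outputnotebook, params]
    )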

The above notebook model can be visualized on the Kubeflow Pipelines dashboard, which also provides options to experiment with, execute, and store the artifacts of the ML workflow.

There are also other options on the market to automate Kubeflow pipelines, such as Kale (Kubeflow Automated pipeLines Engine).

Kubeflow pipelines automate the ML workflow, aiding Continuous Training (CT) whenever the data changes and retraining is needed, and automating the CI/CD process whenever a code change triggers redeployment of ML models. A minimal scheduling sketch follows.
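The sketch below illustrates one way to schedule such a recurring (continuous training) run with the kfp SDK; it assumes the kfp v1 Python client, and the host URL, cron schedule, and package name are placeholders. A CI/CD system could instead call run_pipeline() whenever code changes.

import kfp

client = kfp.Client(host='http://<<ml-pipeline-host>>')
experiment = client.create_experiment(name='continuous-training')

# Retrain on a nightly schedule; Kubeflow cron expressions use a six-field format (with seconds)
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name='nightly-retrain',
    cron_expression='0 0 2 * * *',
    pipeline_package_path='sample_pipeline.tar.gz'
)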

Part 1 – End to End ML Platform! Are we there yet?

https://predera.com/end-to-end-ml-platform-are-we-there-yet-part1/

Stay tuned for Part 3 – End to End ML Platform! Are we there yet?