What is MLOps? Why do we need an end-to-end MLOps solution for successful machine learning projects?

The primary responsibility of Data Scientists involves extracting value out of data by building and operationalizing Machine Learning Models. As businesses are embracing data science to improve business strategy, Data Scientists are struggling to manage the growing number of Machine Learning models.

The Fintech, Healthcare, and Retail industries are earmarking a 25% increase in their Machine Learning budgets for 2020, raising the scale and complexity of Machine Learning models.

With this growing complexity, Data Scientists are finding it laborious to manage the rising number of Machine Learning models in production. Depending on the budget allotted for ML projects, enterprises either have separate teams for each task or rely on Data Scientists to handle data engineering as well as building and managing ML models at scale.

"Time Taken to deploy a single model is 31 to 90 days"

There is a dire need for a seamless end-to-end Machine Learning platform to experiment with ML models under proper version management and to deploy them at scale with reproducible deployment pipelines. Cloud vendors have taken notice and hopped on the bandwagon, building ML platforms for managing Machine Learning projects end-to-end.

AWS SageMaker

Amazon SageMaker lets users train Machine Learning models by creating a notebook instance from the SageMaker console with the proper IAM role and S3 bucket access. One can use a built-in algorithm, or buy and sell algorithms and models on AWS Marketplace. SageMaker lets you deploy the model on Amazon’s model hosting service with an HTTPS endpoint for model inference. The compute resources can be scaled automatically based on the load. Amazon SageMaker Model Monitor watches models for data drift or anomalies by making use of Amazon CloudWatch metrics. Amazon provides modular services for managing ML projects. Using AWS Step Functions, one can automate the retraining and CI / CD process.

AWS Step Functions links AWS services such as AWS Lambda, AWS Fargate and Amazon SageMaker to build workflows of your choice, whether for an application or for a process such as continuous deployment of ML models.

AWS CodePipeline, a CI / CD service, combined with AWS Step Functions handling workflow-driven actions, provides a powerful automation pipeline.

SageMaker – Sample Step Functions definition

{
  "StartAt": "Deploy",
  "States": {
    "Deploy": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next": "Second Step"
    },
    "Second Step": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "End": true
    }
  }
}
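
A hand-written definition like this is easy to break: if StartAt or a Next field names a state that does not exist, the workflow cannot run. The following is a minimal Python sketch of a pre-deployment check, not part of any AWS SDK. The function name find_problems and the state names are hypothetical, the Lambda ARNs are the same placeholders as in the sample, and only the handful of rules shown are checked.

```python
import json

# Hypothetical two-step definition in the same shape as the sample;
# the ARNs are placeholders, not real resources.
DEFINITION = """
{
  "StartAt": "Deploy",
  "States": {
    "Deploy": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next": "Second Step"
    },
    "Second Step": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "End": true
    }
  }
}
"""


def find_problems(definition):
    """Return a list of structural problems; an empty list means none were found.

    Only a few rules are checked (an assumption of this sketch):
    StartAt must name a defined state, every Next must name a defined
    state, and every Task state needs either Next or End.
    """
    problems = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt refers to unknown state: {start!r}")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is not None and nxt not in states:
            problems.append(f"{name!r} points to unknown state: {nxt!r}")
        if state.get("Type") == "Task" and nxt is None and not state.get("End"):
            problems.append(f"{name!r} has neither Next nor End")
    return problems


if __name__ == "__main__":
    for problem in find_problems(json.loads(DEFINITION)):
        print(problem)
```

Running a check of this kind in the CI / CD pipeline before the definition is deployed catches naming mistakes early, without needing access to an AWS account.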

Limitations of using SageMaker

Cost Intensive

Even though SageMaker provides the flexibility to customize Machine Learning models, the lack of interoperability to mix and match Machine Learning services from different cloud vendors burdens enterprises that adopt a specific ML platform. Every ML vendor provides Basic and Enterprise versions, and the cost and services offered vary based on the selection. As enterprises advance in Machine Learning, the growing number of datasets drives up the price of processing and storage capacity. The lack of relevant documentation and training, along with the increasing cost of managing Machine Learning projects, poses a threat to enterprises and leads to fewer ML models in production.

Vendor Lock-in

ML platforms from Google, Amazon and Microsoft can run only on their own cloud or on-premise. Porting from one ML platform to another is a tedious task, as it involves building the ML platform from the ground up along with a steep learning curve. Vendor lock-in is real: it restricts enterprises from adopting a multi-cloud strategy and deprives them of products and services from hybrid vendors.

Steep Learning Curve

Different cloud vendors use their own tools and technologies to build ML platforms. Moreover, the understanding of End-to-End varies: some platforms have streamlined the build and deployment of ML models, while others concentrate on data engineering and on automating the ML build process. The commonality among the cloud vendors’ ML platforms is that they are built on top of Kubernetes clusters. But the commonality ends there. Google’s Kubeflow Pipelines provides the flexibility to build ML pipelines either from a Jupyter Notebook or from an existing Kubernetes cluster using the Python SDK or CLI. Amazon’s SageMaker uses Step Functions and CodePipeline to automate the CI / CD process. Microsoft’s Azure provides two different pipelines, one for building ML workflows and another for building CI / CD pipelines. The absence of an integrated end-to-end ML platform and the variety of options from different cloud vendors not only demand a steep learning curve but also make porting from one platform to another next to impossible.

Lack of Documentation

Every cloud vendor provides a detailed overview and steps to follow to build ML pipelines. There are even articles from renowned data scientists on building end-to-end ML platforms. As Artificial Intelligence evolves, with constant improvements to the ML platforms, the documented steps to build ML pipelines are often error prone. Some ML platforms’ Kubernetes engines still use an older version of Kubernetes, so the latest Kubernetes documentation does not apply. To add to the confusion, there is no documentation covering the version mismatch. The heterogeneity in tools and technologies, along with the many cloud vendors and open source communities building ML platforms, has greatly increased the volume of documentation. This increase not only steepens the learning curve but also leads to confusion.

How does AIQ power Machine Learning projects?

Increase Visibility into Machine Learning Projects

Machine Learning challenges vary with perspective and approach. For instance, management would like better visibility into Machine Learning projects, with faster onboarding of data science teams and reduced cost.

Flexibility to choose any ML Libraries, Frameworks and AutoML

Data Scientists need automated deployment pipelines that integrate with models built using any ML library, framework or AutoML tool of their choice. Models should be deployed automatically with minimal effort, providing inference endpoints that applications can use to call the model.

Deploy effortlessly with automated CI / CD pipelines

Data Scientists experiment with models repeatedly, trying different algorithms and tuning hyperparameters to continuously improve the model’s accuracy. After the experimentation phase, the trained models are deployed to a staging environment for evaluation before being pushed to production. Ever-changing data, along with the iterative nature of machine learning projects, calls for automated CI / CD pipelines in which any new environment, such as staging or production, is reproduced automatically with minimal effort. There are readily available CI / CD pipelines from different cloud and machine learning vendors. But changing from one vendor to another demands revamping your entire CI / CD pipeline.

Multi-Cloud Support

Enterprises need the ability to store models in any cloud or in-house registry and to deploy models to any environment without having to re-engineer their pipelines. An integrated MLOps solution should be able to deploy to any cloud, on-prem or hybrid infrastructure of your choice, while determining the cost of managing the computing resources and monitoring the performance of your machine learning models. Kubernetes-based deployment with reproducible CI / CD pipelines makes it easier not only to onboard a new environment but also to onboard a new team, with the machine learning models and the infrastructure needed to train them and run inference.

Automatic Scaling & Complex Deployments

Deployment pipelines should be capable of provisioning different resource types (CPU, GPU or TPU) with auto-scaling capabilities. They should also support complex model deployment scenarios such as A/B deployments, shadow deployments and phased rollouts, which have now become a common pattern in data-driven applications, all while providing real-time insights and recommendations.
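
To make these deployment patterns concrete, here is a minimal Python sketch of the routing logic behind them. It is an illustration under stated assumptions, not how AIQ or any particular platform implements it: the function names, the "stable" and "candidate" labels and the 100-bucket scheme are all hypothetical. A request id is hashed into a stable bucket, so the same caller always sees the same model, and raising the percentage step by step gives a phased rollout. The shadow helper serves the stable model’s answer while recording what the candidate would have said.

```python
import hashlib


def bucket(request_id, buckets=100):
    """Map a request id to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def choose_model(request_id, candidate_percent):
    """A/B split or phased rollout: send a fixed share of ids to the candidate.

    Because the bucket depends only on the id, an id routed to the
    candidate at 10% is still routed to it at 50%.
    """
    return "candidate" if bucket(request_id) < candidate_percent else "stable"


def shadow_predict(features, stable_model, candidate_model, log):
    """Shadow deployment: return the stable answer, record the candidate's."""
    served = stable_model(features)
    try:
        log.append({"served": served, "shadow": candidate_model(features)})
    except Exception as exc:  # a failing shadow model must not affect the caller
        log.append({"served": served, "shadow_error": repr(exc)})
    return served


if __name__ == "__main__":
    # Phased rollout: widen the candidate's share of traffic in steps.
    ids = [f"user-{i}" for i in range(1000)]
    for percent in (5, 25, 50, 100):
        hits = sum(choose_model(i, percent) == "candidate" for i in ids)
        print(f"{percent}% target: {hits} of {len(ids)} requests on candidate")
```

Hashing rather than random sampling is a design choice: it keeps the split reproducible across replicas without shared state, which matters when the same pipeline is redeployed to a new environment.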

Beyond Monitoring

An end-to-end MLOps solution needs an automated monitoring service that inspects the model’s health score along with data drift, usage predictions and resource utilization.

The performance of any machine learning model is affected by any change in data, code or the model’s behavior. For instance, consider a machine learning model that approves credit applications. Previously the model required only the FICO score and income, but it was later enhanced to use the customer’s digital footprint, expanding the landscape of potential borrowers. This calls for a code change along with retraining the model on new training datasets with the additional features. The CI / CD pipelines should be able to automatically detect these changes, then retrain and deploy the model with minimal effort. Monitoring should not only capture data drift but also monitor and auto-scale computing resources for better cost management. Machine learning models trained without diversified datasets tend to be biased. Enterprises with biased models lose their reputation, which increases the customer churn rate.
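
One common way to put a number on data drift is the Population Stability Index (PSI), which compares how a feature is distributed in training data with how it is distributed in live traffic. The sketch below is an illustration of that idea in plain Python, not the method used by AIQ or by any cloud vendor. The function names, the synthetic FICO-like scores and the 0.25 threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample.

    Bin edges come from the baseline's range; live values outside that
    range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_detected(expected, actual, threshold=0.25):
    """Flag drift when PSI exceeds the threshold (0.25 is an assumed default)."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    # Synthetic FICO-like scores, made up for illustration only.
    training_scores = [600 + i for i in range(200)]
    live_scores = [700 + i for i in range(200)]
    if drift_detected(training_scores, live_scores):
        print("Drift detected: trigger the retraining pipeline")
```

In a pipeline, a check like this would run on a schedule against recent inference inputs, and a positive result would start the retrain-and-deploy steps rather than just raising an alert.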

AIQ – Manage ML with Confidence

An MLOps solution should go beyond deployment and monitoring, with the ability to observe and act on insights and with self-explanatory capabilities that justify why a model behaved in a certain manner.

To learn more about our AIQ tools:

AIQ Workbench

AIQ Deploy

AIQ Monitor

Follow us to learn more about how we increased productivity and reduced cost and effort with our end-to-end MLOps solution.