ML platforms from different cloud vendors, along with open-source alternatives, are proving day by day that Machine Learning is no longer rocket science. The advancement of container platforms such as Kubernetes has eased the path for ML platforms.
Amazon ML Platform
Amazon SageMaker lets you train a Machine Learning model by creating a notebook instance from the SageMaker console, along with a proper IAM role and S3 bucket access. One can use the built-in algorithms or buy and sell algorithms and models in the AWS Marketplace.
SageMaker lets you deploy the model on Amazon's model hosting service with an HTTPS endpoint for model inference. The compute resources are automatically scaled based on the data load.
Amazon Model Monitor watches models for any data drift or anomalies by making use of Amazon CloudWatch Metrics.
Amazon provides modular services for managing ML projects. Using AWS Step Functions, one can automate retraining and the CI/CD process.
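For illustration, a Step Functions state machine automating retraining might look roughly like the heavily abbreviated Amazon States Language sketch below; the state names and Lambda ARN are placeholders, and a real `createTrainingJob` task would need further parameters such as the algorithm specification, IAM role, and output location:

```json
{
  "Comment": "Hypothetical retraining workflow (illustrative only, abbreviated)",
  "StartAt": "Retrain",
  "States": {
    "Retrain": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
      "Parameters": { "TrainingJobName.$": "$.jobName" },
      "Next": "Deploy"
    },
    "Deploy": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:deploy-model",
      "End": true
    }
  }
}
```

The `.sync` suffix makes the state machine wait for the training job to finish before moving to the deployment step.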
Google ML Platform
Google's TensorFlow provides libraries to build Machine Learning models and run them on Google AI Platform. Data Scientists can integrate with various Google Cloud services such as Datalab, BigQuery, Dataflow, and Dataprep, as well as Dataproc for running Apache Spark and Apache Hadoop.
Google AI Platform can serve an ML model either through online HTTP prediction or through the batch prediction API. Google Cloud Logging and the Cloud Monitoring service, along with the ML APIs, are used to monitor model performance in production.
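As a sketch of how the online prediction route might be called, the snippet below builds (but does not send) an HTTP request against AI Platform's `predict` REST endpoint using only the standard library; the project name, model name, and token are hypothetical placeholders:

```python
import json
import urllib.request

def build_predict_request(project, model, instances, token):
    # AI Platform online prediction endpoint; project/model are placeholders.
    url = (f"https://ml.googleapis.com/v1/projects/{project}"
           f"/models/{model}:predict")
    # The request body wraps input rows in an "instances" list.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})

req = build_predict_request("demo-project", "demo-model",
                            [[1.0, 2.0], [3.0, 4.0]], "fake-token")
print(req.full_url)
```

In a real call, the token would come from Google Cloud credentials and the request would be sent with `urllib.request.urlopen` or an HTTP client library.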
The Google AI community has embraced the end-to-end platform with TensorFlow Extended (TFX) and Kubeflow Pipelines, automating the ML workflow and the CI/CD process.
Azure ML Platform
Azure ML Designer workspace allows Data Scientists to build their workflows via drag-and-drop.
Azure Machine Learning platform lets Data Scientists create separate pipelines for the different phases of the ML lifecycle, such as the data pipeline, deployment pipeline, and inference pipeline.
Model inference endpoints are created using the real-time endpoint service.
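Azure ML's real-time endpoints are typically backed by an entry script exposing `init()` and `run()` functions: `init()` loads the model once at startup and `run()` handles each scoring request. The sketch below follows that convention but stubs the model out with a plain function so it stays self-contained; a real script would deserialize a registered model instead:

```python
import json

model = None  # populated once by init()

def init():
    global model
    # In a real deployment this would load a registered model artifact;
    # here a placeholder function stands in for the trained model.
    model = lambda features: sum(features)

def run(raw_data):
    # Each scoring request arrives as a JSON payload of input rows.
    data = json.loads(raw_data)["data"]
    predictions = [model(row) for row in data]
    return json.dumps({"predictions": predictions})

init()
print(run(json.dumps({"data": [[1, 2, 3], [4, 5]]})))
```

The service wraps this script in an HTTPS endpoint, so clients only ever see the JSON request/response contract.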
MLflow – Databricks ML Platform
MLflow provides experiment tracking, model deployment, and model management services covering the build, deploy, and monitor phases of Machine Learning projects.
Databricks has recently introduced Managed MLflow to manage Machine Learning projects end-to-end.
ML Platforms – Compare & Contrast
The improvements in Machine Learning have given rise to AutoML, revolutionizing Artificial Intelligence with the ability for anyone with domain knowledge to become a Citizen Data Scientist. A variety of ML platforms started with the build phase and have eventually latched onto the deployment and monitoring of ML projects.
Building ML models
Microsoft Azure ML Studio takes the drag-and-drop approach to building ML models, hiding the complexities of data engineering and Python code. This makes it easier for Citizen Data Scientists with limited programming knowledge to build ML models.
Amazon SageMaker lets users build ML models using notebooks and Python code. SageMaker offers more flexibility to customize with Python libraries and its built-in algorithms, while Azure ML Studio offers the simplicity needed for anyone to become a Citizen Data Scientist.
Google Machine Learning offers both the flexibility to build custom ML models and the speed of innovation of its AutoML and high-performance ML APIs.
MLflow's experiment tracking provides the ability to experiment with different algorithms, hyper-parameter tuning, and feature selection. MLflow supports multiple programming languages, including R, Python, Java, and Scala, and unlike the cloud vendors' ML platforms, it can be deployed to any cloud.
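To illustrate what experiment tracking captures, the stand-alone sketch below (an illustration of the concept, not MLflow's actual API) records the hyper-parameters and resulting metric of each run in a plain list, then picks the best configuration; the `train` function is a placeholder for a real training job:

```python
import itertools

runs = []  # each entry records one experiment run: its params and metric

def train(learning_rate, depth):
    # Placeholder "model quality" score; a real run would fit a model
    # and evaluate it on held-out data.
    return 1.0 - abs(learning_rate - 0.1) - 0.01 * depth

# Sweep a small hyper-parameter grid, logging every run.
for lr, depth in itertools.product([0.01, 0.1, 0.5], [3, 5]):
    runs.append({"params": {"learning_rate": lr, "depth": depth},
                 "metric": train(lr, depth)})

# Compare runs and select the best configuration.
best = max(runs, key=lambda r: r["metric"])
print(best["params"])
```

A tracking service does essentially this bookkeeping at scale, adding artifacts, timestamps, and a UI for comparing runs.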
Every vendor has strived to automate the building of Machine Learning models, but differences in implementation make it hard to migrate between ML platforms.
There should be freedom of choice with interoperability among different ML platforms.
AutoML refers to Automated Machine Learning, wherein raw datasets are converted into deployable models that predict and solve real-world problems.
Every cloud vendor and the open-source community have embraced AutoML. Google has adopted Anthos, ultimately providing the flexibility to deploy to any cloud.
AutoML is still evolving and needs more focus on accuracy and explainability, letting enterprises break the barrier and venture into Machine Learning for their business growth.
Azure ML, with its drag-and-drop approach, provides simplicity, while Google, SageMaker, and MLflow provide the flexibility to customize Machine Learning models. The lack of interoperability to mix and match any cloud vendor's Machine Learning services burdens enterprises adopting a specific ML platform. Every vendor offers Basic and Enterprise tiers, and the cost and services offered vary with the selection. As enterprises advance in Machine Learning, the growing number of datasets demands more processing and storage capacity, driving up the price.
The lack of relevant documentation and training, along with the increasing cost to manage Machine Learning projects, poses a threat to enterprises, leading to fewer ML models in production.
Variation in ML Deployments
In order to keep up with ever-changing data, there is a dire need for continuous retraining, eliminating the time-consuming manual retraining of Machine Learning models. Similarly, there is a need for continuous integration and deployment pipelines.
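A continuous retraining loop ultimately reduces to a policy check that replaces the manual decision. The sketch below is an illustrative example with hypothetical thresholds, triggering retraining when live accuracy degrades or the model grows stale:

```python
def should_retrain(live_accuracy, days_since_training,
                   min_accuracy=0.90, max_age_days=30):
    # Retrain when the model underperforms on live traffic, or when it
    # has not seen fresh data for too long (thresholds are hypothetical).
    return live_accuracy < min_accuracy or days_since_training > max_age_days

print(should_retrain(0.93, 10))  # healthy and fresh model: no retrain
print(should_retrain(0.85, 10))  # accuracy degraded: retrain
print(should_retrain(0.95, 45))  # model too old: retrain
```

In practice such a check would run on a schedule and kick off the training and deployment pipelines automatically.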
Azure has both workflow pipelines automating the build process and CI/CD pipelines automating deployment. Databricks MLflow manages model deployment via its Python package, while Google's Kubeflow runs a set of open-source ML libraries on Kubernetes clusters. While there is an option to create Kubeflow pipelines on Azure, Azure Pipelines can deploy to any cloud. AWS offers services such as AWS CodeCommit, AWS CodeDeploy, and AWS CodePipeline to automate the deployment of ML models to production.
The deployment strategies adopted by different cloud vendors vary with respect to performance and ease of use. With the continuous evolution of ML deployments, frequent changes make them harder to learn and adopt.
Variation in Model Monitoring
The iterative nature of Machine Learning projects for building effective ML models, coupled with the necessity to maintain model performance and accuracy on ever-changing datasets, underscores the need for continuous monitoring.
MLflow Tracking lets users log parameters and metrics for monitoring and visualising the model's performance.
Azure Monitor not only monitors ML data and resources running on Azure but also resources consumed by ML models running in other clouds. Azure Monitor alerts on deployment failures, resource utilization, unusable nodes, and so on.
SageMaker Model Monitor enables monitoring of the model's endpoint and automates monitoring with its monitoring schedules. The SageMaker SDK monitors a fraction of the data sent to the model's endpoint and alerts on any data drift affecting the model's performance.
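As a minimal illustration of drift detection (not SageMaker Model Monitor's actual method), the sketch below compares the mean of a live feature sample against the training baseline, measured in baseline standard deviations; a score well above 1 suggests the live data has drifted:

```python
import statistics

def drift_score(baseline, live):
    # How far the live sample's mean has moved from the training
    # baseline, in units of the baseline's standard deviation.
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(live) - mu) / sigma

baseline = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values seen at training
live_ok = [10.2, 9.8, 10.4]                # live sample, similar distribution
live_drifted = [14.0, 15.0, 13.5]          # live sample, clearly shifted

print(round(drift_score(baseline, live_ok), 2))
print(round(drift_score(baseline, live_drifted), 2))
```

Production monitors compute richer per-feature statistics and raise alerts when such scores cross configured thresholds.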
Google's AI Platform, with its APIs along with the Cloud Logging and Cloud Monitoring services, examines running ML models for any performance degradation.
Monitoring Machine Learning models should not be an afterthought but seamlessly integrated with the Machine Learning lifecycle. Visibility into a model's runtime behaviour not only lets Data Scientists and the business monitor the model's effectiveness, but also provides an opportunity to learn of any shortcomings and embrace social inclusiveness.
[Figure: platform services referenced above]
Amazon: SageMaker Notebook and S3; Amazon model hosting service; Amazon Model Monitor with Amazon CloudWatch Metrics.
Google: Google AI Platform; Datalab, BigQuery, Dataflow; Dataproc for Apache Spark and Hadoop; TensorFlow Extended (TFX); ML Monitor API; Google Cloud Logging; Cloud Monitoring service.
Azure: Azure ML Studio; real-time endpoint service.
Part 1 – End to End ML Platform! Are we there yet?
Part 2 – End to End ML Platform! Are we there yet?
Part 3 – End to End ML Platform! Are we there yet?
Stay Tuned for Part 5 – End to End ML Platform! Are we there yet?