Are ML Platforms Collaborative Enough?


Machine learning projects are highly iterative and involve full-fledged collaboration among data scientists, ML engineers and the management team, all working towards common goals.

What "end to end" means in a machine learning platform varies with perspective. For some, end to end in ML starts with data engineering, continues with data science and ends with operationalizing ML models. But it can also be viewed as managing ML projects end to end with clear visibility for all stakeholders, including the business. With currently available pipelines, data scientists can build ML workflows that automate the experimentation phase to find the right model. At the same time, deployment pipelines automate the CI/CD flows and operationalize the ML models. We are heading in the right direction, with pipelines automating different processes of ML projects. However, the lack of a unified experience, along with the need for manual collaboration among data scientists, ML engineers and the management team, shows that ML platforms are still evolving and need time to mature.


Does E2E ML Platform make you worry about vendor lock-in?


With End-to-End ML platforms still maturing, different vendors are building tools and services to ease the management of Machine Learning projects. 

Around April 2019, Google launched AI Platform, an end-to-end platform for building and managing ML projects. Data scientists can store training data in either Google BigQuery or Google Cloud Storage and use AutoML for model training. Kubeflow Pipelines help build ML workflows that can be shared and deployed on GCP or on-premises.

Amazon offers ML services such as SageMaker, Ground Truth, notebooks, built-in algorithms, Neo, and deployment and hosting, assisting data scientists with data labelling, pre-built algorithms and notebooks, along with one-click training and deployment.

Microsoft released Azure Machine Learning Studio around 2015 and Azure Machine Learning Services in 2018. While ML Studio enabled building models by drag-and-drop, ML Services offered a much richer experience with AutoML, GPU support, hyperparameter tuning and Kubernetes clusters that autoscale based on load.

While the ML platforms from Google, Amazon and Microsoft run only on their own clouds or on-premises, companies like Databricks offer MLflow, which can run on any cloud. However, MLflow's automatic experiment logging is limited, capturing only a set of parameters and metrics for TensorFlow and Keras. Remote execution is limited to Azure Databricks, Databricks on AWS and Kubernetes (experimental), so this feature is difficult to use if you are not on Databricks. Dependencies on programming languages and machine learning libraries are captured in configuration files to make experiments reproducible. However, infrastructure dependencies such as CPU/GPU/TPU and memory requirements are not included, which requires external documentation and handoff procedures so that engineers can ensure all dependencies are met.
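One way to close that gap is to record hardware requirements next to the run's parameters so the handoff to engineers is machine-readable. The sketch below assumes a hypothetical, team-defined manifest format (`RunManifest`, `InfraRequirements` and `meets_requirements` are illustrative names, not part of MLflow or any other library):

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InfraRequirements:
    # Hypothetical fields describing the hardware a run needs to be reproduced
    accelerator: str        # e.g. "cpu", "gpu" or "tpu"
    accelerator_count: int
    memory_gb: float


@dataclass
class RunManifest:
    # Hypothetical team-defined manifest; this is NOT an MLflow API
    run_name: str
    params: dict
    python_packages: list
    infra: InfraRequirements

    def to_json(self) -> str:
        # asdict() also converts the nested InfraRequirements dataclass
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RunManifest":
        data = json.loads(text)
        data["infra"] = InfraRequirements(**data["infra"])
        return cls(**data)


def meets_requirements(manifest: RunManifest, available: dict) -> bool:
    """Return True if a target environment satisfies the recorded infra needs."""
    infra = manifest.infra
    return (
        available.get("accelerator") == infra.accelerator
        and available.get("accelerator_count", 0) >= infra.accelerator_count
        and available.get("memory_gb", 0) >= infra.memory_gb
    )


if __name__ == "__main__":
    manifest = RunManifest(
        run_name="baseline-cnn",
        params={"learning_rate": 0.001, "epochs": 10},
        python_packages=["tensorflow", "numpy"],
        infra=InfraRequirements(accelerator="gpu", accelerator_count=2, memory_gb=32.0),
    )
    restored = RunManifest.from_json(manifest.to_json())
    print(restored == manifest)  # True
    # Only one GPU is available, so the check fails
    print(meets_requirements(restored, {"accelerator": "gpu", "accelerator_count": 1, "memory_gb": 64}))  # False
```

Because the manifest round-trips through plain JSON, it can be stored as a run artifact and checked automatically before a model is handed to engineering.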

Porting from one ML platform to another is tedious, as it involves rebuilding the ML platform from the ground up and a steep learning curve. Vendor lock-in is real: it keeps enterprises from adopting a multi-cloud strategy and deprives them of products and services from a mix of vendors.


Is it economical enough to use the E2E ML Platform?


Cloud computing has significantly reduced operational costs, with compute priced in cents per hour. At the same time, the cost rises as data volume and the number of processing units grow. Simple machine learning models tend to need small GPU instances, but as an enterprise expands its foothold in AI with more and more deep learning models, the operational cost can grow tenfold. Machine learning models deployed on-premises are much more affordable.
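A back-of-the-envelope sketch makes the scaling concrete. All hourly rates, hours and hardware prices below are hypothetical placeholders, not any vendor's pricing; the point is that cloud cost grows linearly with instances, hours and rate, and that an upfront on-premises purchase pays off only once monthly savings accumulate:

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    gpu_instances: int
    hours_per_month: float
    hourly_rate_usd: float  # hypothetical per-instance price, not a real quote

    def monthly_cost(self) -> float:
        # Cost scales linearly with instances, hours and rate
        return self.gpu_instances * self.hours_per_month * self.hourly_rate_usd


def total_monthly_cost(workloads) -> float:
    return sum(w.monthly_cost() for w in workloads)


def break_even_months(upfront_usd: float, onprem_monthly_usd: float, cloud_monthly_usd: float) -> float:
    """Months until on-premises hardware pays for itself versus cloud spend.

    Ignores depreciation, staffing and financing; returns inf if the cloud is cheaper.
    """
    monthly_savings = cloud_monthly_usd - onprem_monthly_usd
    if monthly_savings <= 0:
        return math.inf
    return upfront_usd / monthly_savings


if __name__ == "__main__":
    pilot = [Workload("tabular-model", 1, 200, 0.50)]
    scaled = pilot + [Workload("deep-learning-models", 4, 300, 0.75)]
    print(total_monthly_cost(pilot))   # 100.0
    print(total_monthly_cost(scaled))  # 1000.0
    print(break_even_months(30_000, 200, total_monthly_cost(scaled)))  # 37.5
```

With these made-up numbers, adding a few deep learning workloads multiplies the monthly bill by ten; real comparisons should also account for the costs the break-even function deliberately ignores.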

Today, cloud giants are pushing their AI tools and services together with their AI infrastructure, locking in customers. More and more startups and open-source communities are building affordable AI platforms and services. The next phase of AI promises affordability and the flexibility to choose the services and tools that fit an enterprise's needs, rather than being locked in by a cloud vendor's offerings.


Are ML Platforms sensitive to bias?


A machine learning platform manages the end-to-end life cycle of ML projects, from data engineering to building and managing ML models in production. Different ML platforms offer AutoML to automate the ML workflow pipeline and CI/CD pipelines to automate deployment. At the same time, bias monitoring is essential for a business to grow ethically.

Around August 2019, researchers found that Google's AI-based hate speech detector was biased against Black people. Similarly, Amazon's Rekognition software produced more false positives in facial recognition when identifying people of color. Amazon scrapped its AI-based recruiting system after it was found to favor male candidates. In 2017, a Palestinian man was wrongly arrested after Facebook's machine translation rendered his post, which read "good morning", as "attack them". Revelations of bias in AI not only cost enterprises money but also tarnish their reputations.

Cloud vendors and the open-source machine learning community are building technologies to reduce discrimination and bias. For instance, Google has developed a technique called TCAV (Testing with Concept Activation Vectors) that lets data scientists interpret model predictions in terms of human-understandable concepts, helping them detect and correct bias. For example, TCAV can reveal whether a model has learned the biased assumption that doctors are male, which can happen when the training data contain mostly male doctors.

The currently available end-to-end ML platforms concentrate on easing the building and deployment of ML models, along with model performance monitoring. We still badly need an integrated ML platform where data scientists can recognize and interpret bias and automatically retrain models with less biased data.
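As a minimal sketch of the kind of check such a platform could run automatically, the functions below compare false positive rates across groups, the type of disparity reported for facial recognition above, and flag the model for review or retraining. The group labels, toy data and the 1.25 ratio threshold are hypothetical; real fairness audits need metrics and thresholds chosen for the domain:

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Per-group false positive rate: FP / (FP + TN), computed over true negatives."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items()}


def fpr_disparity(rates):
    """Ratio of the highest to the lowest group false positive rate."""
    lowest, highest = min(rates.values()), max(rates.values())
    if lowest == 0:
        return 1.0 if highest == 0 else float("inf")
    return highest / lowest


def needs_review(y_true, y_pred, groups, max_ratio=1.25):
    """Flag the model for review or retraining if the disparity exceeds max_ratio."""
    rates = false_positive_rates(y_true, y_pred, groups)
    return fpr_disparity(rates) > max_ratio, rates


if __name__ == "__main__":
    # Toy labels and predictions for two hypothetical groups
    y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    groups = ["a"] * 5 + ["b"] * 5
    flagged, rates = needs_review(y_true, y_pred, groups)
    print(rates)    # {'a': 0.25, 'b': 0.5}
    print(flagged)  # True (ratio 2.0 exceeds 1.25)
```

Wiring a check like this into the deployment pipeline, rather than running it by hand, is what would turn bias monitoring into a platform feature.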


Do ML Platforms Have a Steep Learning Curve?


Different cloud vendors use their own tools and technologies to build ML platforms. Moreover, the meaning of end to end varies: some platforms streamline the building and deployment of ML models, while others concentrate on data engineering and on automating the ML build process.

What the cloud vendors' ML platforms have in common is that they build on Kubernetes clusters, but the commonality ends there. Kubeflow Pipelines, used on Google's platform, provides the flexibility to build ML pipelines either from a Jupyter notebook or on an existing Kubernetes cluster, using the Python SDK or CLI. Amazon SageMaker uses AWS Step Functions and AWS CodePipeline to automate the CI/CD process. Microsoft Azure provides two different kinds of pipelines, one for building ML workflows and another for building CI/CD pipelines.

The absence of an integrated end-to-end ML platform and the variety of options from different cloud vendors not only create a steep learning curve but also make porting from one platform to another nearly impossible.


Documented Well? Easy to Follow?


Every cloud vendor provides a detailed overview and step-by-step instructions for building ML pipelines. There are even articles by renowned data scientists on building end-to-end ML platforms. However, because AI is evolving with constant improvements to ML platforms, the documented steps for building ML pipelines are often error-prone.

Some ML platforms' Kubernetes engines still run older versions of Kubernetes, so the latest Kubernetes documentation does not apply to them. To add to the confusion, no documentation addresses the version mismatch.

The heterogeneity of tools and technologies, along with the many cloud vendors and open-source communities building ML platforms, has greatly multiplied the amount of documentation. This growing volume of documentation not only steepens the learning curve but also leads to confusion.


Many vendors, including cloud providers and open-source communities, are building competing end-to-end ML platforms. Enterprises have to be wary of the consequences of their choice of ML platform. ML platforms should not only be cost-effective but also streamlined, with proper role-based access that enables broader accessibility and collaboration.
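Role-based access can be sketched as a mapping from roles to permissions. The roles below mirror the stakeholders named in this article, but the permission names and the mapping are hypothetical and not tied to any platform's access-control API:

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW_DASHBOARD = auto()
    RUN_EXPERIMENT = auto()
    REGISTER_MODEL = auto()
    DEPLOY_MODEL = auto()
    APPROVE_RELEASE = auto()


# Hypothetical role-to-permission mapping for the stakeholders in this article
ROLE_PERMISSIONS = {
    "data_scientist": {Permission.VIEW_DASHBOARD, Permission.RUN_EXPERIMENT, Permission.REGISTER_MODEL},
    "ml_engineer": {Permission.VIEW_DASHBOARD, Permission.REGISTER_MODEL, Permission.DEPLOY_MODEL},
    "management": {Permission.VIEW_DASHBOARD, Permission.APPROVE_RELEASE},
}


def is_allowed(user_roles, permission: Permission) -> bool:
    """A user may act if any of their roles grants the permission; unknown roles grant nothing."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


if __name__ == "__main__":
    print(is_allowed(["data_scientist"], Permission.RUN_EXPERIMENT))  # True
    print(is_allowed(["data_scientist"], Permission.DEPLOY_MODEL))    # False
    print(is_allowed(["management"], Permission.VIEW_DASHBOARD))      # True
```

In this design every role shares the dashboard permission, so the business gets visibility into the project, while deployment stays with ML engineers and release approval with management.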

The available ML platforms are not fully automated, as the processes for building ML workflows and CI/CD pipelines are often disjointed. There is keen interest in the ML platform arena in automating the build process with single-click deployment. Model monitoring is still evolving and is largely limited to performance monitoring. ML monitoring will be full-fledged only with an integrated monitoring system that not only tracks model performance and resource consumption but also alerts on data drift and bias.
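Drift alerting can be sketched with the Population Stability Index (PSI), one common drift statistic chosen here for illustration; platforms may use others. PSI compares the binned distribution of a feature at training time with its distribution in production, summing (actual - expected) * ln(actual / expected) over bins. The bin edges, the epsilon for empty bins and the 0.2 alert threshold are assumptions; 0.2 is a widely used rule of thumb rather than a standard:

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges):
    """Fraction of values falling into each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    score = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        score += (a - e) * math.log(a / e)
    return score


def drift_alert(expected, actual, edges, threshold=0.2):
    """Return (alert, score); alert is True when PSI exceeds the threshold."""
    score = psi(expected, actual, edges)
    return score > threshold, score


if __name__ == "__main__":
    training = list(range(100))                  # toy reference feature values
    production = [v + 40 for v in training]      # toy shifted production values
    edges = [25, 50, 75]
    print(drift_alert(training, training, edges))  # (False, 0.0)
    alert, score = drift_alert(training, production, edges)
    print(alert)  # True
```

A monitoring system would run a check like this per feature on a schedule and raise an alert alongside the performance and resource metrics it already tracks.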

As AI increases its presence, it is not enough to build scalable machine learning models; we must also create and maintain inclusive communities.

Part 1 – End to End ML Platform! Are we there yet?
Part 2 – End to End ML Platform! Are we there yet?
Part 3 – End to End ML Platform! Are we there yet?
Part 4 – End to End ML Platform! Are we there yet?