Data is powering this century. An abundance of data flows from the digitized world: IoT devices, voice assistants like Alexa and Siri, fitness trackers, and medical sensors, to name a few. Data science is becoming central to fast-growing sectors such as healthcare, logistics, customer service, banking, and finance. AI and machine learning are now mainstays of boardroom conversations, and with this data-centricity comes the big question of governance and ethics in data science.

Are we ethically responsible for handling data?

Everyone is responsible for handling data with the utmost care. Bypassing ethical data science purely for monetary gain fosters bias and stereotyping. Similarly, validating real-world data against biased data leads to ill-considered business decisions that cost a company not just revenue but, more importantly, its reputation and customer loyalty. Every enterprise has a responsibility to grow its business by cultivating togetherness among communities, being more inclusive, and filtering out unconscious bias.

What are the effects of unethical data science?

Data privacy is becoming a major concern as more and more machine learning models learn our digital footprint and predict our future needs, whether we like it or not. Legislation such as the GDPR (Europe), the Personal Data Protection Act (India), and the California Consumer Privacy Act (CCPA) stresses the importance of data privacy, protecting digital citizens from the dangerous consequences of misused data.

Micro-targeting based on consumer data and demographics influences the actions of the targeted consumer segments. With an abundance of data, it is becoming harder and harder to tell truth from falsehood. Micro-targeting without a proper understanding of the data and its source does more harm than good.

Healthcare prediction failures, such as those associated with IBM Watson, can lead to irreversible consequences. The healthcare industry is currently undergoing a major revolution driven by artificial intelligence. The success of AI in healthcare depends on a one-team approach, with transparent discussion among a diverse set of leaders from both healthcare and data science.

Facial recognition software is known to falsely flag people as having criminal intent based on their skin color, because the underlying ML models were trained predominantly on white faces. Multiple facial recognition applications are available on the market, but the success of any such application depends on the diversity of the data used to train its models.
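One way to surface this kind of skew is to break a classifier's error rate down by group instead of reporting a single overall accuracy. The sketch below is a hypothetical illustration only: the group labels, the toy records, and the `false_positive_rate_by_group` helper are invented for this example and do not come from any real facial recognition system.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Return {group: false positive rate} for (group, actual, predicted) records.

    A false positive means the model flagged someone (predicted=True)
    who should not have been flagged (actual=False).
    """
    false_pos = defaultdict(int)  # flagged but actually negative
    negatives = defaultdict(int)  # all actually-negative records
    for group, actual, predicted in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


# Hypothetical toy data: (group, actually_a_match, model_says_match)
records = [
    ("group_a", False, False), ("group_a", False, False),
    ("group_a", False, True),  ("group_a", False, False),
    ("group_b", False, True),  ("group_b", False, True),
    ("group_b", False, False), ("group_b", False, True),
]

rates = false_positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'group_a': 0.25, 'group_b': 0.75}
print(gap)    # 0.5
```

A single overall figure would hide the fact that, in this toy data, people in `group_b` are wrongly flagged three times as often as people in `group_a`.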

How to handle data ethically?

1. Understand Bias Types

To handle data ethically, it is crucial to understand the different types of bias and to stay conscious of their existence. Bias in machine learning can be classified into sample, prejudice, measurement, algorithm, and exclusion bias.

Sample Bias

Sample bias arises when the training data contains partial or incorrect information, or does not represent the population the model will serve. For instance, predicting a customer's spending activity from their social feeds rather than from relevant payment platforms leads to sample bias.
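Sample bias also shows up when some groups are missing from, or under-represented in, the training data. A simple first check is to compare how often each group appears in the training sample against a reference distribution for the population the model is meant to serve. This is a minimal sketch under assumptions: the region groups, the reference shares, and the 0.8 tolerance are hypothetical values chosen for illustration.

```python
from collections import Counter

def underrepresented_groups(sample_groups, population_shares, tolerance=0.8):
    """List groups whose share of the sample is below tolerance * expected share.

    sample_groups: iterable of group labels, one per training record.
    population_shares: {group: expected share of the real population}.
    tolerance: hypothetical threshold; 0.8 flags groups below 80% of their expected share.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    flagged = []
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if observed < tolerance * expected:
            flagged.append((group, round(observed, 2), expected))
    return flagged


# Hypothetical training sample: 70 urban, 25 suburban, 5 rural customers.
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5
# Hypothetical reference shares for the customer base the model will serve.
population = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}

flagged = underrepresented_groups(sample, population)
print(flagged)  # [('rural', 0.05, 0.2)]
```

The check only catches groups you already know to look for; it cannot detect partial or incorrect information within a group, such as relying on social feeds instead of payment data.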

Prejudice Bias

“Our environment, the world in which we live and work, is a mirror of our attitudes and expectations.” – Earl Nightingale

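Prejudice rooted in preconceived opinions causes harm not just to the business, but also to society and the well-being of future generations. In machine learning, prejudice bias appears when the training data reflects existing stereotypes and the model learns to reproduce them. It takes immense strength to acknowledge and eradicate unconscious bias.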

Exclusion Bias

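Everyone is unique, with their own abilities and strengths. Those of us who do not follow the norm should by no means be subject to exclusion; each of us has our own unique qualities to contribute. In machine learning, exclusion bias occurs when data points or groups are left out of a dataset, often because they were wrongly judged unimportant. Enterprises that do not adopt inclusive policies will soon find themselves out of the market.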

Algorithm Bias

Machines do not understand bias. Algorithm bias results from erroneous assumptions, made consciously or unconsciously, when selecting datasets and algorithms.

Measurement Bias

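Measurement bias occurs when data is collected or measured in a way that systematically distorts it, so the model ends up favoring certain outcomes over others. For example, a model that predicts which consumer products will double their sales next quarter based on past sales history will favor items whose prices were marked down, because discounted sales overstate their underlying demand.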

2. Foster Togetherness

Eliminating bias is not a one-time activity but a continuous process. It starts with selecting the right algorithm and setting up a data governance team that includes everyone involved in the ML project lifecycle: the business team, data scientists, and the MLOps team.

Models are less prejudiced when they are tested on real-world data rather than on data drawn from the training sample. Real-world data also tends to be more diverse and inclusive, since it comes from real customers. At the same time, including data only from active customers will not solve the inclusion problem, because people who are not yet customers remain unrepresented. Such unconscious bias can be detected by keeping a human in the loop alongside continuous monitoring.
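Human-in-the-loop review can be combined with monitoring in many ways. One minimal sketch, assuming a model that returns a confidence score with each prediction, is to route low-confidence predictions to a human reviewer and to track, per group, how often that happens in each batch, so reviewers can spot a group the model consistently struggles with. The `Prediction` type, the 0.7 confidence threshold, and the 2x alert ratio are hypothetical choices, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Prediction:
    record_id: int
    group: str         # customer segment or demographic group (hypothetical field)
    label: str
    confidence: float  # model's confidence in the label, 0.0 to 1.0

CONFIDENCE_THRESHOLD = 0.7  # hypothetical: below this, a human reviews the case
ALERT_RATIO = 2.0           # hypothetical: alert if a group's review rate is 2x the overall rate

def monitor_batch(predictions):
    """Split a batch into human-review cases and flag groups whose
    review rate is unusually high compared with the whole batch."""
    review_queue = [p for p in predictions if p.confidence < CONFIDENCE_THRESHOLD]
    totals, reviews = defaultdict(int), defaultdict(int)
    for p in predictions:
        totals[p.group] += 1
        if p.confidence < CONFIDENCE_THRESHOLD:
            reviews[p.group] += 1
    overall_rate = len(review_queue) / len(predictions)
    alerts = [
        g for g in totals
        if overall_rate > 0 and reviews[g] / totals[g] >= ALERT_RATIO * overall_rate
    ]
    return review_queue, alerts


# Hypothetical batch: the model is confident about group_a but not group_b.
batch = [Prediction(i, "group_a", "approve", 0.9) for i in range(8)]
batch += [Prediction(8, "group_b", "deny", 0.5), Prediction(9, "group_b", "deny", 0.55)]

queue, alerts = monitor_batch(batch)
print([p.record_id for p in queue])  # [8, 9]
print(alerts)                        # ['group_b']
```

Running this on every batch, rather than once at launch, is what makes the monitoring continuous. The alert only tells a person where to look; deciding whether the pattern reflects bias stays with the human reviewer.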

Summary

With data growing exponentially and legislation regulating its use, it is crucial to consume data for the common good. Fostering togetherness by collaborating with people from different sectors, and being socially responsible and accountable for the ethical use of data, will form the foundation of a successful AI revolution.