Having emerged only in the past decade, big data + data science poses a bit of a conundrum for organizations. HR departments need to put together new hiring processes or update old ones; privacy and security teams need to implement access frameworks across diverse infrastructures; data management and operations teams need to deal with the mammoth in the room (the cloud, in this case!); engineers and managers have to adapt their software engineering processes (agile data science, anyone?); and the rest of the organization needs to figure out whether big data + data science is a cost centre or a business unit! It is no wonder that the term "data scientist" remains as enigmatic and all-encompassing as it was when it was first coined a few years ago.
However, if you step back and look at some of the established roles in the software industry, you will notice that we define them by their function rather than by any particular skill set. We define a software engineer as someone who designs and builds software, not by the IDE or programming language he or she uses. Similarly, a product manager is defined as a person who sketches roadmaps and oversees the execution of a product or feature; we don't differentiate product managers by the tools they use. In stark contrast, since the inception of the data scientist role, we have focused heavily on tools (R vs. Excel vs. Python), degrees (PhD vs. Master's) and topics (statistics vs. data mining vs. machine learning) and much less on the functional expectations of the role. I expect this to change as the field matures and big data and data science become successfully integrated into organizations.
Based on my own experience, and on closely observing several data science teams at other organizations, it is safe to say that at any given moment a data scientist is, more often than not, crunching big data and expected to work on one of the following:
Analytics
A large portion of a data scientist's role is being an analyst, or rather an analyst 2.0. In analytics, the business typically has a well-defined problem (e.g. churn, acquisition, user segmentation) and is looking for scientific models, often to understand correlations and/or causality (good luck with that!). The advanced part is where a data scientist pushes these models to leverage big data not only to detect but also to predict a future state with some certainty (or uncertainty). This is where the business comes looking for answers to its questions!
Data Mining
Data scientists are data miners at heart, and a true data miner often wants to go off on a data expedition and discover unknown insights rather than prove or disprove existing hypotheses. Now, insights are not very interesting if they cannot be acted upon. For example, discovering that young adults aged 18-24 use your product is an insight, but it is only actionable if we can also discover the right incentive that gets them to sign up twice as fast. This is where a data scientist alerts the business and raises questions!
Data Products and Services
Finally, this is the part of a data scientist's job where he or she doubles as an engineer. The role requires defining new products around data and steering existing products towards becoming more intelligent and data-aware, so as to provide a seamless experience to users. This is where a data scientist starts to define the business!