What Data Science Can Do To Become A Classic
When I wrote “Data Science: Buyer Beware,” I was certainly not expecting a spirited, standing ovation, as would follow a Scriabin performance by Vladimir Horowitz. Despite presenting a sharply contrarian view, I nevertheless expected to be largely ignored, with potential readers favoring articles about various gadgets, 3D printing, the deconstruction of Silicon Valley celebrities, or what the recently concluded Consumer Electronics Show augurs for civilization.
Not surprisingly, the disaffection of those in the data science field quickly lit up the blog space and Twitter. Most appeared inclined to react; but as press time approached, few seemed to consider the message carefully enough to craft the sort of rejoinder they would present to their peers. That was disappointing, as the phenomenon of big data is a discussion we should all be taking very seriously, even if it includes opinions that on the surface appear abhorrent.
Yet there was a bright spot. A response by data scientist Melinda Thielbar stood out for its reflectiveness and style. In the interest of giving credit where it was due, I tweeted Melinda’s entry, which initiated a Twitter conversation between the two of us, and a subsequent hashtag started by Melinda, #FutureOfDS. During our Twitter exchange I recalled the 2004 Business Horizons article, “How to detect a management fad – and distinguish it from a classic,” and performed a thought experiment: what would data science need to do to become a classic?
A lot of thoughts emerged from this experiment.
The make or break time is surprisingly short.
Management fashion (an academic term used neutrally, implying no judgment as to merit or durability) typically follows life cycles, with the distinction of fad or classic being determinable approximately five years after the fashion has gained momentum. If this characteristic is true, and assuming late 2011 as roughly the time when substantial momentum had begun, then the next three years are a make-or-break time for data science. Since there is often a 12-18 month lag between widespread recognition of a need and full staffing, many new roles will be created relating to data science, and a whole ecosystem of users and products will emerge.
What will likely make data science? Keen responsiveness to user needs, adoption by a wide range of users, a clear cost-benefit proposition, and demonstration of authentic organizational change. What will break it? Gurus and academics with hyperbolic declarations, widespread circulation of case studies with outlier results, excessive rhetoric, superficial implementations, and, as I argued earlier, monopolization of technology.
Contrary to what one might readily conclude, I don’t want to see data science disappear as a fad. Rather, two quotes by the late media theorist Marshall McLuhan sum up my opinion:
In the age of information, the moving of information is by many times, the largest business in the world.
From now on the source of food, wealth and life itself will be information.
In other words, data is truly a unique economic resource that up to now we have been able to industrialize only partially. Data science has the potential to create entirely new sources of wealth and social benefits, but cannot do so if it remains a pre-paradigmatic science or becomes a paraprofessional practice.
In that spirit, here are some of the things that data science can do to maximize the chance that it will be carried into being a classic.
Science is radical. Fads are novel. Be radical.
Data science methods and techniques contain unprecedented potential for capturing business behavior, and much effort is now being devoted to exploring what methods and techniques best apply. At present, business behavior is attributed to economic, sociological, and psychological phenomena. For data science to endure as an authentic science it has to be radical. It has to be more than a collection of novel techniques or business innovations, no matter how scientifically based both are. Data science has to demonstrate that data also drives business behavior in a predictable manner.
Right now I do not believe that has been established. But if data science can do so, to the point that within the space of a few meetings with senior managers, data scientists can describe the most likely ways a company’s data can create business value, then data scientist really will be the sexiest job of the 21st century. And it won’t be gurus saying it.
Get experts in the social and behavioral sciences on your side, and into your practice.
Future business analytics projects will almost certainly require interdisciplinary teams in order to be successful, including persons trained in traditionally softer sciences. As many data scientists have been trained in hard sciences, suspicion or even contempt of social or behavioral science fields based on preconceptions in the academic hierarchy may still be present. Data science should fly above such invidiousness.
There are dozens of cognitive and social scientists who are at the 99th percentile of quantitative ability and who know a lot of useful methods and techniques unfamiliar to physicists, statisticians, and computer scientists. Additionally, behavioral scientists routinely deal with messy data, and frequently have firsthand experience with survey construction. They can offer invaluable advice in correcting errors in datasets, drawing on their field experience collecting data in sketchy nightclubs or on commuter train platforms in subzero temperatures, for example.
A lesson to be learned from many of the workflow improvement fads is not to neglect the fact that technological advancement is always surrounded by an equally complex social system. How individuals perceive and interact with workplace technology affects business performance along with the processes that technology enables. Ignoring the human contribution is simply inviting disaster.
Previously, I referred ironically to Thomas Davenport’s 1995 Fast Company article, “The Fad That Forgot People,” pointing out the regrettable results of information technology that fails its users. This time, I frankly urge everyone working in big data and analytics to read this excellent piece, and to heed Davenport’s timeless wisdom, “Talk softly about what you’re doing and carry a big ruler to measure real results.”
Create a big space for lots of users, establish a currency, and make commitment valuable.
Data science is equipping itself to address some of the most complex problems ever faced in business, and therefore data scientists will need to become fully immersed in the organizations they serve. Although their work requires a rare set of extremely cultivated skills, data scientists should not exclude potential consumers of data science results as collaborators. Rather, it is to data scientists’ advantage to promote a robust, intelligent, and informed consumer base, one that creates many users, some of whom can advance to the level of a prosumer.
Fads usually attempt to address complex organizational problems simplistically, facilitating superficial implementations so that everyone can feel confident of buying in. To move in the direction of becoming a classic, data scientists need to create a big community without diluting the practice. One way is to become conversant in the subject matter of data science consumers, perhaps even obtaining basic certification in the subject matter area of the consumer.
Another way is to bring consumers closer to data science. Earlier, I criticized Davenport and Patil’s statement that declared coding as the fundamental skill of data science. Yet power users almost always want to get to the lowest level their tools will allow, which usually involves some form of coding. While coding may not ultimately distinguish data science from other fields, it can provide the universal currency for data science stakeholders. Data science can create value by getting tools such as R and Hadoop into common parlance, making coding something that subject matter experts desire to learn in order to communicate effectively with data scientist colleagues. More broadly, data science should also consider the degree to which open sourcing would increase both the number of its consumers and their knowledge.
Start out by being classy. Sexy will follow.
Being smart carries a lot of cachet in 2013, but be careful about overplaying a good hand. Everyone knows data scientists are good. Wear it well. Do things that improve how we all think. Demonstrating that you are smarter than everyone else is not sexy. Making people smarter for knowing you is drop dead sexy.
The degree identifies you as a scientist. The pedigree validates what you do as science.
Data science has set very high expectations by assuming the word science. Yet when applied to business, data science needs to be ever vigilant to avoid the excesses of scientific management, sometimes referred to as Taylorism.
In my previous entry, I argued that data science is the third generation of Business Process Reengineering, and belongs to the tradition of Taylorism. Data science needs to refute this argument resoundingly, and have a ready way to address residual Taylorist suspicions. Data science can achieve both by demonstrating that, rather than descending from a family tree of scientific management, it is the legitimate child of big science, an enterprise whose origins are Viète and Pascal, and which has proceeded uninterruptedly through Bayes, Markov, Schrödinger, Einstein, Heisenberg, Terman, Hotelling, Shannon, Kolmogorov, Oppenheimer, von Neumann, Ulam, and Simon.
Big science mastered the transfer of technology from institutions to markets, facilitating the development of numerous wealth-creating industries, and countless improvements to quality of life. Its legacy is the convergence of theory and practice, whereby scientists are also practitioners, and knowledge accumulation is embedded in human artifacts, such as aircraft, satellites, supercomputers, and lasers, rather than academic journals. Thus one could argue that big science prefigured data science. Yet data science emerged from the decline of the big science enterprise, as scientific talent opted for employment in finance, consulting, and industry, and powerful computing resources became available outside government laboratories.
Show the world the radiant side of being very intelligent.
Every brilliant person I have ever met has an internal world of very intense, vivid experiences. Find some way to communicate those experiences in data scientific work. That is the art that shadows science, and it will help make data scientific work endure long after the practical results are consumed. Develop a data science style that is unique to the content data science regularly conveys. Set new standards for beauty, elegance, and excellence in coding, structuring, modeling, analyzing, visualizing, and communicating.
Yours to lose.
The statement “data is the new oil” has become something of a cliché, but there is an important message: big data is a big resource. Even more, data is an economic good like no other. We can create more of it, and often there are increasing returns to scale. It is a nonrival good, and increasingly accessible by firms with limited resources. Data contains unfathomed ways of creating prosperity, and data science needs to be a leader in ensuring that this resource is subject to a fairly functioning economy.
The big message I tried to convey in my previous entry is that there is an entire economy developing around how data is accumulated and consumed. Data science has the duty to manage this resource so that it is not subject to coercive monopolies or manipulation by the opportunistic and unscrupulous. If after 20 years, data science amounts to nothing more than a branch of economics where data is treated as the core resource, it will still be one of the significant innovations of our lifetime.
It’s yours to lose. But let the economy of it all become derelict, and you will lose it. And we will all lose.