Data Science: Buyer Beware
I don’t want no iceman
I’m gonna get me a Frigidaire …
I don’t want nobody
Who’s always hangin’ around
– Louis Jordan, “I’m Gonna Move to the Outskirts of Town”, 1941
Any field of study followed by the word “science”, so goes the old wheeze, is not really a science, including computer science, climate science, police science, and investment science. And then there is the saying, “when *** is used to pitch something besides ***, someone is trying to get in your back pocket rather than the front.” If both of these are true, then Thomas Davenport and D.J. Patil’s over-the-top declaration that the data scientist is the sexiest job of the 21st century deserves a double dose of skepticism.
Indeed the skepticism is justified. Data science has much more in common with management fads than with science: it ordains practitioners of obscure technical specialties with instant guru status, pits them against the ignorant masses, and infuses the latter with itching uncertainty. More fundamentally, the bluesmen of the prewar United States who wrote tunes like “I’m Gonna Move To the Outskirts of Town” were right to be wary of a technology that made their families and lovers dependent on persons coming regularly to the house to deliver necessary goods, persons who the bluesmen worried would take advantage of women at home alone.
Remember the fad that forgot people? It’s back!
Data science has not just emerged out of the blue, but rather is the fresh-faced third-generation offspring of the 1990s management fad Business Process Reengineering (BPR). The reader might recall Davenport as one of the captains of BPR, which, true to its rhetoric of “Don’t Automate, Obliterate”, became an ignominiously destructive management fad. BPR’s effects were so pernicious that its three main proponents, including Davenport, issued public apologies, which consisted mainly of blame shifting, usually to vendors, consultants, and errant management gurus, while maintaining that BPR was a good idea that unfortunately fell into bad hands.
In contrast to other management ideas of the day, BPR was charmingly simple. Yet when implemented, BPR produced the opposite of simplicity, requiring enormous amounts of IT investment, bureaucratic overhead, and technical specialization in order to achieve even simple results. All too frequently such results included downsizing by the thousands, with few survivors left to deal with even greater complexity, brought about by redesigned yet overengineered business processes. Like the gruesome medical practice of bloodletting, BPR left many businesses sicker than before, and at its height it had a failure rate of 70 percent. To this day there is conflicting evidence as to whether BPR is truly cost-beneficial.
BPR’s demise left behind a lot of data and excess IT capacity, along with a sense of guilt over mismanagement of IT investments, giving birth to the field of knowledge management. During the next decade, knowledge management lived a modest life, supporting IT professionals wanting to sweep up all that data and store it, and management consultants trying to help companies turn complex processes into competitive advantage.
Data science is the spry third generation of BPR, responding to vastly increasing IT capacity, the unprecedented ability of businesses to create data, the widespread realization that data is a valuable resource, and the burdensome need to extract data from storage in order to realize business value. Yet data science belongs to a family tree of business practices that for over a century have been governed by technocrats who view organizations as machines, desiring to automate everything and eliminate people wherever possible. Data science is shaping up to be a redux of its grandfather BPR, with the same structural features (BPR was never really engineering, nor as we shall see is data science really science), and the same propensity for sin and indulgence.
No science please, we’re skittish
Davenport and Patil declare that “Data scientists’ most basic, universal skill is the ability to write code.” With this pronouncement, data science fails the smell test at the very outset. For how many legitimate scientific fields is coding the most fundamental skill? The most fundamental skill for any scientist is of course mastery of a canonical body of knowledge that includes laws, definitions, postulates, theorems, proofs, and descriptions of unsolved problems. Scientists are therefore characterized by mastery of a body of knowledge, not a collection of methods. What is this body of knowledge for data science? Davenport and Patil admit there is none.
The job of scientists is to conduct independent research, contribute to a body of knowledge, and improve professional practice, while adhering to a recognized standard of conduct. Coding is a tool that facilitates some of these objectives, but is a substitute for none of them. Lacking a definitive course of study to assure minimum competency, or a professional society to check conduct, data scientists are classified properly as faddists rather than scientists.
The principle of parsimony leads scientists to favor the theory that explains the most with the least amount of elaboration, that is, to simplify as much as possible. Coding does not simplify, but rather translates, abstracts, and sequentializes, often giving a false sense of concreteness to concepts that are poorly understood or articulated. Consequently, data science confuses the tool and the result, and the spurious science of data is confused with authentic science (an “-ology”) that drives business behavior.
That is not to deny that coding is valuable, if not crucial, for persons conducting scientific inquiry, especially about business topics. As is true for many readers, much of my academic training and business career has involved demanding quantitative work, including merging databases, extensive data cleansing, giving dimensions to flat data, creating new variables, and performing analyses using numerous unconventional statistical methods. Coding certainly facilitated each of these steps. But invariably, the most valuable tool was my knowledge of the data and underlying phenomena I was studying, not coding. Scientists failing to master the former fool no one but themselves. Faddists mastering only the latter fool everyone, including themselves.
An economy of counterfeit goods
Businesses that adopted BPR were not stupid, though their opaque bureaucracies often made them feel that way. Part of the massive appeal of BPR was its approach of simplicity: begin with a blank sheet of paper, rethink key business processes, and then reduce them to as few steps as possible.
Indeed business transformation should strive for clarity and promote effective communication. It should behave similarly to a well-functioning market, with changes driven organically as knowledge is discovered and teams form around value-creating processes. It should not be dependent, like most management fads, on top-down, artificial organizational changes, presided over by self-defined experts and gurus positioning themselves as the only ones capable of dealing with complex organizational mechanisms.
As BPR morphed into knowledge management, the virtue of simplicity was reversed, and complexity came to indicate merit. Data science promises to deliver value by unpacking some of that complexity. Yet like the two fads that preceded it, data science tries to create value through an economy of counterfeits:
- False expertise, arising as persons recognized as experts are conversant in methods and tools, and not the underlying business phenomena, thereby relegating subject matter knowledge below methodological knowledge,
- False elites, arising as persons are summarily promoted to high status (viz., “scientist”) without duly earning it or having prerequisite experiences or knowledge: functionaries become elevated to experts, and experts are regarded as gurus,
- False roles, arising as gatekeepers and bureaucrats emerge in order to manage numerous newly created administrative processes associated with data science activities, yet whose contributions to core value, efficiency, or effectiveness are questionable,
- False scarcity, arising as leaders and influencers define the data scientist role so narrowly as to consist of extremely rare, almost implausible combinations of skills, thereby assuring permanent scarcity and consequent overpricing of skills.
For many businesses, the data most likely to yield valuable insight may not even be contained in databases, but rather in shabbily maintained spreadsheets and text files, distributed across multiple systems, and lacking a codebook. Such data may not even be intelligible without context that is available only in the tacit knowledge of employees or the culture of the organization. Those who manage under such conditions ought to reflect very carefully: should they trust counterfeit solutions to produce better analytics results than authentic experts who understand the deep psychological, sociological, and economic foundations of business behavior?
Nothing should come between you and your data
Real science discovers universal principles such as the gas laws, which yield many useful technologies, including refrigeration. Yet refrigeration creates value only when it is consumerized, not when it is hoarded. A refrigerator in every house is a sign of economic progression; an iceman delivering ice every day is a sign of economic retrogression.
People needed a Frigidaire in their kitchens, not dependence on icemen coming to the house every day, which the bluesmen of almost a century ago rightly identified as trouble. They were right to purchase technology that made the household self-sufficient and improved their families’ quality of life.
Analytics technology also belongs inside the house, making users independent consumers, and not requiring dubious experts to supervise a monopolization of technology that creates value mostly for themselves, through false scarcity and fabricated expertise.
Rather than seeking out gurus to mollify big data anxieties, analytics users should demand that their vendors produce tools that can be used primarily by subject matter experts, in collaboration with analytics specialists, providing transparency and an appropriate level of functionality to both, and facilitating collaboration among business users.
Analytics has the potential to transform business like no technology that came before it. But if left to the sort of data science that Davenport and Patil describe, it will pursue the same life of debauchery as its grandfather BPR, becoming yet another business fad that forgets people, and probably just as destructive.