Machine Learning Thursdays: When a Cow Laughs, Does Milk Come Out of Its Nose?
As flu season approaches and tissue and cold medicine sales begin to increase, our thoughts don’t turn to the often-overlooked part of “Science” in “Data Science.” But maybe our thoughts should turn in this direction—because the Internet of Everything begins with questioning the relevance, productivity, and efficiency of “Everything.”
The core of “Science” is that ideas are tested by experiment; everything else is bookkeeping.
Real value is created, or at least shaped, by combining decision science with business insight. A few years ago, McDermott wrote a book detailing sandwiches, cats, copiers, adversity, and successes along the road of life. Hidden in its pages was a concise list of six questions for evaluating business value:
This Should Be Easy. Don’t You Think?
Many ideas show value when first tested but fail to deliver business value over time. Such was the case with Google Flu Trends. Launched in 2008, it had a simple business goal: to predict the spread of flu activity (low, moderate, high, or intense) by aggregating Google search queries.
To detect the presence of flu-like illness in a geographic population, the model was scored against influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). Initial tests and the early implementation showed promising results. In 2013, however, Google Flu Trends missed the target by over 140%, which contributed to the end of the program.
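That 140% figure is easier to grasp as simple arithmetic: it is the percentage by which a prediction exceeds the observed value. A minimal sketch (the input numbers below are illustrative only, not the actual published Google Flu Trends or CDC values):

```python
def overestimation_pct(predicted: float, actual: float) -> float:
    """Percentage by which a prediction exceeds the observed value."""
    return (predicted - actual) / actual * 100.0

# Illustrative numbers: if a model predicted an ILI rate of 10.8%
# while the reported rate was 4.5%, the miss is (10.8 - 4.5) / 4.5 = 140%.
print(overestimation_pct(10.8, 4.5))
```

In other words, a 140% miss means the model predicted well over double the reported flu activity.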
The Bigger the Data, the Bigger the Mistakes? Maybe, Maybe Not.
There are many reasons why this project didn’t yield the desired results, but I believe three main contributors stand out, and many other projects are just as susceptible to them:
- A data source must be judged on the value it provides. Bigger data doesn’t always lead to a better model; in many instances it leads to pattern matching at the expense of correctly identifying causal predictors.
- Models grow weaker as time moves forward. Robustness matters, and being able to quantify a model’s degradation over time is fundamental to compensating for variable change.
- Data science without business insight leads to exercises in bookkeeping, not business value. There is very rarely a “tell me something I don’t know” answer in data science. More often, a different perspective emerges from an undervalued correlation that might otherwise be discounted because of human fear, bias, familiarity, or lack of control.
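The second point, quantifying degradation over time, can be made concrete with a simple monitoring metric. One common approach (this is a generic sketch with made-up numbers, not tied to Google Flu Trends or any SAP product) is to track a rolling mean absolute error on each period’s predictions; a rising trend signals that the model is weakening and may need retraining or new predictors:

```python
from statistics import mean

def rolling_mae(errors, window=4):
    """Trailing-window mean absolute error, one value per period.

    A rolling MAE that trends upward indicates model degradation.
    """
    abs_err = [abs(e) for e in errors]
    return [mean(abs_err[max(0, i - window + 1):i + 1])
            for i in range(len(abs_err))]

# Illustrative weekly prediction errors that drift upward over time:
weekly_errors = [0.2, -0.1, 0.3, -0.2, 0.6, 0.9, -1.1, 1.4]
trend = rolling_mae(weekly_errors)
print(trend)  # later windows show a larger average error than early ones
```

On these numbers the early four-week window averages 0.2 while the final window averages 1.0, a fivefold increase that would justify investigating variable change.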
Let’s consider these three reasons in the context of the initial question: “When a cow laughs, does milk come out of its nose?” The answer is no, because cows do not laugh. Only humans, apes, and rats are known to laugh. If you’re thinking that you have never heard a rat laugh, you are correct: a rat’s laughter falls above the auditory range of human hearing. That is a built-in bias we all share.
We must evaluate experimental results from multiple perspectives, including ones we may not intend, may misunderstand, or may never have experienced ourselves.
The Value of Automated Analytics
SAP automated predictive capabilities may not be able to solve every “Science” causation test, but they are an extremely valuable set of tools for testing and evaluating ideas that lead to targeted business questions and informed business decision analytics. Business decision analytics are simply the levers: the direction, speed, and resources you are willing to engage or disengage. SAP Predictive Factory and SAP Predictive Analytics Integrator can answer many of these business decision analytics questions and easily productionize them across multiple products, people, geographies, and processes.
The Beginning Questions
In the context of McDermott’s six business value questions:
- Have you identified the lever to enact your decision?
- Have you dispassionately formulated your test?
- Have you objectively quantified the results?
The goal is not to predict anything. The goal is to use the levers to produce the outcome you want. To put this in perspective, consider the retention of valuable employees. Why should we care about the 10 reasons why a valuable employee may choose to leave? Shouldn’t we focus instead on the 10 levers we can pull to encourage them to stay?
To learn more about this subject, see:
- All the Thursday series posts for more on machine learning, predictive analytics, and artificial intelligence.
- The Forrester Wave report on predictive analytics and the TDWI paper, Machine Learning for Business: Eight Best Practices for Getting Started
- The IDC paper, The Value of Analytics in Digital Transformation