When Analytics Fail


Given the redundancy in my title, Analyst on OpenView’s Research and Analytics team, most people naturally assume I love numbers. I usually tell them I do. Numbers boil down an infinitely complex world into a manageable format that can be manipulated, modeled, and mined for insight. Accurate numbers, properly used, can teach you a ton about how the world works, and in turn, the adjustments you make based on that analysis can affect the world in tangible and profound ways.

But numbers can’t answer everything, and if you follow them blindly, you can be badly misled.

Too many executives, whether “numbers people” or not, see statistical analysis as a necessary stamp of approval on any business decision. But the simple truth is that there isn’t a mathematical answer to every question, and if you force one using data that doesn’t accurately describe the real world, no amount of color-coded charts, quadratic regressions, or chi-squared tests will yield an ounce of insight.

These situations are incredibly common in the business world. Unlike physicists, chemists, or biologists, business people are unable to run tightly controlled experiments. We can’t scour the earth for cool problems with clean statistical answers, like academics can. The problems come to us, and they’re usually messy and fraught with biases.

While the statistical discipline has very good tools for identifying relationships within a data set, it’s blind to the quality of the data itself. Here are three common problems that can render a statistically significant conclusion dead wrong:

  • Self-Selection: People often answer a survey or interview precisely because they have good things to say about the subject. Likewise, firms prefer to disclose financial information when it’s favorable or improving. This type of problem paints a rosier picture, on average, than is really the case (the sketch after this list makes the effect concrete).
  • Missing Data: More information is almost always available on larger companies and more recent events. When there’s missing information, do you come up with a proxy, estimate the missing fields, or exclude those entries altogether? Any solution you choose will introduce a layer of noise into the equation.
  • Tenuous Proxies: Say we’re trying to measure the impact of marketing spend on customer acquisition for a particular industry. Since neither variable is public, we’ll have to use proxies and assumptions to estimate them. It’s important to remember that the ultimate conclusion describes the proxies, not the actual variables: its strength relies heavily on how close the two are, and since we don’t have the actual variables, that closeness is very difficult to gauge.
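
To make the first of these concrete, here is a minimal Python sketch of self-selection at work. The satisfaction scores and the response curve are invented for illustration; the point is only that when the likelihood of responding rises with how happy someone is, the survey average drifts upward even though every individual answer is honest.

```python
# A toy illustration of self-selection bias; all numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" customer satisfaction scores on a 1-10 scale.
population = rng.normal(loc=6.0, scale=2.0, size=100_000).clip(1, 10)

# Assumed response curve: happier customers are more likely to reply,
# from roughly a 5% response rate at a score of 1 up to 50% at a 10.
response_prob = 0.05 + 0.05 * (population - 1)
responded = rng.random(population.size) < response_prob

print(f"True mean satisfaction:   {population.mean():.2f}")
print(f"Survey mean satisfaction: {population[responded].mean():.2f}")
# The survey mean lands well above the true mean, even though every
# individual response is accurate and the sample is large.
```

Nothing in that simulated survey is mismeasured; the bias comes entirely from who chose to respond, which is exactly why no test run on the responses alone will flag it.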

The result is a difficult balancing act: you have to tolerate moderate levels of bias within the data to reach a conclusion, but as the problems pile up, the validity of that conclusion declines no matter how statistically significant it may appear.

Still, an analyst is being paid to solve a problem, and to some, an inconclusive result is the worst kind of failure. It can be tempting to present a conclusion as a home run despite obvious flaws in the methodology. The best analysts properly communicate their level of confidence in the results, and know when to admit that the data at hand is inconclusive, even if it isn’t what their stakeholders were hoping to hear. Doing otherwise can be extremely destructive.

So do I love numbers? I’d say our relationship is rocky. I certainly respect them as a powerful tool to understand the world around me, but am realistic about their limitations. Statistical conclusions are only as good as the data that goes into them, and there isn’t good data for every problem.