One of my few pet peeves in life is when I hear someone say “the numbers don’t lie!” While I understand and appreciate the use of fact-based information to support an argument, its application has become overly generalized.
People often use numbers as a crutch to support weak arguments, presuming that any stat is a good one, capable of automatically validating their position. The truth is that numbers can and do lie to us every day. This is especially important to keep in mind as the hype around Big Data and Analytics reaches a fever pitch.
As a reminder to all of us who use data in work and life to make decisions, I’ve put together some examples of how numbers can often lie or mislead. Feel free to add more that I’ve missed!
1. Small sample size
- Description: These are conclusions based on a small number of data points, yet portrayed as an accurate reflection of the truth. Whenever I see any data, sample size is the first thing I ask about.
- Example from daily life: Baseball statistics. Volumes have been written about the use and misuse of baseball stats, but one of the most common mistakes is to judge a player based on a few weeks or months of performance. In reality, even the worst baseball players can look like All-Stars for short periods of time. It takes multiple years of data to validate the true talent level of a player. To illustrate, here are some recent players who have had great 3-4 week stretches but are no longer in the Major Leagues:
| Player | Recent Award | Current Playing Status |
|---|---|---|
| Dee Gordon | MLB Rookie of the Month – Sept 2011 | AAA – Minor Leagues |
| Jemile Weeks | MLB Rookie of the Month – June 2011 | AAA – Minor Leagues |
| Jair Jurrjens | MLB Pitcher of the Month – May 2011 | AAA – Minor Leagues |
| Bryan LaHair | National League All-Star – 2012 | Playing in Japan |
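To get a feel for how much a short stretch can mislead, here is a minimal Python sketch. Everything in it is an assumption made for illustration: a hypothetical hitter whose true talent is a .250 average, 80 at-bats standing in for a hot month, and 1,500 at-bats standing in for a few full seasons. It is a toy simulation, not a model of any real player.

```python
import random

def simulate_average(true_rate, at_bats, rng):
    """Batting average over one simulated stretch of at-bats."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_rate)
    return hits / at_bats

def spread_of_averages(true_rate, at_bats, trials, rng):
    """Lowest and highest average seen across many simulated stretches."""
    averages = [simulate_average(true_rate, at_bats, rng) for _ in range(trials)]
    return min(averages), max(averages)

rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_RATE = 0.250         # hypothetical "true talent" of the hitter

# Same hitter, same talent: only the amount of data changes.
short_low, short_high = spread_of_averages(TRUE_RATE, 80, 500, rng)
long_low, long_high = spread_of_averages(TRUE_RATE, 1500, 500, rng)

print(f"80 at-bats:   {short_low:.3f} to {short_high:.3f}")
print(f"1500 at-bats: {long_low:.3f} to {long_high:.3f}")
```

The range for the short stretches should come out far wider than the range for the long ones, even though the hitter never changes. That gap is the whole problem with judging a player on a few good weeks.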
2. Big meaningless numbers
- Description: These are large numbers meant to imply a significant trend but presented without any context, so they are of limited or no use.
- Example from daily life: Social media stats. Having a lot of Twitter followers or Facebook fans doesn’t really mean anything by itself, yet those counts are often used as a proxy for someone’s level of “influence.” There are easy ways to get a ton of followers in social media. What matters is whether those people actually care about what you’re saying, whether you’re engaging them, and whether it results in a real business benefit. I cringe when I see big social media “ego metrics” now.
3. Correlation, not causation
- Description: Such figures state that Variable A causes Variable B, when in fact they are merely correlated.
- Example from daily life: Taken from SAP CMO Jonathan Becher’s recent blog on this topic: When male college students wake up with a headache, a large percentage of the time they are still wearing their shoes. Does sleeping with your shoes on really cause headaches? Of course not; the two are only correlated. You could play this game all day long.
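The shoes-and-headache example can be sketched in a few lines of Python. The sketch assumes a hidden third variable, a late night out, that makes both shoes-on sleeping and headaches more likely. That variable and every probability below are my own assumptions for illustration; none of them come from the original example.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

def simulate_student(rng):
    """Return (late_night, shoes_on, headache) for one hypothetical student."""
    late_night = rng.random() < 0.30
    # Shoes and headache each depend only on the late night, never on each other.
    shoes_on = rng.random() < (0.80 if late_night else 0.05)
    headache = rng.random() < (0.80 if late_night else 0.10)
    return late_night, shoes_on, headache

def headache_rate(students):
    """Share of the given students who woke up with a headache."""
    return sum(1 for _, _, headache in students if headache) / len(students)

students = [simulate_student(rng) for _ in range(20000)]

# Naive comparison: shoes on versus shoes off, across everyone.
with_shoes = [s for s in students if s[1]]
without_shoes = [s for s in students if not s[1]]
overall_gap = headache_rate(with_shoes) - headache_rate(without_shoes)

# Same comparison, but only among students who had a late night.
late = [s for s in students if s[0]]
late_gap = (headache_rate([s for s in late if s[1]])
            - headache_rate([s for s in late if not s[1]]))

print(f"Headache gap, everyone:         {overall_gap:.2f}")
print(f"Headache gap, late nights only: {late_gap:.2f}")
```

Across everyone, shoe-wearers report far more headaches. Once you compare only students who had a late night, the gap should all but vanish, because in this toy setup the shoes never caused anything.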
4. Selection bias
- Description: These numbers imply that data came from a random sample when it actually came from a (systematic) non-random sample.
- Example from daily life: Online voting polls. These are easy to discredit because by definition, all participants have access to the Internet, which automatically distorts the sample. Furthermore, the results will skew towards the readership profile of the host Web site. This is not a big deal for trivial topics like sports or entertainment, but political views extrapolated from online results can lead to truly misinformed decisions.
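Here is a rough Python sketch of how an online poll can drift away from the truth. It assumes a made-up population in which 20% of people match the host site's readership and are far more likely both to support some position and to answer the poll. All of the group sizes and probabilities are hypothetical.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

def make_person(rng):
    """Return (is_reader, supports) for one hypothetical person."""
    is_reader = rng.random() < 0.20
    supports = rng.random() < (0.80 if is_reader else 0.40)
    return is_reader, supports

def support_share(people):
    """Share of the given people who support the position."""
    return sum(1 for _, supports in people if supports) / len(people)

population = [make_person(rng) for _ in range(100_000)]
true_support = support_share(population)

# A proper random sample: everyone is equally likely to be picked.
random_sample = rng.sample(population, 2000)

# An online poll: site readers are far more likely to respond (assumed rates).
online_poll = [p for p in population
               if rng.random() < (0.09 if p[0] else 0.0025)]

print(f"True support:  {true_support:.2f}")
print(f"Random sample: {support_share(random_sample):.2f}")
print(f"Online poll:   {support_share(online_poll):.2f}")
```

The random sample should land close to the true figure, while the online poll overshoots it badly. Collecting more poll responses would not help, since the skew comes from who answers rather than how many.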
5. Visual trickery
- Description: Some graphics deceive or mislead based on how the information is presented.
- Example from daily life: Changing the Y-axis of a graph to magnify the difference in data points (see example below). You see visual trickery all the time on cable news channels. Keep an eye on how graphs are manipulated the next time you watch a news show.
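The Y-axis trick is really just arithmetic, which a short Python sketch can show without drawing anything. The two values (35 and 39.6) and the truncated axis start (34) are hypothetical numbers picked for illustration, and `visual_ratio` is a helper name I made up.

```python
def visual_ratio(smaller, larger, axis_min):
    """How many times taller the larger bar is drawn, given where the Y-axis starts."""
    if axis_min >= smaller:
        raise ValueError("axis_min must be below the smaller value")
    # A bar's drawn height is its value minus the bottom of the axis.
    return (larger - axis_min) / (smaller - axis_min)

old_value, new_value = 35.0, 39.6  # hypothetical data points

honest = visual_ratio(old_value, new_value, axis_min=0)
truncated = visual_ratio(old_value, new_value, axis_min=34)

print(f"Axis starts at 0:  second bar drawn {honest:.2f}x as tall")
print(f"Axis starts at 34: second bar drawn {truncated:.2f}x as tall")
```

With the axis starting at zero, the second bar is drawn about 1.13 times as tall as the first, which matches the real difference of roughly 13%. Start the axis at 34 and the same two numbers produce a bar 5.6 times as tall.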
6. Arbitrary cutoffs
- Description: This is another form of selection bias: setting arbitrary start and end points that change the meaning of the data.
- Example from daily life: Any “Top 10” list. Why is it 10 and not 11? Why does this blog have 6 bullet points instead of 10? Again, it’s not a big deal for trivial topics, but if it’s a list of Top Hospitals or Colleges, some people will make significant decisions based on that information. In addition to lists, any data that is time-bound could have arbitrary cutoff dates, so we should always keep that in mind.
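For time-bound data, a tiny Python sketch shows how much the choice of start date matters. The yearly figures below are invented for illustration and do not describe any real series.

```python
def percent_change(series, start, end):
    """Percent change between two positions in a list of values."""
    return (series[end] - series[start]) / series[start] * 100

# Hypothetical yearly values, oldest first.
yearly_values = [100, 120, 90, 95, 105, 110]
last = len(yearly_values) - 1

# Same data, same end point: only the starting year moves.
for start in range(3):
    change = percent_change(yearly_values, start, last)
    print(f"Starting from year {start + 1}: {change:+.1f}%")
```

Depending on which year you start from, the same six numbers show modest growth, a decline, or strong growth. Whoever picks the cutoff gets to pick the story.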
So that’s my list. What am I missing? Do you have other examples of “numbers that lie”?