Big Data doesn’t care
I’m halfway through my TechEd experience and I wanted to share a few thoughts about how it’s going so far.
The guest keynote speaker was Nate Silver. He is renowned for correctly predicting the 2012 Presidential Election results for all 50 States. He spoke about practical analysis of Big Data. We often come to an analysis from a pre-defined position. We “know” what the data is going to show us. And the results are generally biased towards our beliefs. And when the actual outcomes don’t match the predictions from our analysis, we blame the data.
In reality, our analysis techniques are flawed by our assumptions. As a technical consultant I’ve learned over the years to “listen” to what the computer is trying to tell me. I’ve done many performance analysis engagements. Invariably, the customer will tell me what they think the performance problem is and that they just need me to confirm it. But, in many cases, once I’ve compiled and analyzed the statistics, the performance issue turns out to be something entirely different. Sometimes my initial findings are met with skepticism. But once I explain the reasoning and logic behind them, which are driven by the data and not the other way around, the customer comes to accept the findings.
To put it another way: the data doesn’t care. It doesn’t care about the results of our analytical processes. It doesn’t care what we predict will be the outcome based on our understanding. It just keeps being generated. It’s up to us to learn how to listen to what it’s trying to tell us. During the Q&A session following Nate’s speech, it was mentioned that we are now measuring data in zettabytes. To give context, stacking dollar bills from the Earth to Pluto 18,000 times approximates a zettabyte. That’s really Big Data! And it doesn’t care what we think. It just keeps growing.
Nate encouraged all of us to become Data Scientists. We need to learn how to spot patterns and trends in the data that we can turn into actions. Within the volume of data, there will be false signals. With Big Data, there will be many false signals. We need the tools and the expertise to become more skilled at analysis as Big Data continues to grow.
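To make the false-signal point concrete for myself, here is a minimal sketch in Python. It is my own illustration, not something shown in the keynote. It generates series of pure random noise and counts how many pairs look correlated anyway. The function names, the 20 data points per series, and the 0.5 correlation cutoff are all arbitrary assumptions chosen for the demonstration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_false_signals(n_series, n_points=20, threshold=0.5, seed=0):
    """Count pairs of unrelated noise series whose |correlation| exceeds threshold.

    Every series is independent random noise, so any pair that clears the
    threshold is a false signal by construction. The defaults are
    illustrative assumptions, not recommended values.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    return sum(
        1 for a, b in combinations(series, 2) if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n in (10, 50, 200):
        pairs = n * (n - 1) // 2
        print(f"{n} series, {pairs} pairs compared, "
              f"{count_false_signals(n)} false signals")
```

The number of pairs grows roughly with the square of the number of series, so the count of chance "patterns" climbs quickly even though there is nothing real to find. More data means more places for a coincidence to hide.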
One of the issues facing us today in analyzing Big Data is the speed at which it degrades. The older data becomes, the less relevant it is to our result set and the more it becomes “noise”. We need to be able to analyze the data faster.
I’m looking forward to the sessions I’ve selected to see how to become a better Data Scientist. I’m also interested in hearing your thoughts.