Big data is rapidly coming within the reach of most business analysts; in fact, it is already beginning to shape our working lives. Our data management systems, augmented by faster processing techniques (such as in-memory computing) and distributed processing (such as Hadoop), are putting an ever-increasing pile of data at our disposal. But what is the busy knowledge worker to make of it? Our brains aren’t getting any bigger; in fact, we’re stuck with essentially the same brains humans have had for 50,000 years or more. The morphology of the human tool for evaluating information has changed very little in sophistication or form since then. In his 1956 paper on cognitive capacity, George Miller established experimentally that most of us are good at dealing with roughly seven items, plus or minus two, at a time.
To work around this limitation (or blessing), we often arrange data in ways that simplify it. We divide, combine, and segment information into smaller groups to ease comparison. Simple techniques, such as taking the average, maximum, and minimum of a collection of figures like sales numbers, can tell us a great deal without exceeding our capacity to make sense of the information.
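As a minimal sketch of this idea, the aggregates above can be computed over a small set of illustrative (made-up) monthly sales figures, reducing six numbers to three that fit comfortably in working memory:

```python
# Illustrative monthly sales figures (hypothetical numbers).
sales = [12500, 9800, 14300, 11200, 15750, 13400]

# Collapse the list into a handful of simple aggregates.
summary = {
    "average": sum(sales) / len(sales),
    "maximum": max(sales),
    "minimum": min(sales),
}
print(summary)  # → {'average': 12825.0, 'maximum': 15750, 'minimum': 9800}
```

The point is not the arithmetic but the reduction: three summary values are well inside Miller's seven-item limit, where the raw series may not be.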
A Blessing and a Curse
Big data is becoming both an enhancement of and a challenge to what we evaluate, not simply because there is more data, but because it arrives in smaller pieces that must be assembled before they mean anything. Consider the world of RFID tags. By itself, the message a tag produces is little more than a serial number (identifying the tag) and a sensor ID (identifying the reader it passed near), perhaps with a timestamp. Without significantly more context, such as what the tags are attached to and where the sensors are positioned, say in a busy warehouse, the stream of numbers is little more than noise. If we enrich this data, however, and then process it statistically, say by comparing the movement of a particular class of item, such as Allegra® and Claritin®, we can draw all sorts of interesting conclusions: for example, that the pollen level might be high in the places to which these items are destined. As we start to correlate re-orders of such items, we can even begin to draw conclusions about the effectiveness (or at least the desirability) of each product. The application of statistics can reduce the complexity of a stream of RFID tags into something our minds can work with.
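The enrichment step described above can be sketched as a simple join-and-aggregate. All names and records here are hypothetical, standing in for the tag registry and sensor map a real warehouse system would maintain:

```python
from collections import Counter

# Hypothetical raw RFID reads: (tag_serial, sensor_id, timestamp).
reads = [
    ("TAG-001", "S-DOCK", 1),
    ("TAG-002", "S-DOCK", 2),
    ("TAG-001", "S-SHIP", 5),
    ("TAG-003", "S-DOCK", 6),
]

# Enrichment tables: the context the raw stream lacks (assumed lookups).
tag_to_product = {
    "TAG-001": "allergy-med-A",
    "TAG-002": "allergy-med-B",
    "TAG-003": "allergy-med-A",
}
sensor_to_zone = {"S-DOCK": "receiving", "S-SHIP": "outbound"}

# Join the stream with its context, then aggregate: outbound shipments
# per product, a summary a human can actually reason about.
outbound = Counter(
    tag_to_product[tag]
    for tag, sensor, _ in reads
    if sensor_to_zone[sensor] == "outbound"
)
print(outbound)  # → Counter({'allergy-med-A': 1})
```

Everything interesting happens in the two lookup tables: without them, the reads are just serial numbers; with them, the stream becomes counts of named products moving through named zones.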
Thinking like a Scientist
This illustration also serves to highlight the risk in the simplifying power of statistics: it can mislead, even when we are the sole analyzer and consumer of the data. What if the demand was not driven by pollen? Was it cold season? Was there a coupon sale at a major retailer? Enriching the sensor stream alone is inadequate to truly diagnose the source of the demand. To better leverage big data, we need to apply the skepticism of a scientist to our resulting summaries. Once we formulate a hypothesis (in this example, pollen), we need to devise additional methods to test it, such as examining weather and agricultural data, the time of year, or disease reports, before we reach a rash conclusion.
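One simple way to test such a hypothesis is to check whether demand actually tracks an independent signal. The sketch below, using entirely made-up weekly figures, computes a Pearson correlation between re-orders and a regional pollen index:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Weekly re-orders vs. pollen index for one region (illustrative data).
reorders = [40, 55, 90, 120, 80]
pollen   = [10, 20, 60, 90, 50]

r = pearson(reorders, pollen)
print(r)
```

A high correlation is consistent with the pollen hypothesis but does not prove it; the same test should be run against the rival explanations (cold-season illness reports, promotional calendars) before any conclusion is drawn.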
Big data will afford us great opportunities in using data, but it also presents risks. As more data is gathered, consolidated, and presented to us with less pre-analysis and curation, those risks will grow. It is best to start familiarizing ourselves with the tools of experimental thinking and analysis so they can better serve our decisions.