Every enterprise accumulates dark data. Companies don’t set out to hoard this unanalyzed information; it piles up because it’s created almost everywhere.
Servers in data centers generate an enormous trove of largely untapped log file data. Manufacturers’ shop-floor control systems and robots produce dark data as well as widgets. Little of the data from a retailer’s point-of-sale system gets mined. Information from diagnostic equipment in intensive care units is generally ignored. The list goes on.
Pick any market sector and the systems it depends on, and you’ll uncover dark data. Organizations simply generate far more data than they can currently exploit.
Finding insight in these mostly ignored data sources matters. According to a report in Forbes, “Organizations that treat idle information, or so-called ‘dark data’, as anything less than having potential economic benefit will find themselves at increased competitive disadvantage.” In fields like science, mining dark data could yield major breakthroughs that benefit us all. As Wired magazine put it: “Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.”
That said, Gartner analyst Andrew White challenges those who tout the value of the massive, untapped data repositories inside most enterprises. He writes, “But there is a flaw in this hyped argument about dark data. Unless you, the business user, have an idea of what you want to ask of this dark data, there is no point worrying about it.”
That’s a valid point of view. But it applies to all data. Unless someone has a query that is relevant to a given data set—whether it’s dark data or a basic company balance sheet—there is no point in worrying about it.
What has changed is that business users now have analytics tools capable of ingesting and mining vast amounts of dark data quickly. For example, a manufacturer can connect temporal data from shop-floor control systems to product return patterns from customers to determine whether there is a time-of-day problem along the assembly line.
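To make the manufacturing example concrete, here is a minimal Python sketch of that kind of analysis. It assumes the shop-floor log records a serial number and assembly timestamp for each unit, and that the returns system records the serial numbers of units customers sent back. The function name, record layout, and sample data are hypothetical and were chosen only for illustration.

```python
from collections import Counter
from datetime import datetime


def return_rate_by_hour(assembled, returned_serials):
    """Map each assembly hour of day to the fraction of its units that were returned.

    assembled: iterable of (serial, assembly_timestamp) pairs from a
        hypothetical shop-floor control log.
    returned_serials: set of serial numbers from a hypothetical returns system.
    """
    built = Counter()    # units assembled per hour of day
    returns = Counter()  # returned units per hour of day
    for serial, ts in assembled:
        hour = ts.hour
        built[hour] += 1
        # "Connect" the two data sources by matching on serial number.
        if serial in returned_serials:
            returns[hour] += 1
    # Normalize by volume so a busy hour doesn't look worse just for building more.
    return {hour: returns[hour] / built[hour] for hour in sorted(built)}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    assembled = [
        ("A1", datetime(2024, 3, 4, 9, 15)),
        ("A2", datetime(2024, 3, 4, 9, 40)),
        ("A3", datetime(2024, 3, 4, 23, 5)),
        ("A4", datetime(2024, 3, 4, 23, 50)),
    ]
    returned = {"A3", "A4"}
    for hour, rate in return_rate_by_hour(assembled, returned).items():
        print(f"{hour:02d}:00  return rate {rate:.0%}")
```

With the toy data, units built in the 9:00 hour show a 0% return rate and those built in the 23:00 hour show 100%, which is the kind of late-shift pattern that would prompt a closer look. A real analysis would need far more data and a check that the difference is not just noise.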
I agree with White that business users must have questions for dark data before it becomes useful. But instead of telling them not to worry if they don’t have questions, I’d be concerned about analysts who are unable to ask the right questions. That’s far more worrying.