Did you ever wonder how LinkedIn generates that list of “people you may know”?
And what about when Amazon recommends “other books and music you might like”?
The recommendations after my last purchase were so on target that I actually took out my credit card!
Is it magic? No. It’s data science.
Data science looks at where data comes from and how to model it to test a hypothesis or reveal hidden insights.
The real power of data science is in finding interrelationships and patterns. Pattern recognition lets you make more accurate predictions about consumer shopping behavior, and even about the next big earthquake.
And the “magic” of data science is delivered in part by a technology that is appearing everywhere: Hadoop.
Hadoop is open source software that can divide the processing of huge data volumes across many inexpensive servers. It was first developed to help Internet-based companies manage enormous data sets more affordably.
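Hadoop's classic processing model is MapReduce. Each server runs a "map" step over its own slice of the data, the intermediate results are grouped by key (the "shuffle"), and a "reduce" step combines each group into a final answer. Below is a minimal single-machine sketch of that idea in Python, counting words across several chunks of text. It assumes threads as stand-ins for servers, and the function names are hypothetical: this illustrates the data flow only and is not Hadoop's own API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for each word in one slice of the data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, across all mappers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str], workers: int = 4) -> Dict[str, int]:
    # Assumption: a thread pool stands in for the many inexpensive servers of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))  # map runs on each chunk in parallel
        grouped = shuffle_phase(mapped)             # group intermediate pairs by word
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)


chunks = ["big data big insights", "data science finds patterns", "big patterns"]
print(word_count(chunks))
# {'big': 3, 'data': 2, 'insights': 1, 'science': 1, 'finds': 1, 'patterns': 2}
```

In a real cluster the map and reduce steps run on separate machines and the shuffle moves intermediate data over the network; this single-process version only shows why the work divides so naturally.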
Today Hadoop gives everyone a flexible, lower-cost way to capture and store all kinds of data. Documents, images, video, email, tweets, blogs, Facebook posts, purchase transactions: all of it can be analyzed together for hidden information.
Companies may analyze only 10-15% of the data they gather, yet they store all of it. Hadoop makes it practical to analyze all of that data. By tracking every online behavior and transaction across very large groups, businesses can see where consumers are headed, and researchers can chart the next steps toward a breakthrough.
When I spoke with Hadoop provider Cloudera, COO Kirk Dunn explained how Hadoop can reverse our approach to data. Rather than searching data to answer preconceived questions, we can interrogate the data to discover the right questions to ask. That reduces bias in our inquiry and cuts right to the hidden information.
Because today’s data sets are so large, their patterns can trumpet what consumers really think without anyone having to ask them. Patterns can also reveal the deep inner workings of complex systems, like the genome or the universe.
As Kirk Dunn put it, “The Internet created a way to get everybody connected, while Hadoop creates a forum to analyze what everybody says.”
But searching for those patterns in large data volumes is like looking for a needle in a haystack.
So Cloudera has combined Hadoop with other tools and professional services to help enterprises take full advantage of their data. Cloudera builds on the open source inventiveness of Hadoop with end-to-end management that makes it easier to use and deploy.
Cloudera’s Vice President of Technology Solutions, Omer Trajman, highlighted some fascinating examples of how big data solves real challenges in science and business.
For example, how do farmers deal with the risk of bad weather spoiling their crops? The Farmers’ Almanac first offered weather predictions in 1818. Now The Climate Corporation has built an entirely new business around predicting the weather!
They offer farmers insurance policies against weather damage to crop yields. They calculate risk from a detailed predictive model built from trillions of data points, such as weather measurements, soil observations and location. Daily weather updates and real-time pricing let them tailor an individual policy for each farmer. All computation is done in the cloud on thousands of servers.
By combining data science, climatology and agronomics, The Climate Corporation helps farmers reduce the risk of crop loss and better manage our food supply. As in my previous blog post, here’s another example of how cloud and big data can help feed the world.
In financial services, online investment firms use Hadoop to tailor advice based on client preferences. It is one way firms can regain the personal touch otherwise lost in the virtual world.
Hadoop even helps CERN “hear the whispers of the universe” buried within tens of petabytes generated by the Large Hadron Collider.
In healthcare, Explorys Medical built a cloud analytics platform to pinpoint the venues offering the best care at the lowest cost. Now healthcare providers can see exactly where to improve their offerings to drive better health outcomes.
Harnessing big data used to be too expensive or complex for many in business and science. Now Hadoop is helping make it today’s greatest asset, turning a montage of predictive whispers into a powerful ROAR.