Barrier 2 – Hot/Warm/Cold
In my prior blog, I wrote about the Four Barriers that today prevent customers from achieving strategic advantage with Big Data:
- Value of Big Data Use Cases
- Hot/Warm/Cold data aggregation
- Data Wrangling
- Widespread Predictive Analytics
I covered the first barrier, how to determine the value of Big Data Use Cases, in my prior blog. This entry is about Hot/Warm/Cold data aggregation. First, why would you want a Hot/Warm/Cold aggregation strategy?
More data leads to better business decisions. Anand Rajaraman teaches a class in Data Mining at Stanford University. In his blog, Anand describes two teams that tackled the Netflix Challenge of increasing predictive power by 10%:
“Team A came up with a very sophisticated algorithm using the Netflix data. Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB). Guess which team did better? Team B got much better results, close to the best results on the Netflix leaderboard!! I’m really happy for them, and they’re going to tune their algorithm and take a crack at the grand prize. But the bigger point is, adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set. I’m often surprised that many people in the business, and even in academia, don’t realize this.”
Anand Rajaraman, “More Data Usually Beats Better Algorithms,” Datawocky (blog), http://anand.typepad.com/datawocky/2008/03/more-data-usual.html.
Companies can enrich their data sets dramatically by enabling their business users to ask extended questions. Marketers wondering about the effectiveness of CPG brand product launches need easy access to detailed ERP and SCM data. A Vice President of Customer Support tackling attrition needs to see detailed CRM data, as well as Internet sentiment analysis. Some of the data required is ‘Big Data’: unstructured internal data, or data from outside the four walls of your company. Some of it might be more traditional ‘Small Data’ already in your transactional systems. The key thing is effectively aggregating more data.
Hot/Warm/Cold aggregation lets you align the value of data with the cost of aggregation. For Big Data aggregation, Hot is an optimized in-memory solution (HANA), Warm is an optimized disk solution (such as a data warehouse, SAP IQ, NoSQL, or Hadoop plus caching), and Cold is a distributed file-system solution (Hadoop).
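To make the value-versus-cost idea concrete, here is a minimal sketch of how a tiering rule might look. Everything in it is an assumption for illustration: the relative costs per terabyte, the query-frequency thresholds, and the dataset names are hypothetical, and query frequency is only a rough proxy for business value.

```python
from dataclasses import dataclass

# Hypothetical relative storage costs per TB (not real prices).
COST_PER_TB = {"hot": 100.0, "warm": 10.0, "cold": 1.0}


@dataclass
class Dataset:
    name: str
    size_tb: float
    queries_per_day: float  # rough proxy for business value


def choose_tier(ds: Dataset, hot_min: float = 1000, warm_min: float = 10) -> str:
    """Pick a tier from query frequency; the thresholds are assumptions."""
    if ds.queries_per_day >= hot_min:
        return "hot"
    if ds.queries_per_day >= warm_min:
        return "warm"
    return "cold"


def storage_cost(ds: Dataset, tier: str) -> float:
    """Relative cost of keeping a dataset in the given tier."""
    return ds.size_tb * COST_PER_TB[tier]


# Illustrative datasets only.
datasets = [
    Dataset("erp_orders", size_tb=2.0, queries_per_day=5000),
    Dataset("crm_history", size_tb=5.0, queries_per_day=50),
    Dataset("sensor_logs", size_tb=100.0, queries_per_day=1),
]

plan = {ds.name: choose_tier(ds) for ds in datasets}
tiered_cost = sum(storage_cost(ds, plan[ds.name]) for ds in datasets)
all_hot_cost = sum(storage_cost(ds, "hot") for ds in datasets)
```

With these made-up numbers, the small, heavily queried ERP data lands in the Hot tier while the large, rarely queried sensor logs go Cold, and the tiered plan costs a fraction of keeping everything in memory.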
Federation lets you find and analyze data across your Hot/Warm/Cold aggregation.
A layer over your Hot/Warm/Cold data aggregation solutions tracks where your data resides. This layer can query across the separate data stores. More about the technology in a moment, but first, let’s discuss the business value. Hot/Warm/Cold aggregation enables you to align the value of the business data with the appropriate investment in data aggregation. In-memory data aggregation layers, like SAP HANA, are extremely valuable, but pricey. Hadoop offers unmatched low cost. Hot/Warm/Cold data aggregation belongs on every company’s Roadmap since it aligns information value with aggregation cost.
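As a rough illustration of what such a layer does, the sketch below keeps a catalog of which tier holds each dataset and answers one question that spans two tiers. The in-memory dictionaries are hypothetical stand-ins for real stores, and the dataset names, fields, and numbers are invented for the example; a real federation layer would issue queries to the underlying systems instead.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three stores, keyed by tier.
STORES = {
    "hot": {
        "erp_revenue": [
            {"plant": "A", "revenue": 120.0},
            {"plant": "B", "revenue": 80.0},
        ]
    },
    "warm": {"shipments": []},
    "cold": {
        "quality_readings": [
            {"plant": "A", "defects": 3},
            {"plant": "A", "defects": 1},
            {"plant": "B", "defects": 7},
        ]
    },
}

# The catalog is the layer's record of where each dataset resides.
CATALOG = {name: tier for tier, tables in STORES.items() for name in tables}


def locate(dataset: str) -> str:
    """Return the tier that currently holds the dataset."""
    return CATALOG[dataset]


def scan(dataset: str) -> list:
    """Read a dataset without the caller knowing which tier it is in."""
    return STORES[locate(dataset)][dataset]


def defects_per_revenue() -> dict:
    """Combine Cold quality data with Hot revenue data in one answer."""
    defects = defaultdict(int)
    for row in scan("quality_readings"):
        defects[row["plant"]] += row["defects"]
    return {
        row["plant"]: defects[row["plant"]] / row["revenue"]
        for row in scan("erp_revenue")
    }


report = defects_per_revenue()
```

The point of the design is that the business question is written once against dataset names, and moving a dataset between tiers only changes the catalog, not the question.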
SAP Smart Data Access is a great new solution. I recently saw Snehanshu Shah, Global VP SAP HANA, present on this topic. Smart Data Access gives a business user access to data regardless of whether it’s located in HANA, SAP IQ, Oracle DB, Teradata, Hadoop, etc. It does this by enabling Data Federation, the ability to query the data without first replicating it into HANA. But Smart Data Access does more than just Federation. It also optimizes the query: it takes your SQL statement, parses it, and optimizes it for the particular data store. More valuably, it can determine whether performance would be higher executing that query in, say, Teradata and just returning the results, or pulling an in-memory copy of the dataset into HANA. It’s an exciting new solution. See more at http://help.sap.com/saphelp_nw74/helpdata/en/bb/df686dc0d94f779b5c11d06753da95/content.htm?frameset=/en/a0/efe8240b754ee4ac1acc1ff57fa87c/frameset.htm
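The trade-off between running a query remotely and copying the data into memory can be sketched with a toy cost model. To be clear, this is not how Smart Data Access actually decides; the cost constants, the `RemoteQuery` fields, and the `choose_strategy` function are all hypothetical and exist only to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class RemoteQuery:
    table_rows: int      # rows in the remote table
    result_rows: int     # rows the query is expected to return
    expected_reuse: int  # how many more times a similar query will run


def choose_strategy(
    q: RemoteQuery,
    transfer_cost_per_row: float = 1.0,      # assumed cost to ship one row
    remote_scan_cost_per_row: float = 0.1,   # assumed cost to scan a row remotely
    memory_scan_cost_per_row: float = 0.001, # assumed cost to scan a row in memory
) -> str:
    """Compare total cost of remote execution versus an in-memory copy."""
    runs = q.expected_reuse + 1
    # Remote execution: every run scans remotely and ships only the result.
    pushdown = runs * (
        q.table_rows * remote_scan_cost_per_row
        + q.result_rows * transfer_cost_per_row
    )
    # In-memory copy: ship the whole table once, then scan it locally each run.
    copy = (
        q.table_rows * transfer_cost_per_row
        + runs * q.table_rows * memory_scan_cost_per_row
    )
    return "pushdown" if pushdown <= copy else "copy_to_memory"


one_off = choose_strategy(
    RemoteQuery(table_rows=1_000_000, result_rows=100, expected_reuse=0)
)
repeated = choose_strategy(
    RemoteQuery(table_rows=1_000_000, result_rows=100, expected_reuse=50)
)
```

Under these assumed costs, a one-off query is cheaper to run where the data lives, while a query repeated dozens of times justifies paying once to bring the dataset into memory.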
EMC offers a complete solution for Hot/ Warm/ Cold aggregation. Michael La Fouci with VCE is leading the Project Jupiter charge. https://twitter.com/search?q=%23DataTemperatures&src=hash
Use Cases for Hot/Warm/Cold
Use Cases for Hot/Warm/Cold aggregation exist in every industry. In Manufacturing, Hot data would be ERP with revenue, sales, and inventory levels, and CRM with customers, pipeline, and booking data. No ERP architect will allow you to track detailed, parametric manufacturing and quality control data in ERP. Detailed manufacturing and quality data can easily be 10 to 50 times the size of ERP and CRM data combined. Software solutions today include Manufacturing Execution Systems (MES), Historians, SCADA control systems, etc. They tend to be highly fractured and distributed. Imagine if you could pull this detailed Manufacturing and Quality data into Cold Hadoop storage. Now, imagine any business user could run a BI, Data Visualization, or Predictive Analytics query or program over this Hot/Cold storage. The Hot storage should be in-memory, to avoid the well-known problem of needing IT to build Star Schemas in disk-based data warehouses.
Hot/Warm/Cold data aggregation for Predictive Analytics allows you to get more dependent and independent variables into your equations. Predictive Analytics is slowed down first by the time it takes to wrangle data into the proper format, then by the limited number of datasets available. Hot/Warm/Cold storage lets you present the key datasets your Data Architects and Power Users need to run these amazing fact-based programs. Adding additional data to the Netflix Challenge may be cheating, but adding additional data for your company certainly is not.
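The effect of adding an independent variable can be shown with a small least-squares sketch. The numbers are synthetic and chosen by hand (they are not Netflix data or any real dataset), and the helper functions `solve`, `fit`, and `sse` are written here for illustration using only the standard library. The target depends on two variables, and the sketch compares a model that sees one of them with a model that sees both.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(rows, y, coef):
    """Sum of squared prediction errors for fitted coefficients."""
    total = 0.0
    for r, t in zip(rows, y):
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], r))
        total += (t - pred) ** 2
    return total


# Synthetic data: y is roughly 2*x1 + 3*x2 plus small hand-picked deviations.
x1 = [1, 2, 3, 4, 5, 6]   # variable already in the original data set
x2 = [1, 0, 1, 0, 0, 1]   # variable that only the extra data source provides
y = [5.1, 3.9, 9.0, 8.1, 9.9, 15.0]

base_rows = [[a] for a in x1]
more_rows = [[a, b] for a, b in zip(x1, x2)]

sse_base = sse(base_rows, y, fit(base_rows, y))
sse_more = sse(more_rows, y, fit(more_rows, y))
```

Here the same simple method leaves a much smaller error once the second variable is available, which is the pattern the Netflix anecdote describes: the gain comes from the added data rather than from a cleverer algorithm.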
Overcoming Four Barriers
So, we’re overcoming the Four Barriers. First, we identified and prioritized high-value Big Data Use Cases that are aligned with our business strategy. Now, we’ve added Hot/Warm/Cold data aggregation to our Roadmap. Next, we’ll look at Data Wrangling, the art of getting data into a Big Data platform, which builds upon well-established ETL. After that, we’ll discuss how to get Predictive Analytics power into the hands of more than 1% of your business users. I’d love to hear your comments and feedback.
VP Customer Innovation