
My son absolutely loves going out for Chinese food. He always looks forward to the times we go and knows exactly what he wants when he gets there. His favorite, which is no surprise for a 7-year-old, is the fortune cookies. We all take turns reading our fortunes to each other, and they’re almost always uneventful. The last time we went, however, things were different. My fortune cookie was particularly intriguing. It read:

“The farther backward you can look, the farther forward you are likely to see”

Being a Solution Engineer, I immediately thought about the technology I work with on a daily basis. In particular, it reminded me that there is a clear need for our customers to retain as much meaningful information as possible.

The best illustration of this came during a presentation from one of our partners on Predictive Analytics, specifically as it relates to Big Data. The presenter described how he worked very closely with Fannie Mae following the subprime mortgage crisis in 2008. He detailed how they took the agency’s data, worked closely with its analytics experts to understand their predictive models and summarized a post-mortem. The findings were sobering: the agency could have predicted the mortgage crisis. Given the data and models that were being used, they knew that something was going to happen but couldn’t identify an event with any precision. The “eureka moment” came when they brought data once considered to be “cold” back online. The same algorithms were applied against an additional 10 years of data, and this time they pinpointed the exact year and quarter the crisis would hit. The agency could have foretold it had they not aged data out of their systems.

From that day forward I never looked at the topic of data temperature the same way. At the very least, the concept of “cold data” forever left my vernacular; at best, the coolest data can get is “warm”. To that end I’ve begun referring to data as white hot (in-memory, frequently accessed), red hot (integrated near-line storage, often accessed) and warm (Hadoop/other RDBMS, potentially accessed). As part of my new charter to change the way we look at data temperature, I have created my own Scoville scale for the different temperatures of data. The Scoville scale is apropos since, by definition, there is no such thing as a “cold” pepper on the scale. Everything has some degree of heat; it’s just a matter of how much.
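The tiering idea above can be sketched in a few lines of code. This is an illustrative model only: the function name and the access-frequency thresholds are hypothetical, chosen just to show that every tier retains some heat and that "cold" never appears.

```python
# Illustrative sketch of the Scoville-style data temperature scale.
# The cutoffs below are made up for demonstration purposes; any real
# tiering policy would tune them to the workload.

def data_temperature(accesses_per_day: float) -> str:
    """Map how often a dataset is touched to a temperature tier.

    Note there is deliberately no 'cold' tier -- every dataset
    retains some heat, just like every pepper on the Scoville scale.
    """
    if accesses_per_day >= 1000:
        return "white hot"  # in-memory (e.g. SAP HANA), frequently accessed
    if accesses_per_day >= 10:
        return "red hot"    # integrated near-line storage (e.g. SAP IQ)
    return "warm"           # Hadoop/other RDBMS, potentially accessed

print(data_temperature(5000))  # white hot
print(data_temperature(50))    # red hot
print(data_temperature(0.1))   # warm
```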

[Image: The Scoville scale of data temperature]

From a technical perspective, there are a number of solutions to support the Scoville scale of data temperature. For example, a modern business platform like SAP HANA brings the power of in-memory technology to deliver real-time interactions, integrated near-line solutions that can store vast amounts of data, and the connectors necessary to access data stored elsewhere. Let’s take a simple architectural view of what’s described above with SAP HANA:


[Image: Simple SAP HANA architecture for tiered data]

SAP HANA + SAP IQ

SAP HANA is a platform for modern business applications. The platform boasts many features to support the most demanding use cases. Its capabilities include a powerful in-memory database; native data processing engines (calculation, predictive, unstructured text, geospatial, etc.); connectivity to Hadoop and other relational databases; and integration with the statistical computing language R.

What makes this platform special is its tight integration, via dynamic tiering and extended tables, with SAP IQ, a columnar database technology obtained in the Sybase acquisition. In effect, HANA manages these tables as if they are local to the platform when in reality the physical data is stored in IQ on commodity hardware. This allows petabytes of data to be stored in a cost-effective manner while delivering the business-class performance users expect.
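The "local table, remote data" behavior can be modeled conceptually. The sketch below is not the HANA or IQ API; the class and method names are hypothetical, and the two dictionaries merely stand in for an in-memory store and a near-line store behind a single logical table.

```python
# Conceptual model of dynamic tiering: the application queries one logical
# table while rows physically live either in a fast in-memory store or a
# cheaper near-line store. All names here are illustrative.

class TieredTable:
    def __init__(self):
        self.in_memory = {}  # "white hot" rows (stands in for HANA)
        self.near_line = {}  # "red hot" rows (stands in for IQ extended storage)

    def insert(self, key, row, hot=True):
        # The tiering decision is internal; callers just insert rows.
        (self.in_memory if hot else self.near_line)[key] = row

    def get(self, key):
        # One logical lookup; the caller never sees which tier served it.
        if key in self.in_memory:
            return self.in_memory[key]
        return self.near_line.get(key)

t = TieredTable()
t.insert("2015-Q1", {"revenue": 42}, hot=True)
t.insert("2005-Q1", {"revenue": 17}, hot=False)  # aged out, yet still queryable
print(t.get("2005-Q1"))  # {'revenue': 17}
```

The point of the sketch is that aging data into the cheaper tier changes where it lives, not whether it can be queried.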

Taking this back to data temperature for a moment, this combination satisfies the top two temperatures on my Scoville scale: white hot (HANA) and red hot (IQ).

Hadoop + Smart Data Access

Lastly, let’s cover the coolest – or least hot – of the three tiers: warm data. The big player here is Hadoop, which provides distributed processing of large amounts of data and, like IQ, runs on commodity hardware. Think of this as the catchall for potentially accessed data that can be called upon at a moment’s notice from HANA to provide more history on a report or a deeper pool of reference data for an algorithm.
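The federation idea behind Smart Data Access can be sketched the same way: a local name bound to a remote fetch, so warm data stays put until it is actually needed. This is a conceptual sketch, not the SDA interface; the table name, function names, and sample figures are all made up for illustration.

```python
# Sketch of the Smart Data Access idea: a "virtual table" is just a local
# name bound to a fetch function that pulls rows from a remote warm store
# (standing in for Hadoop) on demand. Everything here is hypothetical.

def hadoop_fetch(key):
    # Stand-in for a remote read over a Hadoop connector.
    # The archive content is fabricated sample data.
    archive = {"1998-Q3": {"default_rate": 0.021}}
    return archive.get(key)

virtual_tables = {"MORTGAGE_HISTORY": hadoop_fetch}

def query(table, key):
    # The caller addresses the virtual table by its local name; the data
    # stays remote until the moment the query actually runs.
    return virtual_tables[table](key)

print(query("MORTGAGE_HISTORY", "1998-Q3"))
```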

Business technology is the most interesting and exciting it has ever been. Organizations are using technology not only to transform their businesses but entire industries. The combination of Big Data, Predictive Analytics, Data Science and the ever-decreasing cost of storage media means that we need to revisit our preconceived notions regarding data temperature. It is clear that the technology is available to ensure that data is never regarded as “cold” again. I hope everyone considers the Scoville scale to help change the way our customers look at their data. Who knows? You just might change their fortune!


5 Comments


  1. Geoff Trembley

    Maurice, You covered three of my favorite topics in one post – Chinese food, Hot Peppers, and Databases.  Great illustrations, and what a simple way to describe data tiering!

  2. Henry Cook

    Nice summary Maurice. A comment I’d make is that it’s not just about performance in itself. As you move up the hierarchy from ‘hot’ to ‘super-hot’ in the in-memory area, what you’re often gaining is agility and productivity – the ability to work with instant response on production-sized data, against a simple data model, and one that can be readily changed. Thus, apart from the run-time speed advantages, the pace at which a user or developer can work is greatly increased.

  3. Douglas Hoover

    So if I understand this correctly, HANA and IQ/DT can support petabytes of information very efficiently and cost-effectively, so you don’t really need Hadoop…

    1. Tom Cenens

      Hi Douglas

      HANA + IQ/DT is used for a different use case compared to HANA + Hadoop so they can co-exist.

      HANA + IQ / DT is used for structured data

      HANA + Hadoop is used for unstructured data

      Kind regards

      Tom

  4. Roland Kramer

    Hi
    nice metaphor about the hot/warm/cold data.

    As SAP IQ and HANA can leverage the performance of SDA, the SAP-NLS solution is an efficient way to optimize the performance and the TCO aspect of the database platform.

    Even though the different options can co-exist, you have to decide which option is used for the SAP-NLS solution. In the meantime Hadoop can be used for NLS as well, together with BW on HANA, but from a performance aspect SAP IQ is the much faster solution.

    Have a look at the Community Blog for SAP-NLS as well

    Best Regards Roland

