This ASUG webcast, provided by SAP, took place last month.  I finally got around to watching it to gain a better understanding of Hadoop and Big Data.

If you have SAP solutions, what does it mean to use Hadoop?  That was the topic of this webcast, which covered the CIO Guide on Big Data, “How to Use Hadoop … | SAP HANA”

/wp-content/uploads/2013/12/1fig_352336.png

Figure 1: Source: SAP

The SAP speaker reviewed the Gartner definition of big data:

High volume means hundreds of TB or petabytes

High velocity means the data arrives rapidly

Variety covers data from SAP systems, social media, and other types of sources

The goal is for all of this data to give you better insight

Differences between HANA and Hadoop

/wp-content/uploads/2013/12/2fig_352343.png

Figure 2: Source: SAP

Hadoop can run across many servers

It is open source, which lowers cost.

Hadoop is designed to run on commodity servers, so you don’t need servers with higher reliability

/wp-content/uploads/2013/12/3fig_352344.png

Figure 3: Source: SAP

It uses the MapReduce programming model, which the speaker said is simple to use.

The first phase (map) is the select phase, and the second phase (reduce) combines the results

This allows it to scale in volume
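
To make the two phases concrete, here is a minimal word-count sketch in plain Python.  It was not shown in the webcast and it does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are my own, and everything runs in one process, whereas Hadoop would spread the map and reduce calls over many servers.

```python
from collections import defaultdict

def map_phase(record):
    """Select phase: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine phase: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map over every record, group by key, then reduce each group."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["HANA and Hadoop", "Hadoop and big data"])
print(counts)
```

Because each record is mapped on its own and each key is reduced on its own, the work can be split across machines, which is where the scaling in volume comes from.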

Hadoop is slower than a conventional relational database and slower still than HANA

/wp-content/uploads/2013/12/4fig_352345.png

Figure 4: Source: SAP

At the bottom of Figure 4 is the data storage layer, the Hadoop Distributed File System (HDFS), which can store any type of data you can think of and any volume, 500 TB or more.

On top of that is the computation engine that processes the data

It works the opposite way from a relational database, where you first define the structure, then clean and correct the data, and only then load it, which takes time

In Hadoop you load the raw data first and then use it.
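
The load-first approach can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: `raw_store` is a plain list standing in for files landed in HDFS, the records are assumed to be JSON lines, and `load_raw` and `read_with_schema` are hypothetical helper names, not Hadoop functions.

```python
import json

raw_store = []  # stands in for raw files landed in HDFS as-is

def load_raw(line):
    """Load step: keep the line untouched, with no cleaning or validation."""
    raw_store.append(line)

def read_with_schema(lines, fields):
    """Use step: apply structure at read time and skip lines that do not fit."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is only dealt with when the data is read
        if isinstance(record, dict):
            yield {field: record.get(field) for field in fields}

load_raw('{"user": "anna", "amount": 12.5, "channel": "web"}')
load_raw('this line is not valid JSON')
load_raw('{"user": "ben", "amount": 7}')

rows = list(read_with_schema(raw_store, ["user", "amount"]))
print(rows)
```

All three lines are stored, including the bad one; the structure is applied only when the data is read, which is why loading is fast and the cleanup cost moves to query time.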

/wp-content/uploads/2013/12/5fi_352346.png

Figure 5: Source: SAP

Figure 5 shows the Hadoop ecosystem

It shows two computation engines.  Hive offers a SQL-like query language, but it is not full SQL.

HBase can be used to retrieve a piece of data by its key
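
As a rough picture of what access by key means, here is a small Python stand-in.  It is not the HBase client API; `TinyKeyValueTable`, its methods, and the row keys are names I made up for illustration, and a dictionary in memory replaces the distributed storage.

```python
class TinyKeyValueTable:
    """In-memory stand-in for a table whose rows are addressed by a row key."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, columns):
        """Store or update the columns for one row key."""
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        """Return the columns for one row key, or None if the key is unknown."""
        return self._rows.get(row_key)

table = TinyKeyValueTable()
table.put("cust#1001", {"name": "Acme", "country": "DE"})
table.put("cust#1001", {"segment": "retail"})

row = table.get("cust#1001")
print(row)
```

The point is the access pattern: you ask for one row by its key and get its columns back, rather than scanning or joining the way a SQL query would.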

Next, the speaker covered HANA, which has been covered here on SCN before.

/wp-content/uploads/2013/12/6fig_352347.png

Figure 6: Source: SAP

Figure 6 compares the three ways of looking at the data.

A relational database is good for solving many problems, but if you want OLTP and OLAP in real time, SAP says HANA is a good fit, as long as you don’t have too much data.

Hadoop can handle any type of data, but you can’t do OLTP with Hadoop.  It can’t be a substitute for a relational database.  It can handle large volumes at low cost.

Businesses will need all three, says SAP.  It is not a question of “HANA or Hadoop”; it is HANA and Hadoop.

My next blog, part 2 of this ASUG webcast, will cover HANA and Hadoop, Key Scenarios.

5 Comments

  1. Rv Balimi

    Hi Tammy,

    This posting is very informative; however, I can see that most of the links are not working any more.

    Can you advise where I can get the latest

    CIO Guide : How to Use Hadoop with Your SAP® Software Landscape

    Thanks in advance.

    regards,

    Rv

        1. Tammy Powlas Post author

          Hi Rv  – the webcast was in 2013

          If you are looking for an updated guide I recommend asking it as a question in one of the HANA forums

          Good luck

          Tammy
