This ASUG webcast, provided by SAP, ran last month, and I finally got around to watching it to get a better understanding of Hadoop and big data.
If you run SAP solutions, what does it mean to use Hadoop? That was the topic of this webcast, which covered the CIO Guide on Big Data, “How to Use Hadoop …”
Figure 1: Source: SAP
The SAP speaker reviewed the Gartner definition of big data:
High volume means hundreds of terabytes or even petabytes.
High velocity means the data arrives rapidly.
High variety means the data comes from many sources: SAP systems, social media, and other types.
The goal is for all this data to give you better insight.
Differences between HANA and Hadoop
Figure 2: Source: SAP
Hadoop can run across many servers.
It is open source, which keeps its cost lower.
Hadoop is designed to run on commodity servers; you don’t need higher-reliability server hardware.
Figure 3: Source: SAP
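It uses the MapReduce programming model, which the speaker said is simple to use.
The first phase (map) selects the relevant data, and the second phase (reduce) combines the results.
Because the work in each phase can be spread across many servers in parallel, this allows Hadoop to scale to very large data volumes.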
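The two phases the speaker described can be illustrated with the classic word-count example. This is a minimal single-process sketch in plain Python, not Hadoop's actual (Java) API; the function names are hypothetical, and the shuffle step that groups values by key between the two phases is something the Hadoop framework does for you on a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: select the words in one input line and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as Hadoop does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each input split would be mapped on a different server;
    # here everything runs in one process purely for illustration.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = map_reduce(["SAP HANA and Hadoop", "HANA or Hadoop", "HANA and Hadoop"])
print(counts["hadoop"], counts["and"])  # → 3 2
```

Because each map call only looks at its own line and each reduce call only looks at its own key, both phases can be parallelized independently, which is where the scaling comes from.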
Hadoop is slower than a conventional relational database, and much slower than HANA.
Figure 4: Source: SAP
At the bottom of Figure 4 is the data storage layer, the Hadoop Distributed File System (HDFS), which can store virtually any type of data at any volume: 500 TB or more.
On top of that sits the computation engine that processes the data.
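To make the storage layer a bit more concrete, here is a minimal sketch of how HDFS splits a large file into fixed-size blocks and keeps several copies of each block on different servers, which is what lets it run on commodity hardware. The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2 and later; the `plan_blocks` function and server names are hypothetical, and the round-robin placement is a simplification of HDFS's actual rack-aware placement policy.

```python
import math
from typing import List

BLOCK_SIZE_MB = 128  # HDFS default block size (dfs.blocksize) in Hadoop 2 and later
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def plan_blocks(file_size_mb: int, servers: List[str],
                block_size_mb: int = BLOCK_SIZE_MB,
                replication: int = REPLICATION) -> List[List[str]]:
    """Return, for each block of the file, the servers holding a copy of it.

    Hypothetical helper: placement is simple round-robin, not HDFS's rack-aware policy.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = []
    for block in range(n_blocks):
        # Put the copies of each block on `replication` different servers,
        # so losing one commodity server never loses the block.
        placement.append([servers[(block + r) % len(servers)] for r in range(replication)])
    return placement


plan = plan_blocks(500, ["node1", "node2", "node3", "node4"])
print(len(plan))  # → 4
print(plan[0])    # → ['node1', 'node2', 'node3']
```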
Hadoop works the opposite way from a relational database. With a relational database, you first define the schema, then clean and correct the data, and only then load it; that takes time.
In Hadoop, you load the raw data as-is and then work with it.
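The contrast between these two loading styles, often called schema-on-write versus schema-on-read, can be sketched as follows. This is a toy illustration in plain Python, not Hadoop or database code; the sample CSV data and the function names are hypothetical.

```python
import csv
import io
from typing import List, Tuple

RAW = "order_id,amount\n1,100.50\n2,not-a-number\n3,75.25\n"


def load_schema_on_write(raw: str) -> List[Tuple[int, float]]:
    """Relational style: define the schema, clean and convert every row, then load."""
    table = []
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            table.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            pass  # bad rows are rejected (or must be fixed) before loading
    return table


def load_schema_on_read(raw: str) -> List[str]:
    """Hadoop style: store the raw lines untouched."""
    return raw.splitlines()


def total_amount_on_read(stored: List[str]) -> float:
    """Apply the structure only when the data is actually queried."""
    total = 0.0
    for row in csv.DictReader(stored):
        try:
            total += float(row["amount"])
        except ValueError:
            continue  # interpretation (and error handling) happens at query time
    return total


print(len(load_schema_on_write(RAW)))                  # → 2
print(len(load_schema_on_read(RAW)))                   # → 4
print(total_amount_on_read(load_schema_on_read(RAW)))  # → 175.75
```

The upfront work in the first function is what makes relational loading slower; the second approach loads instantly but pushes that work to every query.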
Figure 5: Source: SAP
Figure 5 shows the Hadoop ecosystem.
It shows two computation engines. Hive offers a SQL-like query language, but it is not full SQL.
HBase can be used to retrieve an individual piece of data by its key.
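Key-based access works well in HBase because rows are stored sorted by row key, so one row can be fetched directly and a range of keys can be read without scanning the whole table. The following is a toy in-memory model of that idea, not the HBase client API; the class name, row keys, and column names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional


class SortedKeyTable:
    """Toy model of an HBase-style table: rows kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep row keys in sorted order
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        # Direct lookup of one row by its key; no full-table scan needed.
        return self._rows.get(row_key)

    def scan(self, start: str, stop: str) -> List[str]:
        # Because keys are sorted, a key range [start, stop) is one contiguous slice.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


table = SortedKeyTable()
table.put("customer#0042", {"info:name": "ACME", "info:country": "DE"})
table.put("customer#0007", {"info:name": "Globex"})
print(table.get("customer#0042")["info:name"])       # → ACME
print(table.scan("customer#0000", "customer#0010"))  # → ['customer#0007']
```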
Next, the speaker covered HANA, which has been discussed here on SCN before.
Figure 6: Source: SAP
Figure 6 compares the three ways of looking at the data: relational databases, HANA, and Hadoop.
A relational database is good for solving many problems, but if you want OLTP and OLAP in real time, SAP says HANA is the right fit, as long as you don’t have too much data.
Hadoop can handle any type of data, but you can’t do OLTP with it, so it can’t substitute for a relational database. What it can do is handle large volumes at low cost.
Businesses will need all three, says SAP. It is not a question of “HANA or Hadoop”; it is HANA and Hadoop.
My next blog, part 2 on this ASUG webcast, will cover “HANA and Hadoop: Key Scenarios.”