SAP TechEd – Big Data and the Real-Time Data Platform Including SAP HANA and Apache Hadoop Part 2
Figure 1: Source: SAP
Big data starts at about 500 million records. That is not because you can’t store it; it is when you start to query it that you face issues
With HANA you can handle billions of records and terabytes of data
Hadoop comes into the picture when you have hundreds of terabytes of data
At some point you know you are not putting it all in HANA
HANA is real-time, as is the event stream processor (ESP). You might turn to Hadoop when you have massive amounts of data to ingest. In Hadoop the work is parallelized across machines.
HANA can take in a variety of data and push it to Hadoop. Hadoop gives you the flexibility to handle all types of data, including image processing.
In terms of value, Hadoop is the “storage area”, the data lake. HANA is for high-value data at lower volumes.
You can offload historical data to Hadoop. Hadoop is not a database; it manages blocks of data.
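The offloading idea can be sketched as a split of records by age: recent, high-value rows stay in the in-memory store and older rows are handed to the bulk store. This is only an illustration. The `split_hot_cold` helper, the `posted` field, and the cutoff date are all hypothetical, and no HANA or Hadoop API is involved.

```python
from datetime import date


def split_hot_cold(records, cutoff):
    """Split records into recent ("hot") and historical ("cold") sets.

    Hypothetical helper: "hot" stands for data kept in HANA,
    "cold" for data offloaded to Hadoop.
    """
    hot = [r for r in records if r["posted"] >= cutoff]
    cold = [r for r in records if r["posted"] < cutoff]
    return hot, cold


# Made-up sample records, for illustration only.
orders = [
    {"id": 1, "posted": date(2011, 3, 1)},
    {"id": 2, "posted": date(2013, 6, 15)},
    {"id": 3, "posted": date(2013, 10, 20)},
]

hot, cold = split_hot_cold(orders, cutoff=date(2013, 1, 1))
print([r["id"] for r in hot], [r["id"] for r in cold])  # prints: [2, 3] [1]
```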
Hadoop vs. NLS? On BW there is a Near-Line Storage (NLS) option with Sybase IQ to unload data from HANA while guaranteeing that the data is there and consistent. Right now you cannot do NLS in Hadoop. Hadoop doesn’t have transactions.
Figure 2: Source: SAP
You can go from HANA out to other databases
Smart data access is the “glue”
You can create virtual tables in HANA that refer to tables in other databases
You don’t have to write the syntax of the other sources, and you get richer semantics
You are pushing the processing down to the remote source
Smart data access will send data out to the remote site
Automatic data translation is convenient as well.
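The point about pushing processing down to the remote source can be illustrated with a small sketch. It is not smart data access itself: an in-memory SQLite database stands in for the remote source, and the `clicks` table and both helper functions are made up for the example. It compares pulling every row and aggregating locally with sending the aggregation to the remote side.

```python
import sqlite3

# Stand-in "remote source": an in-memory SQLite database. This is an
# assumption for illustration; a real remote source would be Hive,
# Teradata, and so on.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE clicks (product TEXT, amount INTEGER)")
remote.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("book", 10), ("book", 5), ("lamp", 7), ("lamp", 3), ("desk", 20)],
)


def total_without_pushdown(conn):
    """Fetch every row from the remote side, then aggregate locally."""
    rows = conn.execute("SELECT product, amount FROM clicks").fetchall()
    totals = {}
    for product, amount in rows:
        totals[product] = totals.get(product, 0) + amount
    return totals, len(rows)


def total_with_pushdown(conn):
    """Let the remote side aggregate; only the result rows come back."""
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM clicks GROUP BY product"
    ).fetchall()
    return dict(rows), len(rows)


local_totals, rows_pulled = total_without_pushdown(remote)
pushed_totals, rows_returned = total_with_pushdown(remote)
print(local_totals == pushed_totals, rows_pulled, rows_returned)  # prints: True 5 3
```

Both routes give the same totals, but the pushed-down query returns three rows instead of five. With terabytes on the remote side, that difference is the reason to push the work down.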
Figure 3: Source: SAP
Smart data access is one way to connect the “worlds”.
On the left of Figure 3 are the layers: the consumption model, store and process, and ingest.
You can use the data in one of two ways: applications and analytics. Applications include machine learning and predictive analytics (such as product recommendations). Analytics use cases include dashboards and exploration (Lumira). These can use HANA or Hadoop.
You can go from BusinessObjects to Hadoop
On the bottom you have ESP, the replication framework, and information management; Data Services can operate with Hadoop.
Figure 4: Source: SAP
Direct HANA to Hadoop: via Smart Data Access you have virtual data access. Integration via ETL moves the data; with terabytes of data you can move it on a schedule, but it is not interactive. Data Services gives you Pig with scripting.
You can use BI against Hive using multi-source universes as of BI 4.1 for scheduled reports.
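For the direct route, a virtual table is declared in HANA and points at a table in the remote source. The sketch below only builds the statement text, assuming the `CREATE VIRTUAL TABLE ... AT` form of HANA SQL with a four-part remote name. The helper functions and every object name (`ANALYTICS`, `VT_WEBLOGS`, `HIVE_SRC`, `weblogs`) are hypothetical, so check the SQL reference for your HANA revision before using it.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_virtual_table_sql(local_schema, local_table,
                             remote_source, remote_db,
                             remote_schema, remote_table):
    """Build a CREATE VIRTUAL TABLE statement (assumed HANA syntax)."""
    local = f"{quote_ident(local_schema)}.{quote_ident(local_table)}"
    remote = ".".join(
        quote_ident(part)
        for part in (remote_source, remote_db, remote_schema, remote_table)
    )
    return f"CREATE VIRTUAL TABLE {local} AT {remote}"


# All names below are placeholders, not taken from the session.
stmt = create_virtual_table_sql(
    "ANALYTICS", "VT_WEBLOGS", "HIVE_SRC", "HIVE", "default", "weblogs"
)
print(stmt)
```

Once such a table exists, queries against it are written in HANA SQL like queries against any local table.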
Question & Answer
Q: How do you deal with the fact that you have different response characteristics with the two systems?
A: With SP7 there is a remote materialization capability to cache queries; you are trading space for time (remote caching)
SAP is looking at improvements to make access into Hive faster
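The caching idea in this answer can be sketched generically: keep the result of a slow remote query for a while and serve repeats from the local copy. This is not the SP7 feature itself. The `QueryCache` class, the time-to-live value, and the `slow_remote` stand-in are assumptions made for illustration.

```python
import time


class QueryCache:
    """Cache remote query results for a fixed time-to-live (TTL)."""

    def __init__(self, run_query, ttl_seconds=300.0, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # SQL text -> (expiry time, rows)
        self.remote_calls = 0

    def fetch(self, sql):
        now = self._clock()
        hit = self._store.get(sql)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no remote round trip
        self.remote_calls += 1
        rows = self._run_query(sql)
        self._store[sql] = (now + self._ttl, rows)
        return rows


def slow_remote(sql):
    """Hypothetical stand-in for a slow query against Hive."""
    return [("book", 15), ("lamp", 10)]


cache = QueryCache(slow_remote, ttl_seconds=60.0)
query = "SELECT product, SUM(amount) FROM clicks GROUP BY product"
first = cache.fetch(query)
second = cache.fetch(query)  # served from the cache
print(first == second, cache.remote_calls)  # prints: True 1
```

The cost is the memory held by cached rows and the chance of serving results up to one TTL old, which is the trade the answer describes.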
Q: Does smart data access work against different sources?
A: Yes: Teradata, ASE, IQ, SQL Server
Q: Which distribution is certified?
A: SAP resells the Hortonworks and Intel distributions
Hive 0.9 or greater is supported, and Hadoop 1
Q: What connection does smart data access use?
A: It uses ODBC; BI uses JDBC