Tammy Powlas

SAP TechEd – Big Data and the Real-Time Data Platform Including SAP HANA and Apache Hadoop Part 2

Part 1 is here: Share the Knowledge – Big Data and the Real-Time Data Platform Including SAP HANA and Apache Hadoop

Continuing on with HANA and Hadoop in this SAP TechEd recording:


Figure 1: Source: SAP

Big data starts at about 500 million records. That is not because you can’t store that much, but because that is when you start to query it and run into issues.

With HANA you can handle billions of records and terabytes (TBs) of data.

Hadoop comes into the picture when you have hundreds of TBs of data.

At some point, you know you are not going to put it all in HANA.
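The rules of thumb above can be written down as a tiny placement heuristic. This is a minimal sketch. The 500 million row figure comes from the talk. The 100 TB cut-off is an assumption standing in for "hundreds of TBs". The function name `suggest_platform` is hypothetical. None of this is SAP sizing guidance.

```python
# Rough placement heuristic encoding the speaker's rules of thumb.
# Assumption: thresholds are ballpark figures from the talk, not SAP sizing guidance.
BIG_DATA_ROWS = 500_000_000  # "big data starts at about 500 million records"
HADOOP_TB = 100              # "hundreds of TBs" -> assumed cut-off of 100 TB


def suggest_platform(rows: int, terabytes: float) -> str:
    """Return a coarse, hypothetical suggestion for where a data set could live."""
    if terabytes >= HADOOP_TB:
        # At this volume, you know it is not all going into HANA.
        return "Hadoop"
    if rows >= BIG_DATA_ROWS:
        # Big-data territory: querying is where issues start to appear.
        return "HANA (big data)"
    return "HANA"


print(suggest_platform(10_000_000, 0.2))        # HANA
print(suggest_platform(3_000_000_000, 8))       # HANA (big data)
print(suggest_platform(200_000_000_000, 400))   # Hadoop
```

In practice the decision also depends on cost, latency needs and data value, which a two-threshold rule cannot capture.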

HANA is real-time, and so is the event stream processor (ESP). You might turn to Hadoop when you have massive amounts of data to ingest, because Hadoop parallelizes the work across many machines.

For variety, HANA can push data to Hadoop. Hadoop gives you the flexibility to handle all types of data, including images for image processing.

For value, the data lake is the “storage area.” HANA is for high-value data in relatively low volumes.

You can offload historical data to Hadoop. Keep in mind that Hadoop is not a database; it manages blocks of data.
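Offloading history usually starts with choosing an age cutoff that separates data kept in HANA from data sent to Hadoop. This is a minimal sketch assuming a hypothetical `Record` type that carries a posting date. It only decides which rows are "hot" and which are "cold". It does not move data or guarantee consistency, and Hadoop's lack of transactions leaves that part to you.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical record shape; real tables would carry many more columns.
    doc_id: int
    posting_date: date


def split_by_age(records, cutoff):
    """Partition records into (hot, cold) around a cutoff date.

    hot:  posted on or after the cutoff, kept in HANA
    cold: posted before the cutoff, candidates for offloading to Hadoop
    """
    hot = [r for r in records if r.posting_date >= cutoff]
    cold = [r for r in records if r.posting_date < cutoff]
    return hot, cold


rows = [Record(1, date(2009, 3, 1)), Record(2, date(2012, 6, 30)), Record(3, date(2013, 10, 22))]
hot, cold = split_by_age(rows, cutoff=date(2012, 1, 1))
print([r.doc_id for r in hot], [r.doc_id for r in cold])  # [2, 3] [1]
```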

Hadoop vs. NLS? For BW there is a near-line storage (NLS) option based on Sybase IQ. It unloads data from HANA while guaranteeing that the data is there and consistent. Right now you cannot do NLS in Hadoop, because Hadoop doesn’t have transactions.


Figure 2: Source: SAP

You can go from HANA out to other databases

Smart data access is the “glue”

You can create virtual tables in HANA that refer to tables in other databases

You don’t have to write queries in each remote source’s own syntax, and you get richer semantics

You are pushing the processing down to the remote source

Smart data access will send data out to the remote site

Automatic data type translation is convenient as well.
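In HANA, smart data access is set up with SQL DDL: first a remote source, then virtual tables that point at remote objects. The sketch below only builds the two statements as Python strings so their shape is visible. It does not connect to anything. It assumes the following, as I understand the SAP HANA administration documentation:

- The Hive ODBC adapter is named `hiveodbc`.
- An ODBC DSN is already configured on the HANA host.
- Remote objects use a four-part name (source, database, schema, object), where the database and schema slots depend on the adapter.

All source, schema and table names here are hypothetical placeholders. Check the SQL reference for your revision before relying on the syntax.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_remote_source(source: str, dsn: str, user: str, password: str,
                         adapter: str = "hiveodbc") -> str:
    """Build a CREATE REMOTE SOURCE statement for an ODBC-backed source (assumed syntax)."""
    config = quote_literal("DSN=" + dsn)
    creds = quote_literal(f"user={user};password={password}")
    return (
        f"CREATE REMOTE SOURCE {quote_ident(source)} ADAPTER {quote_ident(adapter)} "
        f"CONFIGURATION {config} "
        f"WITH CREDENTIAL TYPE 'PASSWORD' USING {creds}"
    )


def create_virtual_table(local_schema: str, local_name: str, source: str,
                         database: str, schema: str, table: str) -> str:
    """Build a CREATE VIRTUAL TABLE statement pointing at a remote object."""
    remote = ".".join(quote_ident(part) for part in (source, database, schema, table))
    return f"CREATE VIRTUAL TABLE {quote_ident(local_schema)}.{quote_ident(local_name)} AT {remote}"


# All names below are placeholders, not values from the session.
print(create_remote_source("HIVE1", "hive1", "dfuser", "dfpass"))
print(create_virtual_table("ANALYTICS", "VT_WEBLOGS", "HIVE1", "HIVE", "default", "weblogs"))
```

Once the virtual table exists, it is queried with ordinary HANA SQL, and smart data access pushes processing down to the remote source where it can. Real credentials should not be hard-coded as they are in this illustration.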


Figure 3: Source: SAP

Smart data access is one way to connect the “worlds”.

On the left of Figure 3 are the layers: consumption, store and process, and ingest.

You can consume the data in one of two ways. The first is applications, such as machine learning and predictive analytics (for example, product recommendations). The second is analytics, such as dashboards and explorations (Lumira). Both can use HANA or Hadoop.

You can go from BusinessObjects to Hadoop

On the bottom (ingest) you have ESP, the replication framework and information management, and Data Services can operate with Hadoop.


Figure 4: Source: SAP

Direct HANA-to-Hadoop integration via Smart Data Access gives you virtual data access. Integration via ETL physically moves the data. With TBs of data you can move it on a schedule, but it is not interactive. Data Services gives you Pig scripting.

As of BI 4.1, you can use BI against Hive through multi-source universes for scheduled reports.

Question & Answer

Q: How do you deal with the fact that the two systems have different response characteristics?

A: With SP7 there is a remote materialization capability that caches query results (remote caching). You are trading storage space for query time.

SAP is also looking at improvements to make access into Hive faster.
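The remote caching answer is the classic space-for-time trade: keep a local copy of a slow remote result and reuse it until it is considered too old. This is a generic illustration of that idea, not how HANA implements it. As I understand the SAP HANA documentation, the HANA feature is requested per query with a hint such as `WITH HINT (USE_REMOTE_CACHE)`; verify this against your revision. The class name `RemoteResultCache` and the validity period are hypothetical.

```python
import time


class RemoteResultCache:
    """Keep results of remote queries locally for a validity period.

    Generic illustration of remote caching: spend local memory to avoid
    re-running slow remote queries (for example, Hive jobs).
    """

    def __init__(self, run_remote, validity_seconds, clock=time.monotonic):
        self._run_remote = run_remote    # callable: SQL text -> rows
        self._validity = validity_seconds
        self._clock = clock
        self._entries = {}               # SQL text -> (fetched_at, rows)

    def query(self, sql):
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self._validity:
            return entry[1]              # cache hit: fast, but may be stale
        rows = self._run_remote(sql)     # cache miss: pay the remote cost
        self._entries[sql] = (now, rows)
        return rows


# Usage with a fake remote source and a fake clock so the behaviour is visible.
calls = []
fake_now = [0.0]


def fake_hive(sql):
    calls.append(sql)
    return [("row",)]


cache = RemoteResultCache(fake_hive, validity_seconds=3600, clock=lambda: fake_now[0])
cache.query("SELECT * FROM weblogs")
cache.query("SELECT * FROM weblogs")   # served from cache
fake_now[0] = 7200.0
cache.query("SELECT * FROM weblogs")   # expired, goes remote again
print(len(calls))  # 2
```

The validity period is the knob that trades freshness for speed. Cached answers can lag behind the remote data.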

Q: Does smart data access work against different sources?

A: Yes: Teradata, ASE, IQ, and SQL Server.

Q: Which Hadoop distributions are certified?

A: SAP resells the Hortonworks and Intel distributions.

Hive 0.9 or later is supported, along with Hadoop 1.

Q: What kind of connection does smart data access use?

A: It uses ODBC; BI uses JDBC.

      Kamaljit Vilkhoo

      Hi Tammy

      For a customer who has HANA EE and BW on HANA on the same database, should they go for Hadoop or NLS as the data repository for aged data?

      Kind Regards

      Kamaljit Vilkhoo

      Tammy Powlas
      Blog Post Author

      Hi - that is really a question that I am not qualified to answer.  Please post this as a discussion question in this space: SAP BW Powered by SAP HANA

      Francis Yesudas

      SAP has a new feature with HANA called the Federated Enterprise Data Warehouse. Customers can invest in a low-budget Hadoop system (or a cloud-based one, for low cost). Hadoop relies on cheap commodity hardware rather than dedicated HANA hardware, which is very expensive to expand as your data grows over a few years. Hadoop is good at massively parallel job processing and can give performance results similar to HANA. You can connect the SAP data model with the Hadoop data model using a new concept called the Open ODS View.