[SAP HANA Academy] Live4 ERP Agility: SDI Hadoop Overview
In the next part of the SAP HANA Academy’s Live4 ERP Agile Solutions in SAP HANA Cloud Platform course, Tahir Hussain Babar (Bob) provides an overview of the source Hadoop system used in the course. The Hadoop system is where the EPA (Environmental Protection Agency) weather data has been loaded. It will be a source data set for an SAP Fiori application in the SAP HANA Cloud Platform (HCP). Because the data doesn’t need to be replicated in HCP, it is accessed using virtual tables and remote sources created with SAP Smart Data Integration (SDI). Watch Bob’s video below.
(0:30 – 2:30) Background on Data in Hadoop
In the Live4 course, a connection will be made from HCP to a Hadoop data lake that contains over 70 years of weather data from the EPA. There is no need to copy all of that weather data into HCP. Instead, a virtual table will be created over the data lake so that only the data that is needed is retrieved.
Many options exist for data connectivity. One way, demonstrated earlier in the course, is to use HCI-DS to copy the data to the remote system. Another method is to use SAP Smart Data Integration to replicate the data. With SDI, once the data is changed in the source system, the change is immediately reflected in HCP. Yet another method is to use virtual tables. With virtual tables no replication is necessary, as you have a virtual view of your data source.
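To make the virtual table option more concrete, here is a minimal sketch of what creating and querying one might look like in SAP HANA SQL. It assumes a remote source has already been set up with SDI. The schema name LIVE4, the virtual table name VT_AQI_DATALAKE, the remote source name HADOOP_SDI and the remote object path are all hypothetical placeholders, not names taken from the video.

```sql
-- All identifiers below are hypothetical placeholders:
--   LIVE4            target schema in HCP
--   VT_AQI_DATALAKE  name chosen for the virtual table
--   HADOOP_SDI       remote source created with Smart Data Integration
-- The remote object path (database, schema, object) depends on the adapter,
-- so copy it from your own remote source rather than from this sketch.
CREATE VIRTUAL TABLE "LIVE4"."VT_AQI_DATALAKE"
  AT "HADOOP_SDI"."<NULL>"."<NULL>"."epa.aqi_datalake";

-- Querying the virtual table reads from Hadoop at run time;
-- the rows are not replicated into HCP.
SELECT * FROM "LIVE4"."VT_AQI_DATALAKE" LIMIT 10;
```

The point of the sketch is only that the virtual table behaves like a local table in queries while the data itself stays in Hadoop.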
(2:30 – 7:00) Examination of the Databases in the Hadoop system
On a Windows machine, Bob is running Hadoop on Hortonworks Data Platform 1.2. If you don’t have your own Hadoop system, please check out this series of four videos on the SAP HANA Academy on how to install, configure and use Hortonworks Hadoop.
This data set (download here on GitHub) contains weather data from the EPA. Every day the EPA surveys over 300 weather stations, and it has recorded various measurements for over 70 years. Instead of storing all of that data on the hot storage system of SAP HANA, we will store it on Hadoop as a cold storage system that we can then access.
Bob launches a command prompt window and starts his HIVE 0.12.0 system by executing hive from its bin directory. If you’re following along with the course, please make sure to use HIVE 0.12.0 throughout. Bob has already loaded the EPA data into his Hadoop system, so entering the command show databases; lists his three databases (default, epa and live2). Entering show tables in epa; then shows the five EPA tables Bob has exposed in this database (aqi_datalake, humidity_datalake, pressure_datalake, temp_datalake and wind_datalake).
Next, Bob enters select * from epa.aqi_datalake; to see all 70 years’ worth of air quality index data. The data is exposed in a table format through the Hive query language, which enables BI tools and SAP HANA to understand the format.
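The commands from this walkthrough can be collected into one short Hive session. This is a sketch for readers following along, assuming the epa database and its tables have been loaded as described. The USE, DESCRIBE and LIMIT statements are additions for illustration and are not shown in the summary above.

```sql
-- List the databases in Hive (the walkthrough shows default, epa and live2).
SHOW DATABASES;

-- List the tables exposed in the epa database.
SHOW TABLES IN epa;

-- Switch to the epa database and inspect the columns of one table.
USE epa;
DESCRIBE aqi_datalake;

-- Preview a handful of rows rather than scanning all 70 years of data.
-- LIMIT 10 is an addition for illustration; Bob runs the query without it.
SELECT * FROM aqi_datalake LIMIT 10;
```

Adding a LIMIT keeps the preview quick on a large table; drop it to reproduce the full query from the video.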
In the next video, Bob will show you the user authorizations and agents you need in order to connect your Hadoop system to HCP using Smart Data Integration.
For further tutorial videos about the ERP Agility with HCP course please view this playlist.
SAP HANA Academy – Over 1,200 free tutorial videos on SAP HANA, Analytics and the SAP HANA Cloud Platform.