Seamless Big Data tiering with HANA, Hadoop and Vora…with a little help from DLM – Part 1
The SAP HANA Data Warehousing Foundation (DWF) option is a series of packaged tools for large-scale SAP HANA installations that support data management and distribution within an SAP HANA landscape. With SAP HANA Data Warehousing Foundation you can achieve smart data distribution across complex landscapes, optimize the memory footprint of data in SAP HANA, and streamline administration and development, thereby reducing TCO and supporting SAP HANA administrators and data warehouse designers.
In this blog I will focus on Data Lifecycle Manager (DLM). Please note that the DWF option includes other tools that we will not look at in this blog.
SAP HANA is the core platform in any SAP Big Data target architecture. One of the challenges you will face with any Big Data Architecture is managing tiered storage. You need to make intelligent decisions on how to optimize and balance price and performance.
In the past, you mostly had to develop and manage your tiered data processes manually, including data relocation, monitoring, logging, scheduling, testing and harmonizing tiers. I will show how DLM addresses these challenges. This is not a training session. The purpose is to give you a high-level understanding of the benefits of using DLM as part of your Big Data tiering/archiving strategy. You should come away with a better understanding of how it can reduce cost and complexity while simplifying administration and development.
Below is a high-level view of the DLM process:
Before we get started, let’s have a look at the table/data that we will be creating a data relocation profile for. The table contains data that is the result of a computation over the Call Detail Records (CDRs) generated by the Telecom Italia cellular network over the city of Milano. CDRs log the user activity for billing purposes and network management. Below is simply the record count by country.
1 – Once you have DLM installed and configured, you can access the XS application via
http(s)://<HOST>:<HANA XS PORT>/sap/hdm/dlm/index.html
Here is a screenshot of the home page. It’s a great place to see a quick snapshot of the last data relocation runs and the node utilization forecast.
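If you want to generate the cockpit link for several systems, the URL pattern above can be filled in with a small helper. This is only a sketch: the function name, host and port below are made-up placeholders, and only the path /sap/hdm/dlm/index.html comes from the pattern above.

```python
def dlm_cockpit_url(host: str, xs_port: int, use_https: bool = True) -> str:
    """Fill in the DLM cockpit URL pattern for one system.

    host and xs_port are whatever your own landscape uses; the values
    in the usage line below are placeholders, not real systems.
    """
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{xs_port}/sap/hdm/dlm/index.html"


# Hypothetical host and port, for illustration only.
print(dlm_cockpit_url("hana.example.com", 4300))
```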
2 – The first thing to take care of will be to add a storage destination. Currently DLM supports 4 different storage destinations:
- SAP IQ over SDA
- HANA Dynamic Tiering Local
- Spark SQL (DESTINATION)
- Deletion Bin
Each storage destination has its own prerequisites. To use HANA Dynamic Tiering you need to have configured extended storage for HANA. For Spark SQL you need the Spark Controller and Hadoop configured.
3 – Once you have your storage destination configured and activated, you will be able to create Lifecycle Profiles. The profile below is based on a source HANA table. The destination will be Spark. All data is currently in the hot store (2,459,324 records).
4 – Once you have your profile created and the source/targets identified, you need to create a rule that DLM will use to tier the data. As seen in the screenshot below, I have defined the rule to move any record with COUNTRY_CODE = '40' to cold storage. The editor shows you in real time the number of rows affected by the newly created rule. In a production environment your rule would most likely be dynamic and based on a date.
5 – Once your rule has been defined and saved, DLM gives you the ability to “Simulate” the data relocation.
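To make the idea of a dynamic, date-based rule more concrete, here is a minimal sketch of how such a condition could be derived from a retention period. It is not the DLM rule editor syntax: the function name, the CALL_DATE column and the 30-day retention are assumptions for illustration, and the output is just a generic SQL-style predicate.

```python
from datetime import date, timedelta


def cold_rule(today: date, retention_days: int, date_column: str = "CALL_DATE") -> str:
    """Return a SQL-style predicate selecting rows older than the retention period.

    CALL_DATE is a hypothetical column name; the CDR table in this blog
    is only shown with COUNTRY_CODE.
    """
    cutoff = today - timedelta(days=retention_days)
    return f"{date_column} < '{cutoff.isoformat()}'"


# Everything older than 30 days, evaluated on 31 January 2017.
print(cold_rule(date(2017, 1, 31), 30))  # CALL_DATE < '2017-01-01'
```

Because the cutoff is computed from the run date, the same rule keeps moving older records to cold storage on every scheduled run, unlike the static COUNTRY_CODE condition used in this walkthrough.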
6 – Below you can see the simulation shows 100,740 records will move from Hot to Cold.
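Conceptually, a simulation like this boils down to counting the rows that match the rule without moving anything. The sketch below illustrates that idea with Python's built-in sqlite3 module as a stand-in for HANA; the table, the four sample rows and the function name are invented for the example and say nothing about how DLM implements its simulation internally.

```python
import sqlite3


def simulate_relocation(conn: sqlite3.Connection, table: str, rule: str) -> dict:
    """Count how many rows a rule would move, without changing any data."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    to_cold = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {rule}").fetchone()[0]
    return {"hot_before": total, "to_cold": to_cold, "hot_after": total - to_cold}


# Tiny in-memory stand-in for the CDR table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (country_code TEXT, activity REAL)")
conn.executemany(
    "INSERT INTO cdr VALUES (?, ?)",
    [("39", 1.0), ("40", 0.5), ("40", 0.2), ("49", 0.7)],
)

result = simulate_relocation(conn, "cdr", "country_code = '40'")
print(result)  # {'hot_before': 4, 'to_cold': 2, 'hot_after': 2}
```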
7 – If you are confident in the data relocation simulation you can move to scheduling the process. For this blog I’m going to trigger the process manually.
8 – The logs section provides an area to monitor a job's progress in real time as well as to look at previously executed jobs and their associated steps. Below you can see the detailed steps and associated status for the CDR job we manually executed.
9 – Now that the data has been relocated to cold storage we can take advantage of some generated objects that DLM has created to optimize access between the hot and cold data. Below you can see 2 new objects created by DLM.
1 – Virtual table
2 – Union view that combines both hot and cold data.
10 – Selecting from the virtual table should only show the data that has been relocated to Hadoop.
11 – Selecting from the DLM-generated view returns the data located in both HANA and Hadoop.
Conclusion: As you can see, managing your data across tiers becomes seamless and integrated with DLM. You can take advantage of the DLM cockpit to create, manage and monitor all your data relocation jobs. The generated objects should help your applications and analytics take advantage of all your enterprise data regardless of its temperature/age.
Next Steps: In part 2 of this blog series we will take a deeper look at the data that was relocated to Hadoop and how we interact with it using Vora.
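The relationship between the hot table, the cold (virtual) table and the union view from steps 9 to 11 can be sketched in a few lines. Again this uses sqlite3 as a stand-in: the table and view names, the sample rows and the use of UNION ALL are assumptions for illustration, not the objects or SQL that DLM actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cdr_hot stands in for the HANA table, cdr_cold for the virtual
    -- table that points at the relocated data in Hadoop.
    CREATE TABLE cdr_hot  (country_code TEXT, activity REAL);
    CREATE TABLE cdr_cold (country_code TEXT, activity REAL);
    INSERT INTO cdr_hot VALUES ('39', 1.0), ('40', 0.5), ('40', 0.2), ('49', 0.7);

    -- Relocation: copy the rows matching the rule to cold, then delete them from hot.
    INSERT INTO cdr_cold SELECT * FROM cdr_hot WHERE country_code = '40';
    DELETE FROM cdr_hot WHERE country_code = '40';

    -- Union view that presents hot and cold data as one logical table.
    CREATE VIEW cdr_all AS
        SELECT * FROM cdr_hot
        UNION ALL
        SELECT * FROM cdr_cold;
""")


def row_count(name: str) -> int:
    """Return the number of rows in a table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


hot_rows, cold_rows, all_rows = row_count("cdr_hot"), row_count("cdr_cold"), row_count("cdr_all")
print(hot_rows, cold_rows, all_rows)  # 2 2 4
```

An application that queries only the view sees the same total before and after relocation, which is what makes the tiering transparent to it.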
Thanks Rob for the details with Screenshots. This is useful!
Nice Blog 🙂 - You also might wanna check the SAP HANA Academy DWF YouTube Channel for further insights (see link above) on DLM and DDO
Great article. Have a question, do you know where are all the cold storage tables stored? How can we access them from HADOOP?
Thanks Rob. Great Introduction to DLM and how it integrates with the four datasources
Great blog and very informative too. Understood that the movement of data from Hot to Cold is possible via DLM/Spark. However, what is the approach we should follow to move data from Warm-to-Cold ?