HANA Data Warehousing Foundation (DWF)
SAP HANA Data Warehousing Foundation (DWF) provides data management tools for managing data better in a scale-out HANA landscape, and it complements data warehousing products such as SAP BW and native HANA DWH (as well as Hadoop and HANA Vora scenarios).
The major use cases are:
- Data Warehousing – SAP BW on HANA and native HANA scenarios, including mixed scenarios. In a multi-tenant / scale-out system, repeated iterations of writing into various tables and reading from various applications tend, over time, to make the system bulky and slow, owing to the volume of data and its physical storage across the slave nodes. The conventional redistribution tool provided in HANA Studio has very limited capabilities.
- Multi-Temperature Data Management – HANA has a concept of multi-temperature data storage for better management of the data footprint in expensive memory, pushing chunks of data down to disk when they are not used frequently, or are predicted not to be. Extended tables, available from SPS 09 onward, are a nice concept here: they provide better data management by keeping data permanently on disk while letting applications use it as if it were in memory, with special algorithms inside HANA used to read the data. However, moving data to cold storage, or back to hot when required, is not easily done with standard means. DWF provides tools to do this very easily, even bidirectionally, using simple rules.
- Multiple Applications on a Multi-Tenant / Scale-Out HANA System – As the complexity of a HANA installation grows by putting multiple applications on it, the complexity of accessing data in the most optimized way grows as well. DDO comes to the rescue here.
So the DWF tools help large scale-out HANA environments with advanced UI-based tools that address challenges in data aging strategy and in distributing data in a more logical and practical way to enhance performance. In a nutshell, better-managed data distribution and aging enhance overall performance and bring better TCO for existing installations. This redistribution of data can be compared to the defragmentation option in Microsoft-based operating systems for better HDD utilization, but it goes one step further by understanding which tables should be grouped on a single slave node so that applications and reports can read them better.
** Data Warehouse Monitor and Data Warehouse Scheduler are planned tools.
The DWF tools (including DDO and DLM) are provided as individual delivery units for import and installation into HANA. They are as follows:
- DWF Core – HCO_HDM
- Data Distribution Optimizer – HCO_HDM_DDO
- Data Lifecycle Manager – HCO_HDM_DLM
- DWF Documentation – HDC_HDM
Note: The import/installation should always start with HCO_HDM.
The latest version available at the time of writing this blog is DWF 1.0 SPS 05.
The following shows the compatibility of HANA DWF (HDWF) with HANA versions:
SAP DWF – Data Distribution Optimizer (DWF-DDO)
SAP DWF-DDO is a HANA XS-based tool used to reorganize data across multiple HANA memory nodes. It redistributes data and provides functionality to plan, adjust, and analyze the HANA landscape, supporting the HANA administrator in managing these activities more efficiently through UI-based interfaces that run easily in any browser.
The use cases for DDO are:
- Comprehensive monitoring and logging of multi-host systems
- Managing main memory allocation in scale-out systems – In a scale-out system, main memory is backed by multiple slave nodes that serve as data storage areas. Tables need to be partitioned (DDO does not suggest partitioning criteria on its own), and at times secondary partitioning is needed as well. On top of that, based on usage, groups of related tables should stay as much as possible on the same slave node to optimize SQL performance. All of this can be achieved with DDO, which balances the table distribution among the nodes while considering constraints such as server capacity, node roles, and defined table relations.
- Multi-application data distribution – As a system scales out further, or when multiple applications share the same multi-tenant system, the distribution of tables becomes more complex, and DDO uses its algorithms to find the best optimization.
On top of this, DDO also monitors continuously, so the distribution and management of tables and partitions is an ongoing activity rather than a one-time one. Based on the monitoring logs, the distribution can be optimized again if required. It is an organic activity, since system landscapes are organic too.
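As a rough illustration of the constraint-aware balancing described above (not DDO's actual algorithm, which is far more sophisticated), the following Python sketch greedily assigns groups of related tables to the slave node with the most free capacity. All group names, node names, and sizes here are invented:

```python
# Illustrative sketch only: greedily place table groups on slave nodes,
# keeping each group together while respecting a per-node memory capacity.
# This is NOT DDO's algorithm; names and numbers are made up.

def place_groups(groups, node_capacities):
    """Assign each group of related tables to the node with the most free space.

    groups: dict of group name -> estimated memory footprint (GB)
    node_capacities: dict of node name -> capacity (GB)
    Returns a dict of group name -> node name.
    """
    free = dict(node_capacities)
    placement = {}
    # Place the largest groups first, so big groups are less likely to fail.
    for group, size in sorted(groups.items(), key=lambda kv: -kv[1]):
        node = max(free, key=free.get)  # node with the most free capacity
        if free[node] < size:
            raise ValueError(f"no node can hold group {group} ({size} GB)")
        placement[group] = node
        free[node] -= size
    return placement

plan = place_groups(
    {"SALES": 300, "FINANCE": 200, "LOGISTICS": 150, "HR": 80},
    {"node1": 400, "node2": 400},
)
```

The largest-first greedy order is a common heuristic for this kind of bin-packing problem; the real DDO configuration additionally weighs node roles and table relations, which this sketch ignores.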
The HANA system holds the details of partitioned tables and their group information in the TABLE_PLACEMENT table under the SYS schema. DDO uses this table but copies it into its own table, called DDO_PLACEMENT_TABLE, which lets it redistribute data and record the new placement positions.
Normally, tables and partitions are stored on individual nodes. A shared-nothing architecture is followed: every node operates only on its own stored data. If a request comes from another node, either the data is sent to the requesting node or the operation is sent to the node holding the data, whichever is better for performance given how much data is stored and needed on each node.
Given this, SQL runs more efficiently when a single node locally stores all the data that needs to be worked on together; the extra decisions and data/operation transfers are, in effect, overhead.
DDO helps by cutting out these extra actions. It does so by redistributing and co-locating tables and partitions based on how they are used in SQL, a process called join path analysis. A monitor also visualizes the system landscape, showing the percentage load on every node down to the finest possible level (table or partition). Complex built-in DDO algorithms, called DDO configurations, are available for this.
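The core idea behind join path analysis can be sketched very roughly: mine the SQL workload for tables that are frequently joined together, so that they become candidates for co-location on the same node. In the snippet below, the workload, the table names, and the naive regex-based parsing are all assumptions for illustration only:

```python
# Illustrative sketch of the idea behind join path analysis (the real DDO
# configurations are far more sophisticated): count how often pairs of
# tables are joined in captured SQL statements.
import itertools
import re
from collections import Counter

def join_pair_counts(statements):
    """Return a Counter of table pairs appearing in the same statement."""
    pairs = Counter()
    for sql in statements:
        # Very naive extraction of table names after FROM/JOIN keywords.
        tables = sorted(set(re.findall(r"(?:FROM|JOIN)\s+(\w+)", sql, re.I)))
        for pair in itertools.combinations(tables, 2):
            pairs[pair] += 1
    return pairs

workload = [
    "SELECT * FROM SALES JOIN CUSTOMER ON ...",
    "SELECT * FROM SALES JOIN CUSTOMER ON ... JOIN PRODUCT ON ...",
    "SELECT * FROM FINANCE",
]
counts = join_pair_counts(workload)
```

Tables with high pair counts (here SALES and CUSTOMER) would be candidates to stay on the same slave node, while tables never joined together (FINANCE) can live elsewhere.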
Image – Illustration of Join Path Analysis
A redistribution plan is generated to achieve the best optimization, and it can even be compared through an as-is versus to-be performance simulation, to check that the expected gains will actually be achieved.
DDO can even change partitions based on the analysis and the redistribution execution plan. Un-partitioned tables are not partitioned, but are only moved, if required, to a new slave host.
This plan can be exported and imported between systems. From Stack 05 onward, it is also possible to run DDO from a remote system. For extended storage tables and partitions, DDO marks one slave node to be used dedicatedly for them.
Home screen for DDO:
SAP DWF – Data Lifecycle Manager (DWF-DLM)
Data temperature management is used to reduce the data footprint of HANA's in-memory storage. The extended table concept followed later, providing one more way of keeping data permanently on disk for extended tables and partitions while reading and working on it as if it were in memory. Both aspects can be managed with the simple, intuitive UI of the XS-based DWF-DLM tool provided by SAP.
- Hot to cold movement – Data can be moved from hot storage to cold manually or based on scheduling options. The simple UI (which runs in any browser) is used to generate the rules that drive this, and the rules can be edited as well. Once scheduled or run, the movement happens automatically.
- Cold to hot movement – As needs change, data can just as easily be moved from cold back to hot storage using this tool.
- Bidirectional movement – A bidirectional configuration is also possible for two-way movement as needed.
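The rules behind these movements are essentially predicates over the data. As a minimal sketch, an age-based hot-to-cold rule might look like the following; note that DLM actually generates stored procedures inside HANA, and the field names and thresholds here are hypothetical:

```python
# Illustrative sketch of an age-based displacement rule like those DLM
# generates. The real rules run as stored procedures in the database;
# this plain-Python version only shows the idea.
from datetime import date, timedelta

def split_by_age(rows, date_field, days_hot, today=None):
    """Partition rows into (hot, cold): rows older than days_hot go cold."""
    today = today or date.today()
    threshold = today - timedelta(days=days_hot)
    hot = [r for r in rows if r[date_field] >= threshold]
    cold = [r for r in rows if r[date_field] < threshold]
    return hot, cold

rows = [
    {"id": 1, "created": date(2016, 1, 10)},
    {"id": 2, "created": date(2015, 1, 10)},
]
# Keep one year of data hot; everything older is displaced to cold storage.
hot, cold = split_by_age(rows, "created", days_hot=365, today=date(2016, 6, 1))
```

Scheduling the rule then simply means re-evaluating this predicate periodically and moving whatever has newly crossed the threshold.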
Data archival to Hadoop is also taken care of automatically. Like DDO, DLM is fully integrated with HANA Vora, and it also supports the Sybase IQ database.
All rules created in DLM are stored as database stored procedures and thus run close to the database. For data displaced from memory to an extended table, or to tables in a connected data storage system, DLM creates a HANA view in case one needs to see all data (hot and cold) together.
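What such a view amounts to can be sketched as generated SQL that unions the hot table with its cold counterpart. The naming convention and DDL below are purely illustrative assumptions, not DLM's actual generated code:

```python
# Illustrative sketch: generate a union view over a hot (in-memory) table
# and its cold (displaced) counterpart, so both can be read together.
# The "_ALL" suffix and the quoting style are invented for this example.
def union_view_ddl(schema, table, cold_table, view_suffix="_ALL"):
    view = f'"{schema}"."{table}{view_suffix}"'
    return (
        f'CREATE VIEW {view} AS '
        f'SELECT * FROM "{schema}"."{table}" '
        f'UNION ALL '
        f'SELECT * FROM "{schema}"."{cold_table}"'
    )

ddl = union_view_ddl("DWH", "SALES", "SALES_EXT")
```

Applications that query the view see one logical table, regardless of where each row physically resides.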
Image – DLM – Home Screen
So DLM can take care of data temperature management between HANA and HANA itself (e.g. extended tables; together with DDO, it can even allocate one host for all extended table operations), with Hadoop (closely integrated with HANA Vora), or with Sybase IQ systems.
Some additional functionality (KPIs) is also available for better data management, e.g. forecasting HANA disk usage in 100 days versus today's utilization, and memory usage in 100 days versus current utilization.
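At its simplest, such a "usage in 100 days" KPI can be approximated by a linear extrapolation over historical usage samples. DLM's actual forecasting model is not documented here, so the following is only an assumed sketch with invented sample data:

```python
# Illustrative sketch of a forecast KPI: least-squares linear fit over
# daily (day, usage_gb) samples, extrapolated days_ahead into the future.
def forecast(samples, days_ahead):
    """Extrapolate usage days_ahead past the last sample via a linear fit."""
    n = len(samples)
    xs = [d for d, _ in samples]
    ys = [u for _, u in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (xs[-1] + days_ahead)

# Invented data: disk usage growing 10 GB per day over five observed days.
usage = [(0, 500), (1, 510), (2, 520), (3, 530), (4, 540)]
projected = forecast(usage, 100)  # usage 100 days after the last sample
```

Comparing the projected value with today's utilization gives exactly the kind of "in 100 days vs. current" figure the KPI reports.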
The displaced data is accessed via SDA (Smart Data Access); for Hadoop, SDA can also go through Spark or Vora.
DLM is considered a complementary administrative tool for SAP Vora.
Image credits – various SAP documentation.