Business data is often viewed as the critical resource of the 21th century. The more actual the business data is, the more valuable it is considered. However historic data is not utterly worthless. To offer the best possible, meaning the most performant, consistent and correct access to data given a fixed budget, we need to know: Who consumes which slice of our business data at what point in time? This blog is about how to find out valid answers to this question from the perspective of a BW administrator.
Access to the data is granted via SAP BWs analytic engine. SAP BW users access the data via a BExQuery. The analytic engine in turn requests the data from the persistency services. BW (on HANA) offers a multi-temperature data lifecycle concept, with data stored in-memory and columnar format, usage of the non-active data concept, in the HANA ExtendedStorage (aka Dynamic Tiering), usage of the Nearline-Storage options, or archiving and, of course, you can delete the data.
Now given our fixed budget, how should we find out how to distribute the data across the different storage layers?
SAP BW on HANA SP 8 comes equipped with the “Selection statistics”, a tool designed to track data access and then assist finding a proper data distribution. With the selection statistics you can record all data access requests of the Analytic Engine on your business data. The selection statistics can be enabled per Info Provider. If enabled, then for each data access request the minimal and maximal selection date on the time dimension, the name of the Info Provider, the name of the accessing user and the access time are stored.
One of the major use cases for the “Selection Statistics” is for the “Data aging” functionality in the Administrator workbench (Administrative Tools->Housekeeping->Data Aging) is to be able to propose time slices for shifting data to the NearlineStore. Technically the “Data Aging” tool assist in creating:
- Data Archving Processes
- Parametrization (variants) of Data Archiving Processes , containing the proposed time slice
- Process Chains that schedule the Data Archiving Processes
The recording of selection statistics is currently limited to time slices only. This limitation was introduced to
a) keep the amount of recorded data under control
b) minimize the impact on the query runtime due to the calculation of the data slices.
c) emphasize time filters, that are usually provided in all queries and are the most important criteria when it comes to data retention and lifecycle considerations.
If you would agree to this fine, otherwise feel free to post a comment and share your view.
Here some screenshots that demonstrate the use of the tools:
1.) Customizing the selection statistics (transaction spro)
2.) Analyzing the selection statistics
3.) Using selection statistics for Data Aging